CUBEP3M is a high-performance cosmological N-body code with many utilities and extensions, including a runtime halo finder, a non-Gaussian initial conditions generator, tunable accuracy, and a system of unique particle identification. CUBEP3M is fast, has a memory footprint up to three times lower than that of other widely used N-body codes, and has been run on up to 20,000 cores, achieving close to ideal weak scaling even at this problem size. It is well suited to, and has already been used for, a broad range of science applications that require either large samples of non-linear realizations or very large dark matter N-body simulations, including cosmological reionization, baryonic acoustic oscillations, weak lensing, and non-Gaussian statistics.