LAMMPS

This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.




General

LAMMPS is a classical molecular dynamics code; the name is an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS is distributed by Sandia National Laboratories, a US Department of Energy laboratory. The main authors of LAMMPS are listed on this page along with contact information and other contributors. Funding for LAMMPS development has come primarily from the DOE (OASCR, OBER, ASCI, LDRD, Genomes-to-Life) and is acknowledged here.

Code Layout

  • C++ and object-oriented approach.
  • Parallelization via MPI and OpenMP; runs on GPUs.
  • Invoked by commands through input scripts.
  • The output can be customized.
  • Can be interfaced with other codes.

Force fields and examples

Force fields classified by Material

  • Biomolecules: CHARMM, AMBER, OPLS, COMPASS (class 2), long-range Coulombics via PPPM, point dipoles, ...
  • Polymers: all-atom, united-atom, coarse-grain (bead-spring FENE), bond-breaking, …
  • Materials: EAM and MEAM for metals, Buckingham, Morse, Yukawa, Stillinger-Weber, Tersoff, EDIP, COMB, SNAP, ...
  • Chemistry: AIREBO, REBO, ReaxFF, eFF
  • Mesoscale: granular, DPD, Gay-Berne, colloidal, peri-dynamics, DSMC...
  • Hybrid: can use combinations of potentials for hybrid systems: water on metal, polymers/semiconductor interface, colloids in solution, …

Potentials classified by Functional Form

  • Pairwise potentials: Lennard-Jones, Buckingham, ...
  • Charged Pairwise Potentials: Coulombic, point-dipole
  • Manybody Potentials: EAM, Finnis/Sinclair, modified EAM (MEAM), embedded ion method (EIM), Stillinger-Weber, Tersoff, AIREBO, ReaxFF, COMB
  • Coarse-Grained Potentials: DPD, GayBerne, ...
  • Mesoscopic Potentials: granular, peri-dynamics
  • Long-Range Electrostatics: Ewald, PPPM, MSM
  • Implicit Solvent Potentials: hydrodynamic lubrication, Debye
  • Force-Field Compatibility with common CHARMM, AMBER, OPLS, and GROMACS options

Modules

Several versions of LAMMPS are installed on CVMFS and accessible on Compute Canada systems through modules. To find the available modules, use: module spider lammps or module -r spider '.*lammps.*'

The version of each module corresponds to the release date of that version, in the format YYYYMMDD. The name of the module also indicates which accelerator packages were included when it was built.

For each release installed, one or more modules are available. For example, the release of 31 March 2017 has three modules:

  • Version built with MPI: lammps/20170331
  • Version built with USER-OMP support: lammps-omp/20170331
  • Version built with USER-INTEL support: lammps-user-intel/20170331

These versions are also available with GPU support. In order to load the GPU-enabled version of LAMMPS, the cuda module needs to be loaded before the LAMMPS module:

$ module load cuda
$ module load lammps-omp/20170331

The name of the executable may differ from one version to another. To find the name of the executable that corresponds to a given module, do the following (example for lammps-omp/20170331):

$ module load lammps-omp/20170331
$ ls ${EBROOTLAMMPS}/bin/
lmp lmp_icc_openmpi

From this output, the executable is lmp_icc_openmpi. Note that lmp is a symbolic link to this executable. For all versions installed on CVMFS, a symbolic link called lmp is added for each LAMMPS executable. This means that no matter which module you pick, lmp will work as the executable for that module.
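For example, since lmp always points to the executable of the module that is loaded, the commands below would run an input file in the same way regardless of which LAMMPS module was picked (a minimal sketch; the input and output file names are only illustrative):

$ module load lammps-omp/20170331
$ lmp < lammps.in > lammps_output.txt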

The reason for having different modules for the same release is the difference in the packages included. Recent versions of LAMMPS contain about 60 different packages that can be enabled when compiling the program. All of the packages are documented on the official web page of LAMMPS.

For each module installed on CVMFS, a file list-packages.txt gives a list of the supported and non-supported packages for that particular module. The different modules for one release, mentioned above, exist because not all of the available packages can be included in a single binary. If your simulation does not work with one module, it is most likely because a required package was not included.

To find out which packages are supported by a given module, do the following:

  • First load a particular module of LAMMPS (use module -r spider '.*lammps.*' to see how to load a particular module).
  • Then, execute the command: cat ${EBROOTLAMMPS}/list-packages.txt
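
For example, to check whether a particular package was included in the module that is loaded (a minimal sketch; the package name USER-OMP is only an illustration):

$ module load lammps-omp/20170331
$ grep "USER-OMP" ${EBROOTLAMMPS}/list-packages.txt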

For more information on Environment Modules, please refer to the Using modules page.

Scripts for running LAMMPS

File : lammps.in

# 3d Lennard-Jones melt

units           lj
atom_style      atomic

lattice         fcc 0.8442
region          box block 0 15 0 15 0 15
create_box      1 box
create_atoms    1 box
mass            1 1.0

velocity        all create 1.44 87287 loop geom

pair_style      lj/cut 2.5
pair_coeff      1 1 1.0 1.0 2.5
neighbor        0.3 bin
neigh_modify    delay 5 every 1

fix             1 all nve
thermo          5
run             10000
write_data     config.end_sim

# End of the Input file.


File : run_lmp_serial.sh

#!/bin/bash

#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=2500M      # memory; default unit is megabytes.
#SBATCH --time=0-00:30           # time (DD-HH:MM).

# Load the module:

module load nixpkgs/16.09  intel/2016.4  openmpi/2.1.1 lammps-omp/20170811

echo "Starting run at: `date`"

lmp_exec=lmp_icc_openmpi
lmp_input="lammps.in"
lmp_output="lammps_lj_output.txt"

${lmp_exec} < ${lmp_input} > ${lmp_output}

echo "Program finished with exit code $? at: `date`"


File : run_lmp_mpi.sh

#!/bin/bash

#SBATCH --ntasks=4               # number of MPI processes.
#SBATCH --mem-per-cpu=2500M      # memory; default unit is megabytes.
#SBATCH --time=0-00:30           # time (DD-HH:MM).

# Load the module:

module load nixpkgs/16.09  intel/2016.4  openmpi/2.1.1 lammps-omp/20170811

echo "Starting run at: `date`"

lmp_exec=lmp_icc_openmpi
lmp_input="lammps.in"
lmp_output="lammps_lj_output.txt"

srun ${lmp_exec} < ${lmp_input} > ${lmp_output}

echo "Program finished with exit code $? at: `date`"


Benchmarks

CPU efficiency

LAMMPS uses domain decomposition to split the work among the available processors by assigning a subset of the simulation box to each processor. During the computation of the interactions between particles, communication between the processors is required. For a given number of particles, the more processors that are used, the smaller the subsets of the simulation box become and the more time is spent in communication, leading to low CPU efficiency.

Before running extensive simulations for a given problem size or simulation box size, it is recommended to run some tests to see how the program scales with the number of cores. The idea is to run short tests with different numbers of cores in order to determine the number of cores that maximizes the efficiency of the simulation. Most of the CPU time in a molecular dynamics simulation is spent computing the pair interactions between particles; to get good performance, the time spent in communication between the processors has to be reduced.
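
One way to do such a test is to submit the same short run with different numbers of MPI tasks and compare the timings reported at the end of each output file. The loop below is only a sketch: it reuses the run_lmp_mpi.sh script shown above, and assumes that the output file name in that script is made unique for each run (for example by including ${SLURM_NTASKS} in it) so that the runs do not overwrite each other:

$ for n in 1 2 4 8 12; do sbatch --ntasks=${n} --job-name=lj-scaling-${n} run_lmp_mpi.sh; done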

The following example shows the MPI task timing breakdown from a simulation of a system of 4000 particles using 12 MPI tasks. This is an example of very low efficiency: by using 12 cores, the system of 4000 atoms was divided into 12 small boxes. The program spent 46.45 % of the time computing pair interactions and 44.5 % in communication between the processors. Such a large number of small boxes for such a small system increases the communication time. For an efficient MD simulation, the communication time should be minimized so that most of the time is spent computing the pair interactions.


Loop time of 15.4965 on 12 procs for 25000 steps with 4000 atoms.

Performance: 696931.853 tau/day, 1613.268 timesteps/s.
90.2% CPU use with 12 MPI tasks x 1 OpenMP threads.

Section |  min time  |  avg time  |  max time  | %varavg | %total
--------|------------|------------|------------|---------|-------
Pair    | 6.6964     | 7.1974     | 7.9599     |    14.8 |  46.45
Neigh   | 0.94857    | 1.0047     | 1.0788     |     4.3 |   6.48
Comm    | 6.0595     | 6.8957     | 7.4611     |    17.1 |  44.50
Output  | 0.01517    | 0.01589    | 0.019863   |     1.0 |   0.10
Modify  | 0.14023    | 0.14968    | 0.16127    |     1.7 |   0.97
Other   | --         | 0.2332     | --         |      -- |   1.50
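
To compare several short runs quickly, the overall loop time and the Pair and Comm entries can be pulled out of each output file (a minimal sketch, assuming the output file name used in the job scripts above):

$ grep -E "^(Loop time|Pair|Comm)" lammps_lj_output.txt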

Related Software

  • DL_POLY:
  • CPMD:
  • GULP:
  • NAMD:
  • CHARMM:
  • AMBER:
  • GROMACS:
  • NWCHEM:
  • HOOMD:
  • Tinker:

Useful Links