LAMMPS
LAMMPS uses domain decomposition to split the work among the available processors by assigning a small subset of the simulation box to each processor. During the computation of the interactions between particles, communication between the processors is required. For a given number of particles, the more processors are used, the more subsets of the simulation box are created. Therefore, the communication time increases, leading to low CPU efficiency.
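The effect described above can be sketched with a toy surface-to-volume estimate (this is illustrative Python, not LAMMPS code): each processor owns a sub-box, communication scales with the sub-box surface it shares with neighbours, and local computation scales with the sub-box volume, so shrinking the sub-boxes raises the communication share.

```python
# Toy model (not LAMMPS code): ratio of communication to computation for a
# cubic box split evenly among n_procs_per_dim**3 processors.

def comm_to_compute_ratio(box_edge, n_procs_per_dim):
    """Surface-to-volume ratio of one sub-box in a cubic decomposition."""
    sub_edge = box_edge / n_procs_per_dim   # edge length of one sub-box
    surface = 6 * sub_edge ** 2             # data exchanged across faces
    volume = sub_edge ** 3                  # pair-interaction work done locally
    return surface / volume                 # grows as sub-boxes shrink

# Doubling the processors per dimension doubles the ratio:
r1 = comm_to_compute_ratio(10.0, 1)   # 1 processor in total
r8 = comm_to_compute_ratio(10.0, 2)   # 8 processors in total
```

The ratio is simply 6 divided by the sub-box edge length, which is why adding processors to a fixed-size system eventually costs more in communication than it gains in computation.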


Before running extensive simulations for a given problem size or simulation box, it is recommended to run some tests to see how the program scales with an increasing number of cores. The idea is to run short tests with different numbers of cores in order to determine the number of cores that maximizes the efficiency of the simulation. Most of the CPU time in Molecular Dynamics simulations is spent computing the pair interactions between particles. To get better performance from a simulation, one has to reduce the time spent in communication between the processors.
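Such a scaling test can be summarized by computing the speedup and parallel efficiency of each short run relative to the smallest one. The sketch below uses made-up wall-clock times purely for illustration; in practice the times would come from your own test runs.

```python
# Sketch: speedup and parallel efficiency from short test runs.
# Core counts and times below are illustrative, not real LAMMPS output.

def scaling_summary(cores, times):
    """Return {cores: (speedup, efficiency)} relative to the first run."""
    base_cores, base_time = cores[0], times[0]
    result = {}
    for c, t in zip(cores, times):
        speedup = base_time / t
        result[c] = (speedup, speedup * base_cores / c)
    return result

runs = scaling_summary([1, 2, 4, 8], [100.0, 52.0, 30.0, 22.0])
```

A common rule of thumb is to stop adding cores once the efficiency drops well below about 0.8, since beyond that point most of the extra cores are spent on communication rather than computation.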


The following example shows the MPI task timing breakdown from a simulation of a system of 4000 particles using 12 MPI tasks. This is an example of very low efficiency: with 12 cores, the system of 4000 atoms was divided into 12 small boxes. The program spent '''46.45 %''' of the time computing pair interactions and '''44.5 %''' in communication between the processors. Such a large number of small boxes for such a small system increases the communication time. For an efficient MD simulation, the communication time should be minimized so that most of the time goes to computing the pair interactions.
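Reading the percentages out of a timing table can be automated. The sketch below parses a simplified table whose layout mimics an MPI task timing breakdown; the exact columns and widths vary between LAMMPS versions, so treat this parser and its sample numbers as an assumption-laden illustration (the figures match the 46.45 % / 44.5 % example above).

```python
# Sketch: pull the %total column out of a simplified timing-breakdown table.
# The table text imitates the layout of an MPI task timing breakdown; real
# output differs between versions, so this is illustrative only.

breakdown = """\
Section | min time | avg time | max time | %total
Pair    | 0.90     | 0.92     | 0.95     | 46.45
Comm    | 0.85     | 0.88     | 0.91     | 44.5
Output  | 0.01     | 0.01     | 0.01     | 0.5
Other   |          | 0.17     |          | 8.55
"""

def percent_by_section(text):
    """Map each section name to its %total value (last column)."""
    result = {}
    for line in text.splitlines()[1:]:          # skip the header row
        fields = [f.strip() for f in line.split("|")]
        result[fields[0]] = float(fields[-1])   # last column is %total
    return result

pct = percent_by_section(breakdown)
```

When the Comm percentage approaches the Pair percentage, as here, that is the signal that too many MPI tasks were used for the system size.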


{| class="wikitable" style="text-align: center; border-width: 2px;width: 100%;"