Bureaucrats, cc_docs_admin, cc_staff
2,306
edits
Line 32: | Line 32: | ||
If the program can run in parallel, the next question is how to specify the number of CPU cores that the program should use. The right syntax to use will depend on the particular program: it might be an option to be added as a command line argument like <tt>--nthreads=4</tt>, an environment variable you need to set before calling the program (e.g. <tt>export OMP_NUM_THREADS=4</tt>) or perhaps a line you should add to the program's parameter file. Once you know how to specify the number of CPU cores that the program should use, the next logical question is what number of cores to use? It may be tempting to simply reply, "as many as possible" but this is often not the wisest approach. Just as having too many cooks trying to work together in a small kitchen to prepare a single meal can lead to chaos, so too adding an excessive number of CPU cores can have the perverse effect of slowing down a program. To choose the optimal number of CPU cores, you need to study the software's [[Scalability | scalability]]. | If the program can run in parallel, the next question is how to specify the number of CPU cores that the program should use. The right syntax to use will depend on the particular program: it might be an option to be added as a command line argument like <tt>--nthreads=4</tt>, an environment variable you need to set before calling the program (e.g. <tt>export OMP_NUM_THREADS=4</tt>) or perhaps a line you should add to the program's parameter file. Once you know how to specify the number of CPU cores that the program should use, the next logical question is what number of cores to use? It may be tempting to simply reply, "as many as possible" but this is often not the wisest approach. Just as having too many cooks trying to work together in a small kitchen to prepare a single meal can lead to chaos, so too adding an excessive number of CPU cores can have the perverse effect of slowing down a program. To choose the optimal number of CPU cores, you need to study the software's [[Scalability | scalability]]. | ||
A further complication with parallel execution concerns the use of multiple nodes - many of the programming techniques used to allow a program to run in parallel assume the existence of a shared memory environment, i.e. multiple cores can be used but they must all be located on the same node. In this case, the maximum number of cores available on a single node provides a ceiling for the number of cores you can use. Trying to ask for more cores than this or using more than one node will fail because the software you are running does not support <i>distributed memory parallelism</i>. Most software able to run over more than one node uses the [[MPI]] standard, so if the documentation doesn't mention MPI or consistently refers to threading and thread-based parallelism, this likely means you will need to restrict yourself to a single node. | A further complication with parallel execution concerns the use of multiple nodes - many of the programming techniques used to allow a program to run in parallel assume the existence of a shared memory environment, i.e. multiple cores can be used but they must all be located on the same node. In this case, the maximum number of cores available on a single node provides a ceiling for the number of cores you can use. Trying to ask for more cores than this or using more than one node will fail because the software you are running does not support <i>distributed memory parallelism</i>. Most software able to run over more than one node uses the [[MPI]] standard, so if the documentation doesn't mention MPI or consistently refers to threading and thread-based parallelism, this likely means you will need to restrict yourself to a single node. Programs that have been parallelized to run across multiple nodes should be started using <tt>srun</tt> rather than <tt>mpirun</tt>. | ||
Ultimately, the goal should be to ensure that the CPU efficiency of your jobs is very close to 100%, as measured by the field of this name in the output from the <tt>seff</tt> command; any value of CPU efficiency less than 90% is poor and means that your use of whatever software your job executes needs to be improved. | Ultimately, the goal should be to ensure that the CPU efficiency of your jobs is very close to 100%, as measured by the field of this name in the output from the <tt>seff</tt> command; any value of CPU efficiency less than 90% is poor and means that your use of whatever software your job executes needs to be improved. |