==Parallelism==
* By default your job will get one core on one node, and this is the most sensible policy because '''most software is serial''': it can only ever make use of a single core.
** Asking for more cores and/or nodes will not make a serial program run any faster, because its source code must be modified before it can run in parallel, in some cases in a very profound manner requiring a substantial investment of developer time.
* How can you '''determine if''' the software you're using '''can run in parallel'''?
** The best approach is to '''look in the software's documentation''' for a section on parallel execution: if you can't find anything, this is usually a sign that the program is serial.
** You can also '''contact the development team''' to ask if the software can be run in parallel and, if not, to request that such a feature be added in a future version.
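If the program turns out to be serial, the default allocation is all you need. Here is a minimal sketch of a serial job script; <tt>my_serial_program</tt> is a hypothetical name, and the time and memory values are examples to adjust to your program's needs.
<source>
#!/bin/bash
#SBATCH --time=01:00:00   # example value: adjust to your program's expected run time
#SBATCH --mem=4G          # example value: adjust to your program's memory needs
# No --nodes, --ntasks or --cpus-per-task directives are needed:
# one task on one core of one node is the default.
./my_serial_program       # placeholder for your own serial program
</source>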
* If the program can run in parallel, the next question is '''how many cores to use'''.
** Many of the programming techniques that allow a program to run in parallel assume a ''shared-memory environment'', i.e. multiple cores can be used, but they must all be located on the same node. In this case, the number of cores available on a single node sets a ceiling on the number of cores you can use.
** It may be tempting to simply request "as many cores as possible", but this is often not the wisest approach. Just as having too many cooks working together in a small kitchen can lead to chaos, adding an excessive number of CPU cores can have the perverse effect of slowing a program down.
** To choose the optimal number of CPU cores, you need to '''study the [[Scalability|software's scalability]]'''; a single-node, multi-core job header is sketched below.
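As an illustration, a shared-memory (threaded) job confined to a single node might use a header like this sketch; the core count of 8 and the program name <tt>my_threaded_program</tt> are placeholders, and the right core count is the one your scalability study suggests.
<source>
#!/bin/bash
#SBATCH --nodes=1           # shared memory: all cores must be on the same node
#SBATCH --cpus-per-task=8   # placeholder: use the value from your scalability study
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # tell an OpenMP-based program how many threads to start
./my_threaded_program       # placeholder for your own threaded program
</source>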
* A further complication with parallel execution concerns '''the use of multiple nodes''': the software you are running must support ''distributed-memory parallelism''.
** Most software able to run over more than one node uses '''the [[MPI]] standard''', so if the documentation doesn't mention MPI, or consistently refers to threading and thread-based parallelism, this likely means you will need to restrict yourself to a single node.
** Programs that have been parallelized to run across multiple nodes '''should be started using''' <tt>srun</tt> rather than <tt>mpirun</tt>.
* A goal should also be to '''avoid scattering your parallel processes across more nodes than is necessary''': a more compact distribution will usually help your job's performance.
** Highly fragmented parallel jobs often exhibit poor performance and also complicate the scheduler's task. You should therefore try to submit jobs in which the number of parallel processes is an integral multiple of the number of cores per node, assuming this is compatible with the parallel software your jobs run.
** So on a cluster with 40 cores per node, you would always submit parallel jobs asking for 40, 80, 120, 160, 200, etc. processes. For example, with the following job script header, all 120 MPI processes would be assigned in the most compact fashion, using three whole nodes.
<source>
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=40
</source>
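To complete that example, the body of the job script would then launch the program with <tt>srun</tt>, which picks up the geometry requested in the header; <tt>my_mpi_program</tt> is a placeholder name.
<source>
srun ./my_mpi_program   # starts all 120 MPI processes, packed onto the three nodes
</source>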
* Ultimately, the goal should be to '''ensure that the CPU efficiency of your jobs is very close to 100%''', as measured by the <tt>CPU Efficiency</tt> field in the output of the <tt>seff</tt> command (see the example below).
** Any CPU efficiency value less than 90% is poor and means that your use of whatever software your job executes needs to be improved.
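For example, once a job has completed you can pass its ID to <tt>seff</tt> and read the <tt>CPU Efficiency</tt> line in its output; the job ID shown here is a placeholder.
<source>
seff 1234567   # replace 1234567 with your own job ID
</source>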
==Using GPUs==