<!--T:16-->
* A further complication with parallel execution concerns <b>the use of multiple nodes</b>: the software you are running must support ''distributed memory parallelism''.
** Most software able to run over more than one node uses <b>the [[MPI]] standard</b>, so if the documentation doesn't mention MPI, or consistently refers to threading and thread-based parallelism, you will likely need to restrict yourself to a single node.
** Programs that have been parallelized to run across multiple nodes <b>should be started using</b> <code>srun</code> rather than <code>mpirun</code>.
<!--T:17-->
* You should also <b>avoid scattering your parallel processes across more nodes than is necessary</b>: a more compact distribution will usually help your job's performance.
** Highly fragmented parallel jobs often exhibit poor performance and also make the scheduler's job more complicated. You should therefore try to submit jobs where the number of parallel processes is an integral multiple of the number of cores per node, assuming this is compatible with the parallel software your jobs run.
** So on a cluster with 40 cores/node, you would always submit parallel jobs asking for 40, 80, 120, 160, 200, etc. processes. For example, with the following job script header, all 120 MPI processes would be assigned in the most compact fashion, using three whole nodes.
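The whole-node arithmetic can be sketched in a few lines of shell. This is only an illustration: the variable names <code>cores_per_node</code> and <code>requested</code> are hypothetical, the value 40 is taken from the example cluster above, and the printed lines use Slurm's standard <code>--nodes</code> and <code>--ntasks-per-node</code> options. Check your own cluster's core count before reusing it.

```shell
#!/bin/bash
# Sketch only: both values below are illustrative, not site defaults.
cores_per_node=40   # assumed cores per node, matching the example cluster
requested=120       # number of MPI processes you would like to run

# Round up to a whole number of nodes so the job is not fragmented.
nodes=$(( (requested + cores_per_node - 1) / cores_per_node ))
tasks=$(( nodes * cores_per_node ))

# Print Slurm directives that request whole nodes for a job script header.
echo "#SBATCH --nodes=${nodes}"
echo "#SBATCH --ntasks-per-node=${cores_per_node}"
echo "# total MPI processes: ${tasks}"
```

With these values the sketch requests three whole nodes. Setting <code>requested</code> to a number that is not a multiple of 40 rounds the request up to the next whole node, which may or may not suit the parallel software you are running.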