Best practices for job submission: Difference between revisions

<!--T:15-->
* If the program can run in parallel, the next question is <b>how many cores to use</b>.
** Many of the programming techniques used to parallelize a program assume a <i>shared memory environment</i>, i.e. multiple cores can be used but they must all be located on the same node. In this case, the number of cores available on a single node is the ceiling on the number of cores you can use.
** It may be tempting to simply request "as many cores as possible", but this is often not the wisest approach. Just as too many cooks working together in a small kitchen to prepare a single meal can lead to chaos, adding an excessive number of CPU cores can have the perverse effect of slowing down a program.
** To choose the optimal number of CPU cores, you need to <b>study the [[Scalability|software's scalability]]</b>.
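
One way to study scalability is to time a short, representative test case at several core counts and compare the resulting speedup and parallel efficiency. The following is a minimal sketch of that calculation, not a prescribed method: the timings are made-up placeholder values, the function names are hypothetical, and the 75% efficiency threshold is an arbitrary assumption that you should adapt to your own situation.

```python
# Hypothetical wall-clock times (seconds) from short test runs of one
# program at different core counts; replace with your own measurements.
timings = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 170.0, 16: 130.0, 32: 140.0}


def scaling_table(timings):
    """Return (cores, speedup, efficiency) rows relative to the smallest run."""
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        # Efficiency is the speedup divided by the increase in core count.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


def largest_efficient_core_count(timings, min_efficiency=0.75):
    """Largest tested core count whose efficiency meets the threshold.

    The 0.75 default is an assumed, arbitrary cut-off, not a site policy.
    """
    ok = [cores for cores, _, eff in scaling_table(timings)
          if eff >= min_efficiency]
    return max(ok)


if __name__ == "__main__":
    for cores, speedup, eff in scaling_table(timings):
        print(f"{cores:3d} cores: speedup {speedup:5.2f}, efficiency {eff:4.0%}")
    print("Suggested core count:", largest_efficient_core_count(timings))
```

In these placeholder numbers the run on 32 cores is slower than the run on 16, which is the "too many cooks" effect described above; with real measurements the shape of the curve will depend on the software and the size of the problem.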


<!--T:16-->
* A further complication with parallel execution concerns <b>the use of multiple nodes</b>: the software you are running must support <i>distributed memory parallelism</i>.
** Most software able to run over more than one node uses <b>the [[MPI]] standard</b>, so if the documentation doesn't mention MPI, or consistently refers to threading and thread-based parallelism, you will likely need to restrict yourself to a single node.
** Programs that have been parallelized to run across multiple nodes <b>should be started using</b> <code>srun</code> rather than <code>mpirun</code>.