=== Why srun instead of mpiexec or mpirun? ===
<code>mpirun</code> is a wrapper that launches processes on (possibly) different machines and sets up the communication between them. Modern schedulers already provide much of what <code>mpirun</code> needs. With Torque/Moab, for example, there is no need to pass <code>mpirun</code> the list of nodes on which to run or the number of processes to launch; the scheduler supplies these automatically. With Slurm, task affinity is also resolved by the scheduler, so there is no need to specify options like
 mpirun --map-by node:pe=4 -n 16 application.exe
As implied in the examples above, <code>srun application.exe</code> automatically distributes the processes across precisely the resources allocated to the job.
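For example, a minimal job script needs only to request the tasks and then <code>srun</code> the program; no host list or process count is repeated on the <code>srun</code> line. This is a sketch, with <code>application.exe</code> standing in for your MPI program:
 #!/bin/bash
 #SBATCH --ntasks=16        # number of MPI processes
 #SBATCH --mem-per-cpu=1G   # memory per process
 srun application.exe       # srun inherits the geometry of the allocation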
In programming terminology, <code>srun</code> is at a higher level of abstraction than <code>mpirun</code>. Anything that can be done with <code>mpirun</code> can be done with <code>srun</code>, and more. It is Slurm's tool for distributing any kind of computation; it replaces Torque's <code>pbsdsh</code>, for example, and much more. Think of <code>srun</code> as the Slurm "all-around parallel-tasks distributor": once a particular set of resources is allocated, the nature of your application doesn't matter (MPI, OpenMP, hybrid, serial farming, pipelining, multi-program, etc.); you just have to <code>srun</code> it.
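As an illustration of a non-MPI use, <code>srun --multi-prog</code> runs a different command in each task, which is one way to do the serial farming mentioned above. The following is a minimal sketch; the file name <code>farm.conf</code> and the program names are placeholders:
 # farm.conf: one line per task rank or rank range; %t expands to the task number
 0      ./preprocess
 1-3    ./worker %t
 
 srun --ntasks=4 --multi-prog farm.conf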
Also, as should be expected, <code>srun</code> is fully integrated with Slurm. When you <code>srun</code> an application, a "job step" is started, the environment variables <code>SLURM_STEP_ID</code> and <code>SLURM_PROCID</code> are initialized correctly, and the correct accounting information is recorded.
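You can see these variables at work with a quick test inside a job allocation; the echoed text is only an illustrative label:
 srun --ntasks=2 bash -c 'echo "step $SLURM_STEP_ID, task $SLURM_PROCID on $(hostname)"'
Each of the two tasks reports the same step ID but its own <code>SLURM_PROCID</code>, which is what MPI implementations use as the process rank.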
=== External links ===