Advanced MPI scheduling
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
Most users should submit MPI or distributed memory parallel jobs as illustrated at Running jobs: MPI job. Simply request a number of processes with --ntasks or -n and trust the scheduler to allocate those processes in a way that balances the efficiency of your job with the overall efficiency of the cluster.
If you need more detailed control over how your job is allocated, then read on to learn about SLURM's sbatch command and how its numerous options constrain the placement of processes.
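As a point of reference, the simple request described above could look roughly like the following job script. This is a minimal sketch, not a recommended configuration: the task count, memory, and time limit are assumed values, and ./my_mpi_program is a hypothetical executable name. The check on SLURM_JOB_ID is there only so the script can be dry-run on a machine without Slurm; a real job script would normally call srun directly.

```shell
#!/bin/bash
#SBATCH --ntasks=16          # number of MPI processes; placement is left to the scheduler
#SBATCH --mem-per-cpu=1G     # assumed memory per process
#SBATCH --time=0-00:30       # assumed time limit (D-HH:MM)

# Slurm sets SLURM_NTASKS inside a job; the fallback value only permits a dry run elsewhere.
ntasks=${SLURM_NTASKS:-16}
echo "Launching $ntasks MPI processes"

# Launch only when running inside a Slurm job.
# ./my_mpi_program is a hypothetical executable name.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    srun ./my_mpi_program
fi
```

Because only the total number of processes is specified, the scheduler is free to spread them over however many nodes are available.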
Controlling the distribution of processes
The basic MPI job shown at Running jobs is suitable for the majority of users, those who simply want their calculations to start running at the earliest opportunity.
However, you may wish to investigate how the performance of your MPI application is affected when the processes are distributed in different ways, or you may know from such an investigation that it performs best under certain constraints. For especially demanding applications, placement can be constrained not just at the level of nodes, but also at the level of sockets, cores, and threads. See SchedMD's page on multicore support for detailed information about process placement with SLURM.
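One way to experiment with distribution is to fix the number of nodes and the number of processes per node, rather than giving only a total. The sketch below assumes a layout of 2 nodes with 8 processes each; these numbers, the memory and time values, and the executable name ./my_mpi_program are illustrative assumptions, not recommendations. As before, the SLURM_JOB_ID check only allows a dry run outside Slurm.

```shell
#!/bin/bash
#SBATCH --nodes=2              # assumed: use exactly two nodes
#SBATCH --ntasks-per-node=8    # assumed: eight MPI processes on each node
#SBATCH --mem-per-cpu=1G       # assumed memory per process
#SBATCH --time=0-00:30         # assumed time limit (D-HH:MM)

# Slurm sets these variables inside a job; the fallbacks only permit a dry run elsewhere.
nodes=${SLURM_JOB_NUM_NODES:-2}
per_node=${SLURM_NTASKS_PER_NODE:-8}
total=$(( nodes * per_node ))
echo "Running $total MPI processes on $nodes nodes ($per_node per node)"

# Launch only when running inside a Slurm job.
# ./my_mpi_program is a hypothetical executable name.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    srun ./my_mpi_program
fi
```

Submitting the same program with different --nodes and --ntasks-per-node combinations, and comparing run times, is one simple way to carry out the investigation described above. Finer-grained sbatch options such as --ntasks-per-socket and --cpus-per-task also exist; consult the sbatch documentation before relying on them.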
Hybrid jobs: MPI and OpenMP, or MPI and threads
To come
MPI and GPUs
To come
Troubleshooting and performance monitoring
To come
Why srun instead of mpiexec or mpirun?
To come