Advanced MPI scheduling: Difference between revisions

Line 31: Line 31:
==== Whole nodes ==== <!--T:17-->


<!--T:25-->
If you have a large parallel job to run, that is, one that can efficiently use 32 cores or more, you should probably request whole nodes. To do so, it helps to know what node types are available at the cluster you are using.


Line 36: Line 37:
Typical nodes in [[Cedar]], [[Graham]] and [[Niagara]] have the following CPU and memory configuration:


<!--T:26-->
{| class="wikitable"
|-
Line 49: Line 51:
|}


<!--T:27-->
Whole-node jobs are allowed to run on any node. "Some are reserved for whole-node jobs" in the table above indicates that there are nodes on which by-core jobs are forbidden.


<!--T:28-->
A job script requesting whole nodes should look like this:


<!--T:29-->
<tabs>
<tabs>
<tab name="Graham">
<tab name="Graham">
Line 92: Line 97:
Requesting <code>--mem=0</code> is interpreted by Slurm to mean "reserve all the available memory on each node assigned to the job."  


<!--T:30-->
Note, however, that if you need more memory per node than the smallest node provides (e.g. more than 125 GiB at Graham), you should not use <code>--mem=0</code> but request the amount explicitly. Furthermore, some memory on each node is reserved for the operating system, so the largest amount your job can request and still qualify for each node type is as follows:


<!--T:31-->
{| class="wikitable"
|-
Line 111: Line 118:
|}


<!--T:32-->
Requesting <code>--mem=256G</code> instead of <code>--mem=256000M</code> means the job will not schedule on a so-called 256GB node, because Slurm reads <code>256G</code> as 262144M, which is more than such a node can offer. Slurm will not reject the job; it will just wait longer than it needs to for a rarer, larger node.
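
The unit arithmetic behind this can be checked with a short shell sketch. It assumes Slurm's convention that memory suffixes are powers of 1024 (so 1G = 1024M) and that a bare number means megabytes. The helper name <code>to_mib</code> is hypothetical and is not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm command): convert a Slurm-style memory
# string such as 256G or 256000M into a plain number of M units.
# Assumption: suffixes are powers of 1024, and no suffix means M.
to_mib() {
    local value="${1%[MG]}"      # strip one trailing M or G, if present
    local unit="${1#"$value"}"   # whatever was stripped (may be empty)
    case "$unit" in
        G)    echo $(( value * 1024 )) ;;
        M|"") echo "$value" ;;
    esac
}

# Compare the two requests discussed above.
echo "--mem=256G    requests $(to_mib 256G)M"
echo "--mem=256000M requests $(to_mib 256000M)M"
```

The first request comes out 6144M larger than the second, which is why it no longer qualifies for the 256GB node type.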

