Running jobs

Please be cautious if you use a script to submit multiple Slurm jobs in a short time. Submitting thousands of jobs at a time can cause Slurm to become [[Frequently_Asked_Questions#.22sbatch:_error:_Batch_job_submission_failed:_Socket_timed_out_on_send.2Frecv_operation.22|unresponsive]] to other users. Consider using an [[Running jobs#Array job|array job]] instead, or use <code>sleep</code> to space out calls to <code>sbatch</code> by one second or more.
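For example, a submission loop can pause between calls to <code>sbatch</code>. This is only a sketch: the script name <code>job_script.sh</code> and the <code>data_*.txt</code> input files are placeholders for your own files.

<pre>
#!/bin/bash
# Submit one job per input file, pausing for one second between
# submissions so as not to overload the Slurm controller.
for input in data_*.txt; do
    sbatch job_script.sh "$input"
    sleep 1
done
</pre>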


=== Memory === <!--T:161-->


<!--T:106-->
Memory may be requested with <code>--mem-per-cpu</code> (memory per core) or <code>--mem</code> (memory per node). On general-purpose (GP) clusters, a default of 256 MB per core will be allocated unless you request otherwise. On [[Niagara]], only whole nodes are allocated, along with all of their available memory, so no memory specification is required there.
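For illustration, a minimal job script might request memory per core as shown below. The program name and the resource values are placeholders, not recommendations; choose values appropriate to your own work.

<pre>
#!/bin/bash
#SBATCH --ntasks=4           # number of cores (MPI processes)
#SBATCH --mem-per-cpu=4G     # memory per core; total memory is 4 x 4G = 16G
##SBATCH --mem=16G           # alternative: request memory per node instead
#SBATCH --time=0-01:00       # walltime in D-HH:MM
srun ./my_program            # my_program is a placeholder
</pre>

Note that <code>--mem-per-cpu</code> and <code>--mem</code> are mutually exclusive, so the second directive above is commented out (Slurm ignores lines beginning with <code>##SBATCH</code>).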


<!--T:162-->
A common source of confusion is that some of the memory on each node is not available to jobs because it is reserved for the operating system and other system software. As a result, each node type has a maximum amount of memory available to jobs; for instance, nominally "128G" nodes are typically configured to permit 125G of memory for user jobs. If you request more memory than a node type provides, your job will be constrained to run on higher-memory nodes, which may be fewer in number.
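If you want to see how much memory Slurm has configured on each node of a cluster, one option (a sketch, assuming the standard <code>sinfo</code> format fields) is:

<pre>
# List each partition, the configured memory per node (in MB) and the node count.
sinfo --format="%P %m %D"
</pre>

The cluster-specific tables linked in the following paragraph remain the authoritative reference for what you can actually request.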


<!--T:163-->
Adding to this confusion, Slurm interprets K, M, G, etc., as [https://en.wikipedia.org/wiki/Binary_prefix binary prefixes], so <code>--mem=125G</code> is equivalent to <code>--mem=128000M</code>.  See the "available memory" column in the "Node types and characteristics" table for each GP cluster for the Slurm specification of the maximum memory you can request on each node: [[Béluga/en#Node_types_and_characteristics|Béluga]], [[Cedar#Node_types_and_characteristics|Cedar]], [[Graham#Node_types_and_characteristics|Graham]].
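For instance, since 1G = 1024M under this convention, 125 × 1024 = 128000, so either of the following directives (shown in isolation, with the rest of the job script omitted) requests the same amount of memory:

<pre>
#SBATCH --mem=125G        # 125 * 1024 MB = 128000 MB
#SBATCH --mem=128000M     # same request expressed in megabytes
</pre>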

