<!--T:106-->
Memory may be requested with <code>--mem-per-cpu</code> (memory per core) or <code>--mem</code> (memory per node). On general-purpose (GP) clusters a default memory amount of 256 MB per core will be allocated unless you make some other request. On [[Niagara]], only whole nodes are allocated along with all available memory, so a memory specification is not required there.
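For illustration, a job script that requests memory per core might begin as follows; the account name, program name, and resource values are placeholders, not recommendations:
<pre>
#!/bin/bash
#SBATCH --account=def-someprof   # placeholder account name
#SBATCH --cpus-per-task=4        # 4 cores for a multithreaded program
#SBATCH --mem-per-cpu=2G         # 2 GB per core, 8 GB in total
#SBATCH --time=0-01:00           # at most one hour
./my_threaded_program            # placeholder executable
</pre>
For a single-node job like this, <code>--mem=8G</code> would request the same total amount, expressed per node rather than per core.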
<!--T:162-->
A common source of confusion comes from the fact that some memory on a node is not available to the job (reserved for the OS, etc.). The effect of this is that each node type has a maximum amount available to jobs; for instance, nominally "128G" nodes are typically configured to permit 125G of memory to user jobs. If you request more memory than a node type provides, your job will be constrained to run on higher-memory nodes, which may be fewer in number.
<!--T:163-->
Adding to this confusion, Slurm interprets K, M, G, etc., as [https://en.wikipedia.org/wiki/Binary_prefix binary prefixes], so <code>--mem=125G</code> is equivalent to <code>--mem=128000M</code>. See the <i>Available memory</i> column in the "Node characteristics" table for each GP cluster for the Slurm specification of the maximum memory you can request on each node: [[Béluga/en#Node_characteristics|Béluga]], [[Cedar#Node_characteristics|Cedar]], [[Graham#Node_characteristics|Graham]], [[Narval/en#Node_characteristics|Narval]].
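For instance, to ask for the full memory available on a nominal "128G" node (an illustrative figure; check the table for the exact value on your cluster), you would write:
<pre>
#SBATCH --mem=125G   # the most the nominal "128G" nodes can provide
</pre>
Requesting <code>--mem=128G</code> instead would exclude those nodes and restrict the job to the (typically scarcer) larger-memory nodes.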
==Use <code>squeue</code> or <code>sq</code> to list jobs== <!--T:60-->
<!--T:61-->
The general command for checking the status of Slurm jobs is <code>squeue</code>, but by default it supplies information about <b>all</b> jobs in the system, not just your own. You can use the shorter <code>sq</code> to list only your own jobs:
<!--T:62-->
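For example (a sketch only; <code>sq</code> is a locally provided wrapper, and the plain Slurm equivalent is shown for comparison):
<pre>
$ sq                  # list your own pending and running jobs
$ squeue -u $USER     # the same, using squeue directly
</pre>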
<!--T:115-->
<b>Do not</b> run <code>sq</code> or <code>squeue</code> from a script or program at high frequency (e.g. every few seconds). Responding to <code>squeue</code> adds load to Slurm, and may interfere with its performance or correct operation. See [[#Email_notification|Email notification]] below for a much better way to learn when your job starts or ends.
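Instead of polling, you can ask Slurm to send e-mail when your job changes state; a minimal sketch, with a placeholder address:
<pre>
#SBATCH --mail-user=someuser@example.com   # placeholder address
#SBATCH --mail-type=BEGIN,END,FAIL         # notify on start, normal completion, and failure
</pre>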
==Where does the output go?== <!--T:63-->
<!--T:64-->
By default the output is placed in a file named "slurm-", suffixed with the job ID number and ".out" (e.g. <code>slurm-123456.out</code>), in the directory from which the job was submitted.
Having the job ID as part of the file name is convenient for troubleshooting.
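If you prefer a different name or location, <code>sbatch</code> accepts an <code>--output</code> option in which <code>%j</code> is replaced by the job ID; for example (the file name is a placeholder):
<pre>
#SBATCH --output=myjob-%j.out   # e.g. myjob-123456.out
</pre>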
<!--T:67-->
Every job must have an associated account name corresponding to a [[Frequently_Asked_Questions_about_the_CCDB#What_is_a_RAP.3F|Resource Allocation Project]] (RAP). If you are a member of only one account, the scheduler will automatically associate your jobs with that account.
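If you belong to more than one account, name the one the job should be charged to; for example (<code>def-someprof</code> is a placeholder account name):
<pre>
#SBATCH --account=def-someprof
</pre>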
<!--T:107-->
and click on "My Account -> Account Details". You will see a list of all the projects | and click on "My Account -> Account Details". You will see a list of all the projects | ||
you are a member of. The string you should use with the <code>--account</code> for | you are a member of. The string you should use with the <code>--account</code> for | ||
a given project is under the column | a given project is under the column <i>Group Name</i>. Note that a Resource | ||
Allocation Project may only apply to a specific cluster (or set of clusters) and therefore | Allocation Project may only apply to a specific cluster (or set of clusters) and therefore | ||
may not be transferable from one cluster to another. | may not be transferable from one cluster to another. | ||
export SLURM_ACCOUNT=def-someprof    # placeholder account name; use your own
export SBATCH_ACCOUNT=$SLURM_ACCOUNT
export SALLOC_ACCOUNT=$SLURM_ACCOUNT
Slurm will use the value of <code>SBATCH_ACCOUNT</code> in place of the <code>--account</code> directive in the job script. Note that even if you supply an account name inside the job script, <i>the environment variable takes priority.</i> In order to override the environment variable, you must supply an account name as a command-line argument to <code>sbatch</code>.
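For example (the account and script names are placeholders), a command-line override might look like this:
<pre>
$ sbatch --account=def-otherprof job_script.sh   # overrides SBATCH_ACCOUNT and any #SBATCH --account line
</pre>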
<!--T:72-->