Running jobs

<!--T:162-->
A common source of confusion comes from the fact that some memory on a node is not available to the job (reserved for the OS, etc.).  The effect of this is that each node-type has a maximum amount available to jobs - for instance, nominally "128G" nodes are typically configured to permit 125G of memory to user jobs.  If you request more memory than a node-type provides, your job will be constrained to run on higher-memory nodes, which may be fewer in number.
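
For example, a request along the following lines (a sketch; the 125G figure is the typical value quoted above and varies by cluster and node type, and <code>./my_program</code> is a placeholder for your own command) keeps the job eligible for the common nominally "128G" nodes, whereas <code>--mem=128G</code> would restrict it to the scarcer higher-memory nodes:
 #!/bin/bash
 #SBATCH --time=1:00:00
 #SBATCH --mem=125G      # fits within the memory a nominal "128G" node makes available to jobs
 ./my_program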


  export SBATCH_ACCOUNT=$SLURM_ACCOUNT
  export SALLOC_ACCOUNT=$SLURM_ACCOUNT
Slurm will use the value of <code>SBATCH_ACCOUNT</code> in place of the <code>--account</code> directive in the job script. Note that even if you supply an account name inside the job script, ''the environment variable takes priority.'' In order to override the environment variable, you must supply an account name as a command-line argument to <code>sbatch</code>.
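
For example, assuming a job script named <code>my_job.sh</code> and a second, hypothetical account <code>def-otheruser</code>, supplying the account on the command line overrides both the environment variable and any <code>--account</code> line inside the script:
 $ sbatch --account=def-otheruser my_job.sh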


<!--T:151-->
Get a short summary of the CPU and memory efficiency of a job with <code>seff</code>:
  $ seff 12345678
  Job ID: 12345678


<!--T:155-->
There are certain differences in job scheduling policies from one of our clusters to another; these are summarized by tab in the following section:


</tab>
<tab name="Cedar">
Jobs may not be submitted from directories on the /home filesystem on Cedar. This is to reduce the load on that filesystem and improve the responsiveness for interactive work. If the command <code>readlink -f $(pwd) | cut -d/ -f2</code> returns <code>home</code>, you are not permitted to submit jobs from that directory. Transfer the files from that directory either to a /project or /scratch directory and submit the job from there.
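
For example, assuming the job files live in a hypothetical directory <code>~/my_job</code> and that your scratch space is reachable as <code>~/scratch</code>, the move might look like this:
 $ cp -r ~/my_job ~/scratch/my_job
 $ cd ~/scratch/my_job
 $ sbatch job_script.sh      # job_script.sh is a placeholder name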


<!--T:158-->
Keep in mind that a job which would have obtained an entire node for itself by specifying, for example, <code>#SBATCH --cpus-per-task=32</code>, will now share the remaining 16 CPU cores with another job if it happens to run on a Skylake node; if you wish to reserve the entire node, you will need to request all 48 cores or add the <code>#SBATCH --constraint=broadwell</code> option to your job script.
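
For example, either of the following sketched sets of directives would keep the node to yourself:
 # Option 1: request all 48 cores, so the job occupies a whole Skylake node
 #SBATCH --cpus-per-task=48

 # Option 2: keep the 32-core request but restrict the job to 32-core Broadwell nodes
 #SBATCH --cpus-per-task=32
 #SBATCH --constraint=broadwell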

