The command to submit a job is [https://slurm.schedmd.com/sbatch.html <code>sbatch</code>]:
<source lang="bash"> | <source lang="bash"> | ||
$ sbatch simple_job.sh | |||
Submitted batch job 123456 | Submitted batch job 123456 | ||
</source> | </source> | ||
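The contents of <code>simple_job.sh</code> are not shown in this excerpt; a minimal sketch of such a script might look like the following (the account name <code>def-someuser</code> is a placeholder):

<source lang="bash">
#!/bin/bash
#SBATCH --time=00:15:00          # run time limit (HH:MM:SS)
#SBATCH --account=def-someuser   # placeholder; replace with your own allocation
echo 'Hello, world!'
sleep 30
</source>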
<!--T:59-->
You can also specify directives as command-line arguments to <code>sbatch</code>. For example,
 $ sbatch --time=00:30:00 simple_job.sh
will submit the above job script with a time limit of 30 minutes. Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
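As a quick illustration of the time formats, the two commands below request the same 26-hour limit, first as "days-hours" and then as "hours:minutes:seconds":

<source lang="bash">
$ sbatch --time=1-2 simple_job.sh        # days-hours: 1 day and 2 hours
$ sbatch --time=26:00:00 simple_job.sh   # hours:minutes:seconds: the same 26 hours
</source>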
<!--T:62-->
<source lang="bash">
$ squeue -u $USER
   JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
  123456 cpubase_b simple_j someuser  R  0:03     1 cdr234
</source>
<!--T:29-->
You can start an interactive session on a compute node with [https://slurm.schedmd.com/salloc.html salloc]. In the following example we request two tasks, which correspond to two CPU cores, for an hour:
 $ salloc --time=1:0:0 --ntasks=2 --account=def-someuser
 salloc: Granted job allocation 1234567
 $ ...    # do some work
 $ exit   # terminate the allocation
 salloc: Relinquishing job allocation 1234567
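Within the allocation, a parallel step can be launched with <code>srun</code>; a minimal sketch, where <code>./my_program</code> is a placeholder for your own executable:

<source lang="bash">
$ srun ./my_program   # runs one copy of the program per allocated task (2 here)
</source>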
<!--T:32-->
By default [https://slurm.schedmd.com/squeue.html squeue] will show all the jobs the scheduler is managing at the moment. It may run much faster if you ask only about your own jobs with
 $ squeue -u $USER

<!--T:33-->
You can show only running jobs, or only pending jobs:
 $ squeue -u <username> -t RUNNING
 $ squeue -u <username> -t PENDING
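For pending jobs, <code>squeue</code> can also report the scheduler's estimate of when each job will start; a sketch (the estimates shift as other jobs complete, so treat them as approximate):

<source lang="bash">
$ squeue -u $USER -t PENDING --start
</source>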
<!--T:34-->
<!--T:35-->
Find more detailed information about a completed job with [https://slurm.schedmd.com/sacct.html sacct], and optionally, control what it prints using <code>--format</code>:
 $ sacct -j <jobid>
 $ sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed
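By default <code>sacct</code> only reports on jobs from the current day; to review older jobs you can widen the window with <code>--starttime</code> (a sketch; the date is a placeholder):

<source lang="bash">
$ sacct -u $USER --starttime=2024-01-01 --format=JobID,JobName,State,Elapsed
</source>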
<!--T:153-->
<!--T:132-->
You can attach to one of your running jobs with <code>srun --jobid</code> to observe it in real time; for example, to monitor GPU usage every 30 seconds:
 $ srun --jobid 123456 --pty watch -n 30 nvidia-smi
<!--T:133-->
<!--T:134-->
Several monitoring commands can be combined in a single attached terminal with <code>tmux</code>:
 $ srun --jobid 123456 --pty tmux new-session -d 'htop -u $USER' \; split-window -h 'watch nvidia-smi' \; attach
Detach from the <code>tmux</code> session with Ctrl+B followed by D; the attached session ends when the job does.
<!--T:135-->
Use [https://slurm.schedmd.com/scancel.html scancel] with the job ID to cancel a job:

<!--T:39-->
 $ scancel <jobid>

<!--T:40-->
You can also use it to cancel all your jobs, or all your pending jobs:

<!--T:41-->
 $ scancel -u $USER
 $ scancel -t PENDING -u $USER
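<code>scancel</code> can also select jobs by name; a sketch, where <code>myjob</code> is a placeholder for the name set in the job script:

<source lang="bash">
$ scancel -u $USER --name myjob
</source>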
== Resubmitting jobs for long running computations == <!--T:74-->
<!--T:119-->
<source lang="console">
$ module load gcc
$ module load quantumespresso/6.1
Lmod has detected the following error: These module(s) exist but cannot be loaded as requested: "quantumespresso/6.1"
Try: "module spider quantumespresso/6.1" to see how to load the module(s).
$ module spider quantumespresso/6.1
</source>
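The <code>module spider</code> output lists the prerequisite modules that must be loaded first; a hedged sketch of the usual fix, assuming spider reports that an MPI module is also required (the exact prerequisites depend on the cluster's module tree):

<source lang="console">
$ module load gcc openmpi quantumespresso/6.1   # prerequisites first, then the module
</source>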
<!--T:120-->