rsnt_translations
Line 249: | Line 249: | ||
<!--T:160--> | <!--T:160--> | ||
<b>Do not</b> run <code>squeue</code> from a script or program at high frequency (e.g., every few seconds). Responding to <code>squeue</code> adds load to Slurm and may interfere with its performance or correct operation. | |||
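If a script really must wait on a job, one way to respect the advice above is to enforce a floor on the polling interval. The following is a minimal sketch under stated assumptions: <code>clamp_interval</code>, <code>wait_for_job</code>, and the 60-second floor are illustrative choices, not part of Slurm or an official recommendation.

```shell
#!/bin/bash
# Sketch: wait for a job to leave the queue while polling sparingly.
# MIN_INTERVAL, clamp_interval and wait_for_job are illustrative names.
MIN_INTERVAL=60   # seconds; an assumed floor, not an official figure

# Print the requested interval, raised to MIN_INTERVAL if it is shorter.
clamp_interval() {
    local requested=$1
    if [ "$requested" -lt "$MIN_INTERVAL" ]; then
        echo "$MIN_INTERVAL"
    else
        echo "$requested"
    fi
}

# Poll squeue for a single job ID until it no longer appears.
# Usage: wait_for_job JOBID [INTERVAL_SECONDS]   (default interval: 300)
wait_for_job() {
    local jobid=$1
    local interval
    interval=$(clamp_interval "${2:-300}")
    # -h drops the header and -j selects one job; empty output means the job is gone.
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}
```

For example, <code>wait_for_job 123456 5</code> would still sleep 60 seconds between queries. Where it is enough to learn that a job has ended, email notification avoids polling altogether.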
==== Email notification ==== <!--T:149--> | ==== Email notification ==== <!--T:149--> | ||
Line 322: | Line 322: | ||
<!--T:136--> | <!--T:136--> | ||
<b>Note:</b> The <code>srun</code> commands shown above work only to monitor a job submitted with <code>sbatch</code>. To monitor an interactive job, create multiple panes with <code>tmux</code> and start each process in its own pane. | |||
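As a rough illustration of the <code>tmux</code> approach, the sketch below opens one pane for the program and a second pane for a process monitor. It assumes it is run from the shell of the interactive job; the session name <code>monitor</code>, the program <code>./my_program</code>, and the function name are placeholders.

```shell
#!/bin/bash
# Sketch only: run this inside the shell of an interactive job.
# "monitor" and ./my_program are placeholder names.
start_monitor_panes() {
    local session=${1:-monitor}
    local program=${2:-./my_program}
    # Create a detached session, then start the program in its first pane.
    tmux new-session -d -s "$session"
    tmux send-keys -t "$session" "$program" C-m
    # Split side by side; the new pane becomes the active one.
    tmux split-window -h -t "$session"
    # Watch the current user's processes in the second pane.
    tmux send-keys -t "$session" 'top -u "$USER"' C-m
    # Attach so that both panes are visible.
    tmux attach-session -t "$session"
}
```

Calling <code>start_monitor_panes</code> with no arguments uses the placeholder defaults; any other monitoring command can be substituted for <code>top</code>.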
==Cancelling jobs== <!--T:37--> | ==Cancelling jobs== <!--T:37--> | ||
Line 343: | Line 343: | ||
<!--T:75--> | <!--T:75--> | ||
When a computation is going to require a long time to complete, so long that it cannot be done within the time limits on the system, | When a computation is going to require a long time to complete, so long that it cannot be done within the time limits on the system, | ||
the application you are running must support [[Points de contrôle/en|checkpointing]]. The application should be able to save its state to a file, called a | the application you are running must support [[Points de contrôle/en|checkpointing]]. The application should be able to save its state to a file, called a <i>checkpoint file</i>, and | ||
then it should be able to restart and continue the computation from that saved state. | then it should be able to restart and continue the computation from that saved state. | ||
Line 353: | Line 353: | ||
<!--T:77--> | <!--T:77--> | ||
Here are two recommended methods of automatic restarting: | Here are two recommended methods of automatic restarting: | ||
* Using Slurm | * Using Slurm <b>job arrays</b>. | ||
* Resubmitting from the end of the job script. | * Resubmitting from the end of the job script. | ||
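The first method can be sketched as a job script in which each array task continues from the checkpoint left by the previous one. This is a minimal sketch, not a definitive recipe: the application <code>./my_app</code>, its <code>--restart</code> flag, the file name <code>state.ckpt</code>, and the helper <code>choose_mode</code> are all hypothetical, and the time limit is arbitrary. The <code>%1</code> suffix on <code>--array</code> limits the array to one running task at a time, so the tasks execute in sequence.

```shell
#!/bin/bash
#SBATCH --array=1-10%1     # 10 array tasks, at most 1 running at a time
#SBATCH --time=0-03:00     # arbitrary example time limit per task

# Hypothetical checkpoint file name; adapt to your application.
CKPT=${CKPT:-state.ckpt}

# Print "restart" if the checkpoint file exists, otherwise "fresh".
choose_mode() {
    if [ -f "$1" ]; then
        echo restart
    else
        echo fresh
    fi
}

# Launch the application only when running under Slurm.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    if [ "$(choose_mode "$CKPT")" = restart ]; then
        # ./my_app and --restart are placeholders for your application's interface.
        srun ./my_app --restart "$CKPT"
    else
        srun ./my_app
    fi
fi
```

The second method would drop the <code>--array</code> directive and instead call <code>sbatch</code> at the end of the script, on the condition that work remains to be done.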