rsnt_translations
56,430
edits
Line 32:
<!--T:6-->
This may not work if the input is too large, or if it must be read by processes on different nodes.
See <i>Amount of space</i> and <i>Multinode jobs</i> below for more.
== Executable files and libraries == <!--T:7-->
Line 56:
job ends. If a job times out, then the last few lines of the job script might not
be executed. This can be addressed in two ways:
* Request enough runtime to let the application finish, although we understand that this isn't always possible.
* Write [[Points_de_contrôle/en|checkpoints]] to network storage, not to <code>$SLURM_TMPDIR</code>.
Line 62:
<!--T:28-->
You can use [https://slurm.schedmd.com/sbatch.html <code>--signal</code>] to get Slurm to send your script a signal shortly before the runtime expires.
To take advantage of this, write a shell function which copies your output from <code>$SLURM_TMPDIR</code> back to network storage,
and use the <code>trap</code> shell command to associate the function with the signal.
This may be useful if your runtime estimate is uncertain,
or if you are chaining together several Slurm jobs to complete a long calculation.
However, it will not preserve the contents of <code>$SLURM_TMPDIR</code> in the case of a node failure.
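The signal-and-trap approach described above can be sketched roughly as follows. This is a minimal illustration, not a recommended production script: the time limit, the 120-second lead time, the <code>USR1</code> signal choice, the <code>RESULTS_DIR</code> variable, the <code>save_output</code> function name and the placeholder application are all assumptions made for the example. The fallbacks to <code>mktemp -d</code> are only there so the sketch can be tried outside a Slurm job.

```shell
#!/bin/bash
#SBATCH --time=01:00:00          # assumed time limit for the example
#SBATCH --signal=B:USR1@120      # ask Slurm to send USR1 to the batch shell about 120 s before the limit

# Hypothetical locations. Inside a job, SLURM_TMPDIR is set by Slurm;
# RESULTS_DIR stands in for a directory on network storage.
WORKDIR="${SLURM_TMPDIR:-$(mktemp -d)}"
RESULTS_DIR="${RESULTS_DIR:-$(mktemp -d)}"

save_output() {
    # Copy whatever has been written so far back to network storage.
    mkdir -p "$RESULTS_DIR"
    cp -r "$WORKDIR"/. "$RESULTS_DIR"/
}

# Associate the function with the signal requested via --signal.
trap save_output USR1

# Placeholder for the real application. It runs in the background and the
# script waits on it, because the shell only runs a trap handler once the
# current foreground command has returned.
(cd "$WORKDIR" && echo "partial result" > output.txt) &
wait

# Normal end of job: copy the results back as usual.
save_output
```

Note that when the signal arrives, <code>wait</code> returns and the script continues past it, so in a real job you may also want to stop the application cleanly before copying. As stated above, none of this helps if the node itself fails.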
Line 72:
for an example script and detailed guidance.
= Multinode jobs = <!--T:12-->
<!--T:13-->