Using node-local storage



<!--T:28-->
You can use [https://slurm.schedmd.com/sbatch.html <code>--signal</code>] to get Slurm to send your script a signal shortly before the run-time expires.
To take advantage of this, write a shell function which copies your output from <code>$SLURM_TMPDIR</code> back to network storage,
and use the <code>trap</code> shell command to associate the function with the signal.
This may be useful if your run-time estimate is uncertain,
or if you are chaining together several Slurm jobs to complete a long calculation.
However, it will not preserve the contents of <code>$SLURM_TMPDIR</code> in the case of a node failure.
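The approach described above could look roughly like the following job script. This is only a minimal sketch under stated assumptions: the <code>RESULTS_DIR</code> variable, the <code>save_output</code> and <code>run_calculation</code> functions, the choice of <code>USR1</code>, and the 120-second lead time are all illustrative and not taken from this page. The <code>B:</code> prefix asks Slurm to signal only the batch shell, and the workload is run in the background followed by <code>wait</code>, because bash does not run a trap handler until a foreground command has finished.

```bash
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --signal=B:USR1@120   # ask Slurm to send SIGUSR1 to the batch shell about 120 s before the time limit

# Hypothetical destination on network storage (an assumption; adjust to your project space).
RESULTS_DIR="${RESULTS_DIR:-$PWD/results}"

# Slurm normally sets SLURM_TMPDIR; the fallback only lets this sketch run outside a job.
SLURM_TMPDIR="${SLURM_TMPDIR:-$(mktemp -d)}"

# Copy everything from node-local storage back to network storage.
save_output() {
    mkdir -p "$RESULTS_DIR"
    cp -r "$SLURM_TMPDIR"/. "$RESULTS_DIR"/
}

# Associate the function with the signal requested via --signal.
trap save_output USR1

# Placeholder workload (hypothetical): replace with your real program,
# writing its output under $SLURM_TMPDIR.
run_calculation() {
    echo "partial result" > "$SLURM_TMPDIR/output.txt"
}

# Run in the background and wait, so the shell can react to the signal promptly.
run_calculation &
wait $!

# Normal completion: copy the results back as usual.
save_output
```

Note that Slurm may deliver the signal up to about a minute earlier than requested, so the lead time should comfortably exceed the time needed to copy your output.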
See [https://services.criann.fr/en/services/hpc/cluster-myria/guide/signals-sent-by-slurm/ this page]
from le Centre Régional Informatique et d'Applications Numériques de Normandie (CRIANN)