Using node-local storage

<!--T:6-->
This may not work if the input is too large, or if it must be read by processes on different nodes.
See <i>Amount of space</i> and <i>Multinode jobs</i> below for more.


== Executable files and libraries == <!--T:7-->
job ends. If a job times out, then the last few lines of the job script might not
be executed. This can be addressed in two ways:
* First, obviously, request a time limit long enough for the application to finish, although we understand that this isn't always possible.
* Write [[Points_de_contrôle/en|checkpoints]] to network storage, not to <code>$SLURM_TMPDIR</code>.
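The two points above can be combined in one job script. The following is a minimal Bash sketch, not a site-provided template: the application works in <code>$SLURM_TMPDIR</code>, writes each checkpoint directly to network storage, and copies its final output back as the last step. The directory names <code>checkpoints</code> and <code>results</code>, the file names, and the loop standing in for the real application are all hypothetical, and the <code>mktemp -d</code> fallback exists only so the sketch can be tried outside a Slurm job.

```shell
#!/bin/bash
set -euo pipefail

# Node-local scratch space; the mktemp fallback is only for testing outside Slurm.
WORKDIR="${SLURM_TMPDIR:-$(mktemp -d)}"
# Hypothetical destinations on network storage (here: under the submit directory).
BASE="${SLURM_SUBMIT_DIR:-$PWD}"
CKPT_DIR="$BASE/checkpoints"
RESULTS="$BASE/results"
mkdir -p "$CKPT_DIR" "$RESULTS"

cd "$WORKDIR"

# Placeholder for the real application: each step appends to a local output
# file and writes its checkpoint directly to network storage, so the
# checkpoint survives even if the job times out before the copy-back below.
for step in 1 2 3; do
    echo "step $step" >> output.txt
    echo "$step" > "$CKPT_DIR/last_step"
done

# Copy results back to network storage. If the job times out, this never runs.
cp output.txt "$RESULTS"/
```

If the job times out during the loop, the final <code>cp</code> is never reached, but the most recent checkpoint is already on network storage and can be used to restart the calculation.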


<!--T:28-->
You can use [https://slurm.schedmd.com/sbatch.html <code>--signal</code>] to have Slurm send your job script a signal shortly before its time limit is reached.
To take advantage of this, write a shell function which copies your output from <code>$SLURM_TMPDIR</code> back to network storage,  
and use the <code>trap</code> shell command to associate the function with the signal.
This may be useful if your runtime estimate is uncertain,
or if you are chaining together several Slurm jobs to complete a long calculation.
However, it will not preserve the contents of <code>$SLURM_TMPDIR</code> in the case of a node failure.
for an example script and detailed guidance.
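A minimal sketch of the <code>--signal</code> and <code>trap</code> approach, assuming a Bash job script. <code>B:USR1@300</code> asks Slurm to send <code>SIGUSR1</code> to the batch shell about 300 seconds before the time limit; the choice of signal, the 300-second margin, the time limit, the <code>results</code> directory and the placeholder loop are illustrative assumptions. The application is started in the background and followed by <code>wait</code> because Bash does not run a trap while a foreground command is still executing, whereas a trapped signal interrupts <code>wait</code> immediately.

```shell
#!/bin/bash
#SBATCH --time=03:00:00
#SBATCH --signal=B:USR1@300
# B: sends the signal to the batch shell only; @300 means about 300 s before
# the time limit (Slurm may deliver it up to 60 s earlier than requested).

set -euo pipefail

WORKDIR="${SLURM_TMPDIR:-$(mktemp -d)}"       # mktemp fallback: testing outside Slurm only
RESULTS="${SLURM_SUBMIT_DIR:-$PWD}/results"   # hypothetical destination on network storage

copy_back() {
    mkdir -p "$RESULTS"
    if [ -e "$WORKDIR/output.txt" ]; then
        cp "$WORKDIR/output.txt" "$RESULTS"/
    fi
}

on_usr1() {
    echo "SIGUSR1 received: copying output from $WORKDIR to $RESULTS" >&2
    copy_back
    exit 0
}
trap on_usr1 USR1

cd "$WORKDIR"

# Placeholder for the real application, run in the background so that the
# trap can fire while it is still running; 'wait' returns early on SIGUSR1.
{ for i in 1 2 3; do echo "step $i" >> output.txt; sleep 1; done; } &
wait

# Normal completion: copy results back as usual.
copy_back
```

To check the handler without waiting for the time limit, you can signal the batch shell of a running job yourself with <code>scancel --batch --signal=USR1 JOBID</code>, replacing JOBID with the job's ID. Note that the handler copies whatever the application has written so far, so the output may be incomplete.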


= Multinode jobs = <!--T:12-->


<!--T:13-->