Using node-local storage
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
When Slurm starts a job, it creates a temporary directory on each node assigned to the job. It then stores the full path of that directory in an environment variable called SLURM_TMPDIR.
Because this directory resides on local disk, input and output (I/O) to it
is almost always faster than I/O to a network file system (/project, /scratch, or /home).
Specifically, local disk is better for frequent small I/O transactions than network storage.
Any job doing a lot of input and output (which is most jobs!) may expect
to run more quickly if it uses $SLURM_TMPDIR
instead of network disk.
The temporary nature of $SLURM_TMPDIR makes it more trouble to use than network storage. Input must be copied from network storage to $SLURM_TMPDIR before it can be read, and output must be copied out of $SLURM_TMPDIR back to network storage to preserve it for later use.
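In outline, a job script that follows this pattern might look like the sketch below; the account, paths, and program name are placeholders, not something defined by Slurm.

#!/bin/bash
#SBATCH --account=def-someone   # placeholder account name
#SBATCH --time=01:00:00

# Stage input from network storage to the node-local directory
cp /project/def-someone/you/input.dat $SLURM_TMPDIR/

# Run the application, reading and writing on local disk
cd $SLURM_TMPDIR
my_program input.dat output.dat       # placeholder program on your PATH

# Copy results back to network storage before the job ends
cp output.dat /project/def-someone/you/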
Input
In order to read data from $SLURM_TMPDIR, the data must first be copied there. In the simplest case this can be done with cp or rsync:
cp /project/def-someone/you/input.files.* $SLURM_TMPDIR/
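An equivalent rsync invocation might look like this (the source path is just an illustration):

[name@server ~]$ rsync -a /project/def-someone/you/input.files.* $SLURM_TMPDIR/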
This may not work if the input is too large, or if it must be read by processes on different nodes. See "Amount of space" and "Multi-node jobs" below for more.
Executable files and libraries
A special case of input is the application code itself. In order to run the application, the shell started by Slurm must open at least the application file itself, which it typically reads from network storage. But few applications these days consist of exactly one file; most also need several other files (such as libraries) in order to work.
We particularly find that running an application from a Python virtual environment generates a large number of small I/O transactions, more than it takes to create the virtual environment in the first place. This is why we recommend creating virtual environments inside your jobs using $SLURM_TMPDIR.
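A sketch of what that might look like inside a job script follows; the module name and the requirements file are illustrative and depend on your site and project:

# Build the virtual environment on node-local disk
module load python                      # pick the version your site provides
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt         # requirements.txt is a placeholder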
Output
Output data must be copied from $SLURM_TMPDIR back to permanent storage before the job ends. If a job times out, the last few lines of the job script might not be executed.
This can be addressed in two ways:
- ...
- ...
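Regardless of the approach, there needs to be a copy-back step such as the following near the end of the job script (the results directory and destination are placeholders); note that it will not run if the job is killed before reaching it:

# Copy results back to network storage, tagged with the job ID
cp -r $SLURM_TMPDIR/results /project/def-someone/you/results-$SLURM_JOB_ID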
Multi-node jobs
If a job spans multiple nodes and some data is needed on every node, then a simple cp
or tar -x
will not suffice.
The Slurm utility sbcast may be useful here.
It will distribute a file to every node assigned to a job.
It only operates on a single file, though.
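For example, to place a single file in $SLURM_TMPDIR on every node of the job (the file name is a placeholder):

[name@server ~]$ sbcast mydata.dat $SLURM_TMPDIR/mydata.dat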
Copy files
Copy one or more files to the SLURM_TMPDIR directory on every node allocated to the job:
[name@server ~]$ pdcp -w $(slurm_hl2hl.py --format PDSH) file [files...] $SLURM_TMPDIR
Or using GNU Parallel:
[name@server ~]$ parallel -S $(slurm_hl2hl.py --format GNU-Parallel) --env SLURM_TMPDIR --workdir $PWD --onall cp file [files...] ::: $SLURM_TMPDIR
Compressed Archive
ZIP
Extract the archive into the SLURM_TMPDIR directory:
[name@server ~]$ pdsh -w $(slurm_hl2hl.py --format PDSH) unzip archive.zip -d $SLURM_TMPDIR
Tarball
Extract the archive into the SLURM_TMPDIR directory:
[name@server ~]$ pdsh -w $(slurm_hl2hl.py --format PDSH) tar -xvf archive.tar.gz -C $SLURM_TMPDIR
Amount of space
At Niagara $SLURM_TMPDIR is implemented as "RAMdisk", so the amount of space available is limited by the memory on the node, less the amount of RAM used by your application. See Data management at Niagara for more.
At the general-purpose clusters Béluga, Cedar, and Graham, the amount of space available depends on the cluster and the node to which your job is assigned.
| cluster | space in $SLURM_TMPDIR | size of disk |
|---------|------------------------|--------------|
| Béluga  | 370G                   | 480G         |
| Cedar   | 840G                   | 960G         |
| Graham  | 750G                   | 960G         |
The table above gives the typical amount of free space in $SLURM_TMPDIR on the smallest node in each cluster. If your job reserves whole nodes then you can reasonably assume that this much space is available to you in $SLURM_TMPDIR on each node. However, if the job requests less than a whole node, then other jobs may also write to the same filesystem (but not the same directory!), reducing the space available to your job.
Some nodes at each site have more local disk than shown above. See "Node characteristics" at the appropriate page (Béluga, Cedar, Graham) for guidance.
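Inside a running job you can check how much space is actually free in $SLURM_TMPDIR with df:

[name@server ~]$ df -h $SLURM_TMPDIR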