Using node-local storage

Revision as of 21:11, 3 April 2020 by Rdickson


This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.




When Slurm starts a job, it creates a temporary directory on each node assigned to the job. It also stores the full path of that directory in an environment variable called SLURM_TMPDIR.

Because this directory resides on local disk, input and output (I/O) to it is almost always faster than I/O to a network file system (/project, /scratch, or /home). Any job doing substantial input and output (which is most jobs!) can expect to run more quickly if it uses $SLURM_TMPDIR instead of network disk.
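A job script refers to this directory through the variable. The sketch below is a hypothetical helper (the name use_local_tmpdir is illustrative, not part of Slurm) that stops early if SLURM_TMPDIR is not set, as may happen when a script is run outside a job, and otherwise changes into the node-local directory so that later relative paths land on local disk.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): make the node-local directory the
# working directory, or stop with an error if SLURM_TMPDIR is not set.
use_local_tmpdir() {
    if [[ -z "${SLURM_TMPDIR:-}" ]]; then
        echo "SLURM_TMPDIR is not set; is this running inside a Slurm job?" >&2
        return 1
    fi
    cd "$SLURM_TMPDIR" || return 1
    echo "Working in node-local directory $SLURM_TMPDIR"
}
```

Near the top of a job script one might write `use_local_tmpdir || exit 1`, so that a missing variable ends the job immediately instead of silently writing to whatever directory the job started in.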

However, because $SLURM_TMPDIR exists only for the duration of the job, it is not suitable for every use.

Input

Before a job can read data from $SLURM_TMPDIR, that data must first be copied there. MORE TO COME...
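As a sketch, a job script could stage its input with a helper like the hypothetical stage_in below. It assumes the input is either a tar archive, which is unpacked directly into $SLURM_TMPDIR, or an ordinary file or directory, which is copied with 'cp -r'. Packing a dataset of many small files into one archive is often worthwhile, since network file systems generally handle one large file better than many small ones.

```shell
#!/bin/bash
# Hypothetical helper: copy input data from network storage into $SLURM_TMPDIR.
# Tar archives are unpacked in place; anything else is copied with cp -r.
stage_in() {
    local src=$1
    case "$src" in
        *.tar)          tar -xf  "$src" -C "$SLURM_TMPDIR" ;;
        *.tar.gz|*.tgz) tar -xzf "$src" -C "$SLURM_TMPDIR" ;;
        *)              cp -r "$src" "$SLURM_TMPDIR/" ;;
    esac
}
```

In a job script one might call `stage_in ~/projects/def-someuser/dataset.tar` (a hypothetical path) and then point the program at the unpacked files under $SLURM_TMPDIR.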

Output

Output data must be copied from $SLURM_TMPDIR back to permanent storage before the job ends. If a job reaches its time limit, the last lines of the job script, including any commands that copy results back, might never run. MORE TO COME...
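One way to guard against a timeout, sketched below under stated assumptions, is to ask Slurm for a warning signal shortly before the time limit with sbatch's --signal option, and to copy the results when it arrives. The directive values, the "results" subdirectory of $SLURM_TMPDIR, and the helper names save_results and run_and_save are illustrative only.

```shell
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --signal=B:USR1@300
# Example values only: the second directive asks Slurm to send SIGUSR1 to the
# batch shell (B:) about 300 seconds before the time limit.

# Hypothetical helper: copy the contents of $SLURM_TMPDIR/results to $1.
save_results() {
    local dest=$1
    mkdir -p "$dest" && cp -r "$SLURM_TMPDIR/results/." "$dest/"
}

# Hypothetical helper: run a command, then save results when it finishes or
# when SIGUSR1 arrives, whichever comes first.
run_and_save() {
    local dest=$1
    shift
    local pid=""
    # On SIGUSR1: stop the command, save what exists so far, exit non-zero.
    trap 'kill "$pid" 2>/dev/null; save_results "$dest"; exit 1' USR1
    # Run the command in the background: bash delays a trap until a foreground
    # command finishes, but the wait builtin returns as soon as a trapped
    # signal arrives.
    "$@" &
    pid=$!
    wait "$pid"
    local status=$?
    trap - USR1
    save_results "$dest" || return 1
    return "$status"
}
```

A job script might then end with `run_and_save ~/projects/def-someuser/run1 ./my_program` (hypothetical destination and program). Slurm may deliver the signal somewhat earlier than requested, and the copy itself takes time, so the lead time should comfortably exceed how long the copy is expected to take; 300 seconds here is a placeholder, not a recommendation.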

Multi-node jobs

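If a job spans multiple nodes and some data is needed on every node, then a simple 'cp' or 'tar -x' will not suffice: the job script itself runs only on the first node of the allocation, and each node has its own $SLURM_TMPDIR. The Slurm utility 'sbcast', which copies a file to every node allocated to a job, can be useful here. MORE TO COME...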
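A sketch of one approach: broadcast a single archive with sbcast, then unpack it on each node with one srun task per node. This assumes that SLURM_TMPDIR has the same path on every node of the job, because the sbcast destination is expanded once by the batch shell on the first node; that should be checked on the cluster in question. The helper name stage_in_all_nodes is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: copy a tar archive to $SLURM_TMPDIR on every node of the
# job with sbcast, then unpack it there with one srun task per node.
# Assumes $SLURM_TMPDIR resolves to the same path on all nodes.
stage_in_all_nodes() {
    local archive=$1
    local name
    name=$(basename "$archive")

    # sbcast transmits the file to the same destination path on each allocated
    # node; -f replaces a file that already exists there.
    sbcast -f "$archive" "$SLURM_TMPDIR/$name" || return 1

    # One task per node. The single quotes stop the batch shell from expanding
    # $SLURM_TMPDIR, so each task uses the value in its own environment.
    srun --ntasks="$SLURM_NNODES" --ntasks-per-node=1 \
        bash -c 'tar -xf "$SLURM_TMPDIR/$1" -C "$SLURM_TMPDIR"' _ "$name"
}
```

Reading the archive once from network storage and letting sbcast distribute it avoids having every node read the same file from /project or /scratch at the same time.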