Using node-local storage

{{Draft}}

When Slurm starts a job, it creates a temporary directory on each node assigned to the job. It then sets the full path name of that directory in an environment variable called SLURM_TMPDIR.

Because this directory resides on local disk, input and output (I/O) to it is almost always faster than I/O to a network file system (/project, /scratch, or /home). Any job doing substantial input and output (which is most jobs!) may expect to run more quickly if it uses $SLURM_TMPDIR instead of network disk.

However, the temporary nature of $SLURM_TMPDIR means it is not suitable for every use.

Input must be copied from network storage into $SLURM_TMPDIR before it can be read, and output must be copied out of $SLURM_TMPDIR back to network storage to preserve it for later use.
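A minimal sketch of this pattern in a job script might look like the following; the file names, paths, and <code>myprog</code> are placeholders, not part of any particular application.

<pre>
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=4G

# Copy the input from network storage to the node-local directory.
# "input.dat" and "~/scratch/" are placeholder names.
cp ~/scratch/input.dat $SLURM_TMPDIR/

# Do the work entirely on node-local disk.
cd $SLURM_TMPDIR
myprog input.dat output.dat   # "myprog" stands in for your own application

# Copy the result back to network storage before the job ends.
cp output.dat ~/scratch/
</pre>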


=== Input ===

In order to ''read'' data from $SLURM_TMPDIR, the data must first be copied there. MORE TO COME...
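For example, a dataset made up of many small files is best packed into a single archive on network storage and unpacked on node-local disk, so that the many small reads and writes happen locally; the archive path below is a placeholder.

<pre>
# Unpack a dataset of many small files directly into $SLURM_TMPDIR.
# One large read from /project is much cheaper than thousands of
# small reads over the network.
# "~/projects/def-someuser/dataset.tar" is a placeholder path.
tar -xf ~/projects/def-someuser/dataset.tar -C $SLURM_TMPDIR
</pre>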

==== Executable files and libraries ====

A special case of input is reading the application code itself. Few applications these days consist of exactly one binary file; most applications also need to read certain other files (such as libraries) in order to work.

Specifically, we find that using an application in a [[Python]] virtual environment generates a large number of small I/O transactions, usually more than it takes to create the virtual environment in the first place. This is why we recommend [[Python#Creating virtual environments inside of your jobs|creating virtual environments inside of your jobs]].
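As a sketch, building the environment on node-local disk inside the job looks something like the following; the Python module version and the requirements.txt file are assumptions, and the linked [[Python]] page is the authoritative reference.

<pre>
# Build the virtual environment on node-local disk, inside the job.
# The Python module version and requirements.txt are placeholders.
module load python/3.10
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
pip install --no-index --upgrade pip
pip install --no-index -r requirements.txt

python myscript.py   # "myscript.py" stands in for your own code
</pre>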

=== Output ===

Output data must be copied from $SLURM_TMPDIR back to some permanent storage before the job ends. If a job times out, the last few lines of the job script might not be executed, and any output left only in $SLURM_TMPDIR will be lost. MORE TO COME ...
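One simple precaution, sketched below, is to make the copy-out the last step of the job script and to request enough run time that it is reached; the names and paths are placeholders.

<pre>
# ... the computation above writes its results under $SLURM_TMPDIR/results ...

# Copy the results back to network storage as the very last step.
# If the job reaches its time limit before this line runs, anything
# left only in $SLURM_TMPDIR is lost, so leave a margin in --time.
# "results" and "~/scratch/" are placeholder names.
tar -cf ~/scratch/results-${SLURM_JOB_ID}.tar -C $SLURM_TMPDIR results
</pre>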

=== Multi-node jobs ===

If a job spans multiple nodes and some data is needed on every node, then a simple cp or tar -x will not suffice. The Slurm utility sbcast may be useful here: it distributes a file to every node assigned to the job, although it operates on only one file at a time.
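For example, inside a multi-node job script, sbcast can place one input file into $SLURM_TMPDIR on every node before the parallel step starts; the file name and <code>myprog</code> are placeholders.

<pre>
# Copy one file into $SLURM_TMPDIR on every node assigned to the job.
# "input.dat" is a placeholder name.
sbcast ~/scratch/input.dat $SLURM_TMPDIR/input.dat

# Every task now reads its own node-local copy.
srun myprog $SLURM_TMPDIR/input.dat   # "myprog" stands in for your application
</pre>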

MORE TO COME ...

=== Amount of space ===

At Niagara, $SLURM_TMPDIR is implemented as "RAMdisk", so the amount of space available is limited by the memory on the node, less the amount of RAM used by your application. See [[Data management at Niagara]] for more.

At the general-purpose clusters Béluga, Cedar, and Graham, the amount of space available depends on the cluster and the node to which your job is assigned.

{| class="wikitable"
! Cluster !! Space in $SLURM_TMPDIR !! Size of disk
|-
| Béluga || 370G || 480G
|-
| Cedar || 840G || 960G
|-
| Graham || 750G || 960G
|}

The table above gives the typical amount of free space in $SLURM_TMPDIR on the smallest node in each cluster. If your job reserves whole nodes then you can reasonably assume that this much space is available to you in $SLURM_TMPDIR on each node. However, if the job requests less than a whole node, then other jobs may also write to the same filesystem (but not the same directory!), reducing the space available to your job.
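To see how much space is actually free on the node where your job is running, you can check from inside the job:

<pre>
# Report free space on the filesystem that holds $SLURM_TMPDIR.
df -h $SLURM_TMPDIR
</pre>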

Some nodes at each site have more local disk than shown above. See "Node characteristics" at the appropriate page (Béluga, Cedar, Graham) for guidance.