<languages />
<translate>
==Overview== <!--T:1-->


<!--T:2-->
We provide a wide range of storage options to cover the needs of our very diverse users. These storage solutions range from high-speed temporary local storage to different kinds of long-term storage, so you can choose the storage medium that best corresponds to your needs and usage patterns. In most cases the [https://en.wikipedia.org/wiki/File_system filesystems] on our systems are a <i>shared</i> resource and should therefore be used responsibly, because unwise behaviour can negatively affect dozens or hundreds of other users. These filesystems are also designed to store a limited number of very large files, typically in binary format, since very large text files (hundreds of MB or more) lose most of the benefit of being human-readable. You should therefore avoid storing tens of thousands of small files, where small means less than a few megabytes, particularly in the same directory. A better approach is to use commands like [[Archiving and compressing files|<code>tar</code>]] or <code>zip</code> to convert a directory containing many small files into a single very large archive file.
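For example, a directory containing thousands of small files can be bundled into a single compressed archive before it is stored or transferred; a minimal sketch, where the directory and archive names are purely illustrative:
<source lang="bash">
# Count the small files, then bundle the whole directory into one compressed archive
# (directory and archive names are illustrative).
find small_files_dir/ -type f | wc -l
tar -czf small_files_dir.tar.gz small_files_dir/

# List the archive's contents without extracting it, to confirm what was stored.
tar -tzf small_files_dir.tar.gz | head
</source>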


<!--T:3-->
It is also your responsibility to manage the age of your stored data: most of the filesystems are not intended to provide an indefinite archiving service, so when a given file or directory is no longer needed, you need to move it to a more appropriate filesystem, which may well mean your personal workstation or some other storage system under your control. Moving significant amounts of data between your workstation and one of our systems, or between two of our systems, should generally be done using [[Globus]].


<!--T:4-->
Note that our storage systems are not for personal use and should only be used to store research data.

<!--T:17-->
When your account is created on a cluster, your home directory will not be entirely empty. It will contain references to your scratch and [[Project layout|project]] spaces through the mechanism of a [https://en.wikipedia.org/wiki/Symbolic_link symbolic link], a kind of shortcut that allows easy access to these other filesystems from your home directory. Note that these symbolic links may appear up to a few hours after you first connect to the cluster. While your home and scratch spaces are unique to you as an individual user, the project space is shared by a research group. This group may consist of those individuals with an account sponsored by a particular faculty member, or of the members of an [https://alliancecan.ca/en/services/advanced-research-computing/accessing-resources/resource-allocation-competition/ RAC allocation]. A given individual may thus have access to several different project spaces, associated with one or more faculty members. Every account has one or more projects, and in the <code>projects</code> folder within their home directory, each user has a symbolic link to each of the projects they have access to. For users with a single active sponsored role, the default project is that of their sponsor, while users with more than one active sponsored role have a default project corresponding to the default project of the faculty member with the most sponsored accounts.
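For example, you can list these links and see where one of them points; the group name <code>def-someprof</code> below is purely illustrative and the names shown for your account will differ:
<source lang="bash">
# List the symbolic links to scratch and to the projects folder in your home directory.
ls -l ~/

# List the project links themselves and resolve one of them to its real path
# (the group name is illustrative).
ls -l ~/projects/
readlink -f ~/projects/def-someprof
</source>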


<!--T:16-->
All users can check the available disk space and the current disk utilization for the <i>project</i>, <i>home</i> and <i>scratch</i> filesystems with the command line utility <b><i>diskusage_report</i></b>, available on our clusters. To use this utility, log into the cluster using SSH, type <i>diskusage_report</i> at the command prompt, and press Enter. Below is a typical output of this utility:
<pre>
# diskusage_report
                  Description                Space          # of files
                Home (username)        280 kB/47 GB              25/500k
              Scratch (username)        4096 B/18 TB              1/1000k
      Project (def-username-ab)      4096 B/9536 GB              2/500k
          Project (def-username)      4096 B/9536 GB              2/500k
</pre>
More detailed output is available using the [[Diskusage Explorer]] tool.


== Storage types == <!--T:5-->
Unlike your personal computer, our systems will typically have several storage spaces or filesystems and you should ensure that you are using the right space for the right task. In this section we will discuss the principal filesystems available on most of our systems and the intended use of each one along with some of its characteristics.
* <b>HOME:</b> While your home directory may seem like the logical place to store all your files and do all your work, in general this isn't the case; your home normally has a relatively small quota and doesn't have especially good performance for writing and reading large amounts of data. The most logical use of your home directory is typically source code, small parameter files and job submission scripts.
* <b>PROJECT:</b> The project space has a significantly larger quota and is well adapted to [[Sharing data | sharing data]] among members of a research group since it, unlike the home or scratch, is linked to a professor's account rather than to an individual user. The data stored in the project space should be fairly static, that is to say, not likely to change many times in a month. Frequently changing data, or even just moving and renaming directories, in the project space can become a heavy burden on the tape-based backup system.
* <b>SCRATCH</b>: For intensive read/write operations on large files (> 100 MB per file), scratch is the best choice. However, remember that important files must be copied off scratch since they are not backed up there, and older files are subject to [[Scratch purging policy|purging]]. The scratch storage should therefore be used for temporary files: checkpoint files, output from jobs and other data that can easily be recreated. <b>Do not regard SCRATCH as your normal storage!  It is for transient files that you can afford to lose.</b>
* <b>SLURM_TMPDIR</b>: While a job is running, the environment variable <code>$SLURM_TMPDIR</code> holds a unique path to a temporary folder on a fast, local filesystem on each compute node allocated to the job. When the job ends, the directory and its contents are deleted, so <code>$SLURM_TMPDIR</code> should be used for temporary files that are only needed for the duration of the job. Its advantage, compared to the other networked filesystem types above, is increased performance due to the filesystem being local to the compute node. It is especially well-suited for large collections of small files (for example, smaller than a few megabytes per file). Note that this filesystem is shared between all jobs running on the node, and that the available space depends on the compute node type. A more detailed discussion of using <code>$SLURM_TMPDIR</code> is available at [[Using_$SLURM_TMPDIR | this page]].
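A minimal job-script sketch of the <code>$SLURM_TMPDIR</code> pattern described in the last point above: copy the input to the node-local disk, work there, and copy results back before the job ends. The account name, input file and program path are placeholders, not real examples from our systems.
<source lang="bash">
#!/bin/bash
#SBATCH --account=def-someprof   # placeholder account name
#SBATCH --time=01:00:00
#SBATCH --mem=4G

# Copy the input data to the fast node-local temporary directory and unpack it there.
cp ~/scratch/input_data.tar.gz "$SLURM_TMPDIR/"
cd "$SLURM_TMPDIR"
tar -xzf input_data.tar.gz

# Run the program (placeholder path) against the local copy of the data.
"$HOME/bin/my_program" input_data/ > results.txt

# Copy results back to scratch before the job ends, since $SLURM_TMPDIR
# and its contents are deleted when the job finishes.
cp results.txt ~/scratch/
</source>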


==Project space consumption per user== <!--T:23-->

<!--T:24-->
While the command <b>diskusage_report</b> gives the space and file count used by each user on <i>home</i> and <i>scratch</i>, on <i>project</i> it shows the total usage of the whole group against the group quota, including the files of every member of the group. Since a user's files can be anywhere in the project space, it is difficult to obtain accurate per-user figures for a given project, especially when a user has access to more than one project. However, users can obtain an estimate of their space and file count across the entire project space by running the command

<!--T:26-->
<code>lfs quota -u $USER /project</code>

<!--T:25-->
In addition, users can obtain an estimate of the number of files in a given directory (and its subdirectories) using the command <code>lfs find</code>, e.g.
<source lang="console">
lfs find <path to the directory> -type f | wc -l
</source>
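If you need to track down which of your directories consume the most space or contain the most files, standard tools can help; a short sketch, in which the project path and group name are purely illustrative:
<source lang="bash">
# Disk usage of each subdirectory, largest first (path is illustrative).
du -sh ~/projects/def-someprof/$USER/* 2>/dev/null | sort -hr | head

# File count per subdirectory, to spot directories holding many small files.
for d in ~/projects/def-someprof/$USER/*/; do
    printf '%s\t%s\n' "$(find "$d" -type f | wc -l)" "$d"
done | sort -nr | head
</source>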
== Best practices == <!--T:9-->
* Regularly clean up your data in the scratch and project spaces, because those filesystems are used for huge data collections.
* Only use text format for files that are smaller than a few megabytes.
* As far as possible, use scratch and local storage for temporary files. For local storage you can use the temporary directory created by the [[Running jobs|job scheduler]] for this, named <code>$SLURM_TMPDIR</code>.
* If your program must search within a file, it is fastest to do it by first reading it completely before searching.
* If you no longer use certain files but they must be retained, [[Archiving and compressing files|archive and compress]] them, and if possible move them to an alternative location like [[Using nearline storage|nearline]] (a sketch of this workflow follows this list).
* For more on managing many files, see [[Handling large collections of files]], especially if you are limited by a quota on the number of files.
* Having any sort of parallel write access to a file stored on a shared filesystem like home, scratch and project is likely to create problems unless you are using a specialized tool such as [https://en.wikipedia.org/wiki/Message_Passing_Interface#I/O MPI-IO].
* If your needs are not well served by the available storage options please contact [[technical support]].
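A sketch of the archive-and-move workflow mentioned above; the directory, group name and nearline path are placeholders, so check [[Using nearline storage|nearline]] for the correct location on your cluster before using it:
<source lang="bash">
# Bundle an inactive directory into a single compressed archive.
tar -czf old_results.tar.gz old_results/

# Verify the archive is readable, then move it to nearline (placeholder path)
# and remove the original small files.
tar -tzf old_results.tar.gz > /dev/null && \
    mv old_results.tar.gz /nearline/def-someprof/$USER/ && \
    rm -rf old_results/
</source>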


==Filesystem quotas and policies== <!--T:10-->

<!--T:11-->
In order to ensure that there is adequate space for all users, there are a variety of quotas and policy restrictions concerning backups and automatic purging of certain filesystems.
By default on our clusters, each user has access to the home and scratch spaces, and each group has access to 1 TB of project space. Small increases in project and scratch spaces are available through our Rapid Access Service ([https://alliancecan.ca/en/services/advanced-research-computing/accessing-resources/rapid-access-service RAS]). Larger increases in project spaces are available through the annual Resource Allocation Competition ([https://alliancecan.ca/en/services/advanced-research-computing/accessing-resources/resource-allocation-competition RAC]). You can see your current quota usage for various filesystems on Cedar and Graham using the command [[Storage and file management#Overview|<code>diskusage_report</code>]].


<!--T:12-->
<tabs>
<tab name="Cedar">
<tab name="Cedar">
{| class="wikitable" style="font-size: 95%; text-align: center;"
|+Filesystem Characteristics
! Filesystem
! Default Quota
! Lustre-based
! Backed up
! Purged
! Available by Default
! Mounted on Compute Nodes
|-
|Home Space
|50 GB and 500K files per user<ref>This quota is fixed and cannot be changed.</ref>
|Yes
|Yes
|No
|Yes
|Yes
|-
|Scratch Space
|20 TB and 1M files per user
|Yes
|No
|Files older than 60 days are purged.<ref>See [[Scratch purging policy]] for more information.</ref>
|Yes
|Yes
|-
|Project Space
|1 TB and 500K files per group<ref>Project space can be increased to 40 TB per group by a RAS request, subject to the limitations that the minimum project space per quota cannot be less than 1 TB and the sum over all four general-purpose clusters cannot exceed 43 TB. The group's sponsoring PI should write to [[technical support]] to make the request.</ref>
|Yes
|Yes
|No
|Yes
|Yes
|-
|Nearline Space
|2 TB and 5000 files per group
|Yes
|Yes
|No
|Yes
|No
|}
<references />
As of April 1, 2024, Rapid Access Service (RAS) policies allow larger quotas for the project and nearline spaces. For more details, see the "Storage" section at [https://alliancecan.ca/en/services/advanced-research-computing/accessing-resources/rapid-access-service Rapid Access Service]. Quota changes larger than those permitted by RAS require an application to the annual [https://alliancecan.ca/en/services/advanced-research-computing/accessing-resources/resource-allocation-competition Resource Allocation Competition (RAC)].
</tab>
<tab name="Graham">
{| class="wikitable" style="font-size: 95%; text-align: center;"
|+Filesystem Characteristics
! Filesystem
! Default Quota
! Lustre-based
! Backed up
! Purged
! Available by Default
! Mounted on Compute Nodes
|-
|Home Space
|50 GB and 500K files per user<ref>This quota is fixed and cannot be changed.</ref>
|No
|Yes
|No
|Yes
|Yes
|-
|Scratch Space
|20 TB and 1M files per user
|Yes
|No
|Files older than 60 days are purged.<ref>See [[Scratch purging policy]] for more information.</ref>
|Yes
|Yes
|-
|Project Space
|1 TB and 500K files per group<ref>Project space can be increased to 40 TB per group by a RAS request. The group's sponsoring PI should write to [[technical support]] to make the request.</ref>
|Yes
|Yes
|No
|Yes
|Yes
|-
|Nearline Space
|10 TB and 5000 files per group
|Yes
|Yes
|No
|Yes
|No
|}
<references />
As of April 1, 2024, Rapid Access Service (RAS) policies allow larger quotas for project and nearline spaces. For more details, see the "Storage" section at [https://alliancecan.ca/en/services/advanced-research-computing/accessing-resources/rapid-access-service Rapid Access Service]. Quota changes larger than those permitted by RAS require an application to the annual [https://alliancecan.ca/en/services/advanced-research-computing/accessing-resources/resource-allocation-competition Resource Allocation Competition (RAC)].
</tab>
<tab name="Béluga and Narval">
{| class="wikitable" style="font-size: 95%; text-align: center;"
|+Filesystem Characteristics
! Filesystem
! Default Quota
! Lustre-based
! Backed up
! Purged
! Available by Default
! Mounted on Compute Nodes
|-
|Home Space
|50 GB and 500K files per user<ref>This quota is fixed and cannot be changed.</ref>
|Yes
|Yes
|No
|Yes
|Yes
|-
|Scratch Space
|20 TB and 1M files per user
|Yes
|No
|Files older than 60 days are purged.<ref>See [[Scratch purging policy]] for more information.</ref>
|Yes
|Yes
|-
|Project Space
|1 TB and 500K files per group<ref>Project space can be increased to 40 TB per group by a RAS request. The group's sponsoring PI should write to [[technical support]] to make the request.</ref>
|Yes
|Yes
|No
|Yes
|Yes
|-
|Nearline Space
|1 TB and 5000 files per group
|Yes
|Yes
|No
|Yes
|No
|}
<references />
As of April 1, 2024, Rapid Access Service (RAS) policies allow larger quotas for project and nearline spaces. For more details, see the "Storage" section at [https://alliancecan.ca/en/services/advanced-research-computing/accessing-resources/rapid-access-service Rapid Access Service]. Quota changes larger than those permitted by RAS require an application to the annual [https://alliancecan.ca/en/services/advanced-research-computing/accessing-resources/resource-allocation-competition Resource Allocation Competition (RAC)].
</tab>
<tab name="Niagara">
{| class="wikitable"
! location
!colspan="2"| quota
!align="right"| block size
! expiration time
! backed up
! on login nodes
! on compute nodes
|-
| $HOME
|colspan="2"| 100 GB per user
|align="right"| 1 MB
|
| yes
| yes
| read-only
|-
|rowspan="6"| $SCRATCH
|colspan="2"| 25 TB per user (dynamic per group)
|align="right" rowspan="6" | 16 MB
|rowspan="6"| 2 months
|rowspan="6"| no
|rowspan="6"| yes
|rowspan="6"| yes
|-
|align="right"|up to 4 users per group
|align="right"|50TB
|-
|align="right"|up to 11 users per group
|align="right"|125TB
|-
|align="right"|up to 28 users per group
|align="right"|250TB
|-
|align="right"|up to 60 users per group
|align="right"|400TB
|-
|align="right"|above 60 users per group
|align="right"|500TB
|-
| $PROJECT
|colspan="2"| by group allocation (RRG or RPP)
|align="right"| 16 MB
|
| yes
| yes
| yes
|-
| $ARCHIVE
|colspan="2"| by group allocation
|align="right"|
|
| dual-copy
| no
| no
|-
| $BBUFFER
|colspan="2"| 10 TB per user
|align="right"| 1 MB
| very short
| no
| yes
| yes
|}
<ul>
<li>[https://docs.scinet.utoronto.ca/images/9/9a/Inode_vs._Space_quota_-_v2x.pdf Inode vs. Space quota (PROJECT and SCRATCH)]</li>
<li>[https://docs.scinet.utoronto.ca/images/0/0e/Scratch-quota.pdf dynamic quota per group (SCRATCH)]</li>
<li>Compute nodes do not have local storage.</li>
<li>Archive (a.k.a. nearline) space is on [https://docs.scinet.utoronto.ca/index.php/HPSS HPSS]</li>
<li>Backup means a recent snapshot, not an archive of all data that ever was.</li>
<li><code>$BBUFFER</code> stands for [https://docs.scinet.utoronto.ca/index.php/Burst_Buffer Burst Buffer], a faster parallel storage tier for temporary data.</li></ul>


<!--T:21-->
</tab>
</tabs>


<!--T:22-->
The backup policy on the home and project space is nightly backups which are retained for 30 days, while deleted files are retained for a further 60 days; note that this is entirely distinct from the age limit for purging files from the scratch space. If you wish to recover a previous version of a file or directory, contact [[technical support]] with the full path of the file(s) and the desired version (by date).

==Permissions==

Like most modern filesystems, those used on our clusters include the notion of permissions to read, write and execute files and directories. When you attempt to read, modify or delete a file, or descend into a directory, the Linux kernel first verifies that you have the right to do so; if not, you'll see the error message "Permission denied". For every filesystem object there are three classes of users: the object's owner (normally the user who created the file or directory), the owner's group, and everyone else. Each of these user classes can have the right to read, write or execute a filesystem object, so that all told there are nine permissions associated with each object.

You can see the current permissions of a filesystem object with the command
{{Command|ls -l name_of_object}}
which will print out the permissions for the owner, the group and everyone else, for example <code>-rw-r--r--</code> for a file that the owner can read and write but not execute, and that everyone else can only read. The output also shows the name of the object's owner and the group. It's common to use "octal notation" when referring to Unix filesystem permissions: three bits represent the permissions for each category of user, and these three bits are interpreted as a number from 0 to 7 using the formula (read_bit)*2^2 + (write_bit)*2^1 + (execute_bit)*2^0. The permissions in the above example have the octal representation 1*2^2 + 1*2^1 + 0*2^0 = 6 for the owner and 1*2^2 + 0*2^1 + 0*2^0 = 4 for the group and everyone else, so 644 overall. Note that to be able to exercise your rights on a file, you also need to be able to access the directory in which it resides, which means having both read and execute permission (or "5" in octal notation) on that directory.

You can alter these permissions using the command <code>chmod</code> in conjunction with the octal notation discussed above; for example
{{Command|chmod 777 name_of_file}}
means that everyone on the machine now has the right to read, write and execute this file. Naturally you can only modify the permissions of a file or directory you own; you can also change the owner and group with the commands <code>chown</code> and <code>chgrp</code> respectively.

The file permissions discussed above have been available in Unix-like operating systems for decades, but they are very coarse-grained since the whole set of users is divided into just three categories: the owner, a group and everyone else. What if you want to allow a single user who isn't in your group to read a file? Do you really need to make the file readable by everyone in that case? Fortunately, our clusters offer "access control lists" (ACLs) to enable extended permissions that are much more fine-grained and can be set on a user-by-user basis if desired. The two commands needed to manipulate these extended permissions are <code>getfacl</code> and <code>setfacl</code>, which show and alter the ACL permissions respectively. If you want to allow a single person with username smithj to have read and execute permission on the file my_script.py, you can achieve this with the command
{{Command|setfacl -m u:smithj:rx my_script.py}}
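As a sketch combining these ideas: to let a colleague read a single file, that user also needs search (execute) permission on the directory containing it, which can likewise be granted with an ACL instead of opening the directory to everyone. The username and paths below are purely illustrative.
<source lang="bash">
# Give user smithj search (execute) permission on the enclosing directory...
setfacl -m u:smithj:x shared_results/

# ...and read permission on the file itself.
setfacl -m u:smithj:r shared_results/summary.csv

# Inspect the resulting access control lists.
getfacl shared_results/summary.csv
</source>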


==Lustre filesystem==

''Lustre'' is a high-performance distributed filesystem which allows users of our clusters to reach high bandwidth for input/output operations. There are, however, some caveats to keep in mind in order to obtain the best performance.

===Stripe count and stripe size===

For each file or directory, it is possible to change the stripe size and stripe count parameters. The stripe size is the size of the smallest block of data that is allocated on the filesystem. The stripe count is the number of disks across which the data are spread.

You can get the values of those parameters for a given file or directory using the command
{{Command|lfs getstripe ''path/to/file''}}

You can also change those parameters for a given directory using the command
{{Command|lfs setstripe -c ''count'' -s ''size'' ''/path/to/dir''}}

For example, if ''count''=8 and ''size''=4m, then files will be spread across 8 disks and will grow in steps of 4 MB each time new space is required.

It is not possible to change the stripe count or the stripe size of an existing file. To change those parameters, the file must be '''copied''' (not moved) to a directory with different parameters. To create an empty file with given values of those parameters without changing the parameters of the directory, you may run ''lfs setstripe'' on the name of the file to be created; the file will be created empty with the given parameters.

Increasing the stripe count may improve performance, but it also makes the file more susceptible to hardware failures.

When a parallel program needs to read a small file (< 1 MB), a configuration file for example, it is best to put this file on one disk (stripe count = 1), read it with the master rank, and send its contents to the other ranks using <code>MPI_Bcast</code> or <code>MPI_Scatter</code>.

When working with large files, it is usually best to use a stripe count as large as the number of MPI ranks. For the stripe size, you will want it to match the buffer size of the data that is being read or written by each rank. For example, if each rank reads 1 MB of data at a time, the ideal stripe size will likely be 1 MB. If you don't know what size to use, your best bet is to keep the default value, which has been optimized for large files. '''Note that you must never use a stripe size that is not a multiple of 1 MB.'''

In general, you want to reduce the number of open/close operations on the filesystem. It is therefore best to concatenate all data within a single file rather than writing many small files. It is also best to open the file once at the beginning and close it once at the end of the program, rather than opening and closing it each time you want to add new data.
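A short sketch of setting striping parameters on a directory and checking the result; the stripe count, stripe size and file names are only examples, and the filesystem defaults are usually a sensible choice. Recent Lustre versions use <code>-S</code> (or <code>--stripe-size</code>) for the stripe size, while older documentation uses <code>-s</code>.
<source lang="bash">
# Create a directory whose new files will be spread across 8 OSTs in 4 MB stripes
# (values are illustrative).
mkdir wide_dir
lfs setstripe --stripe-count 8 --stripe-size 4M wide_dir

# Copy (not move) an existing file into it so the data is re-striped,
# then confirm the new layout.
cp big_input.dat wide_dir/
lfs getstripe wide_dir/big_input.dat
</source>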
 
== See also == <!--T:13-->

<!--T:14-->
* [[Diskusage Explorer]]
* [[Project layout]]
* [[Sharing data]]
* [[Transferring data]]
* [[Tuning Lustre]]
* [[Archiving and compressing files]]
* [[Handling large collections of files]]
* [[Parallel I/O introductory tutorial]]
* [http://www.nics.tennessee.edu/io-tips NICS I/O tips] and [http://www.nics.tennessee.edu/I-O-Best-Practices NICS I/O best practices], with more background on Lustre and advice for better I/O performance
</translate>
