Storage and file management
Revision as of 14:05, 21 October 2020
Overview
Compute Canada provides a wide range of storage options to cover the needs of our very diverse users. These storage solutions range from high-speed temporary local storage to different kinds of long-term storage, so you can choose the storage medium that best corresponds to your needs and usage patterns. In most cases the filesystems on Compute Canada systems are a shared resource and for this reason should be used responsibly: unwise behaviour can negatively affect dozens or hundreds of other users. These filesystems are also designed to store a limited number of very large files, which are typically binary, since very large text files (hundreds of MB or more) are no longer practical for a person to read. You should therefore avoid storing tens of thousands of small files, where small means less than a few megabytes, particularly in the same directory. A better approach is to use commands like tar or zip to convert a directory containing many small files into a single very large archive file.
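As a minimal sketch of the archiving approach described above, the following shell function packs a directory of small files into one compressed tar archive and reports how many entries the archive holds. The function name archive_dir and its arguments are hypothetical; it relies only on standard tar, dirname, basename and wc.

```shell
# Hypothetical helper: pack a directory of many small files into one archive.
# Usage: archive_dir <directory> <archive.tar.gz>
archive_dir() {
    # $1 is the directory to pack, $2 is the archive to create.
    # -C switches to the parent directory so stored paths stay relative.
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")" || return 1
    # Print the number of entries (files and directories) in the archive.
    tar -tzf "$2" | wc -l
}
```

Once the entry count looks right and the archive has been checked, the original directory can be removed to free up file-count quota.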
It is also your responsibility to manage the age of your stored data. Most of the filesystems are not intended to provide an indefinite archiving service, so when a given file or directory is no longer needed, you need to move it to a more appropriate filesystem, which may well mean your personal workstation or some other storage system under your control. Moving significant amounts of data between your workstation and a Compute Canada system or between two Compute Canada systems should generally be done using Globus.
Note that Compute Canada storage systems are not for personal use and should only be used to store research data.
When your account is created on a Compute Canada cluster, your home directory will not be entirely empty. It will contain references to your scratch and project spaces through the mechanism of a symbolic link, a kind of shortcut that allows easy access to these other filesystems from your home directory. Note that these symbolic links may appear up to a few hours after you first connect to the cluster. While your home and scratch spaces are unique to you as an individual user, the project space is shared by a research group. This group may consist of those individuals with a Compute Canada account sponsored by a particular faculty member, or of the members of a RAC allocation. A given individual may thus have access to several different project spaces, associated with one or more faculty members. Every account has one or more projects, and the directory projects in your home contains a symbolic link to each project you have access to. For users with a single active sponsored role, the default project is the default project of your sponsor, while users with more than one active sponsored role will have a default project that corresponds to the default project of the faculty member with the most sponsored accounts.
All users can check the available disk space and the current disk utilization for the project, home and scratch file systems with the command line utility diskusage_report, available on Compute Canada clusters. To use this utility, log into the cluster using SSH, at the command prompt type diskusage_report, and press the Enter key. Following is a typical output of this utility:
# diskusage_report
Description                  Space              # of files
Home (username)              280 kB/47 GB       25/500k
Scratch (username)           4096 B/18 TB       1/1000k
Project (def-username-ab)    4096 B/9536 GB     2/500k
Project (def-username)       4096 B/9536 GB     2/500k
Storage types
Unlike your personal computer, a Compute Canada system will typically have several storage spaces or filesystems and you should ensure that you are using the right space for the right task. In this section we will discuss the principal filesystems available on most Compute Canada systems and the intended use of each one along with some of its characteristics.
- HOME: While your home directory may seem like the logical place to store all your files and do all your work, in general this isn't the case - your home normally has a relatively small quota and doesn't have especially good performance for the writing and reading of large amounts of data. The most logical use of your home directory is typically source code, small parameter files and job submission scripts.
- PROJECT: The project space has a significantly larger quota and is well-adapted to sharing data among members of a research group since it, unlike the home or scratch, is linked to a professor's account rather than an individual user. The data stored in the project space should be fairly static, that is to say the data are not likely to be changed many times in a month. Otherwise, frequently changing data - including just moving and renaming directories - in project can become a heavy burden on the tape-based backup system.
- SCRATCH: For intensive read/write operations on large files (> 100 MB per file), scratch is the best choice. Remember however that important files must be copied off scratch since they are not backed up there, and older files are subject to purging. The scratch storage should therefore be used for temporary files: checkpoint files, output from jobs and other data that can easily be recreated.
- SLURM_TMPDIR: While a job is running, $SLURM_TMPDIR is a unique path to a temporary folder on a local fast filesystem on each compute node reserved for the job. This is the best location to temporarily store large collections of small files (< 1 MB per file). Note: this space is shared between jobs on each node, and the total available space depends on the node specifications. Finally, when the job ends, this folder is deleted.
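The use of $SLURM_TMPDIR for collections of small files can be sketched as follows: an archive kept on a shared filesystem is unpacked onto node-local storage at the start of a job. This is only an illustration. The function name stage_in is hypothetical, and the mktemp fallback is an assumption added so the sketch can be tried outside a Slurm job.

```shell
# Hypothetical staging helper for a job script.
# Usage: stage_in <archive.tar>   (prints the working directory that was used)
stage_in() {
    # Inside a Slurm job $SLURM_TMPDIR is set by the scheduler; the mktemp
    # fallback is an assumption for trying the sketch outside a job.
    workdir="${SLURM_TMPDIR:-$(mktemp -d)}"
    # Unpack the many small files onto node-local storage.
    tar -xf "$1" -C "$workdir" || return 1
    echo "$workdir"
}
```

In a job script this might be called as work=$(stage_in ~/scratch/inputs.tar), where the archive path is a made-up example. Results written under the working directory need to be copied back to scratch or project before the job ends, because the folder is deleted at that point.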
Breakdown of storage usage per user
While the command diskusage_report gives the space and inode usage per user on home and scratch, on project it shows the total usage and quota of the group, which includes all the files from each member of the group. Since the files that belong to a user could be anywhere in the project space, it is difficult to obtain correct figures per user and per project when a user has access to more than one project. However, users can obtain an estimate of their space and inode use on the entire project space by running the command:
lfs quota -u $USER /project
In addition to that, users can obtain an estimate for the number of files in a given directory (and its sub-directories) using the command lfs find, e.g.
lfs find <path to the directory> -type f | wc -l
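To see which sub-directories hold the most files, the count above can be repeated for each immediate sub-directory. The sketch below uses the standard find command so that it runs on any filesystem; substituting lfs find on a Lustre filesystem is an assumption and is not tested here. The function name count_files_per_subdir is hypothetical.

```shell
# Hypothetical helper: print "<file count> <sub-directory>" for each
# immediate sub-directory of $1, largest count first.
# Hidden sub-directories (names starting with a dot) are not matched.
count_files_per_subdir() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue    # skip when the glob matches nothing
        # Count regular files below this sub-directory; tr strips the
        # padding that some wc implementations add.
        n=$(find "$d" -type f | wc -l | tr -d ' ')
        printf '%s %s\n' "$n" "$d"
    done | sort -rn
}
```

For example, count_files_per_subdir ~/projects/def-username would list the sub-directories of that project space in order of file count, using the placeholder project name from the diskusage_report output.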
Best practices
- Regularly clean up your data in the scratch and project spaces, because those filesystems are used for huge data collections.
- Only use text format for files that are smaller than a few megabytes.
- As far as possible, use scratch and local storage for temporary files. For local storage you can use the temporary directory created by the job scheduler for this, named $SLURM_TMPDIR.
- If your program must search within a file, it is fastest to do it by first reading it completely before searching.
- If you no longer use certain files but they must be retained, archive and compress them, and if possible move them to an alternative location like nearline.
- For more notes on managing many files, see Handling large collections of files, especially if you are limited by a quota on the number of files.
- Having any sort of parallel write access to a file stored on a shared filesystem like home, scratch and project is likely to create problems unless you are using a specialized tool such as MPI-IO.
- If your needs are not well served by the available storage options please contact technical support.
Filesystem quotas and policies
In order to ensure that there is adequate space for all Compute Canada users, there are a variety of quotas and policy restrictions concerning back-ups and automatic purging of certain filesystems. By default on our clusters each user has access to the home and scratch spaces, and each group has access to 1 TB of project space. Small increases in project and scratch spaces are available through our Rapid Access Service (RAS). Larger increases in project spaces are available through the annual Resource Allocation Competitions (RAC). You can see your current quota usage for various filesystems on Cedar and Graham using the command diskusage_report.
Filesystem | Default Quota | Lustre-based? | Backed up? | Purged? | Available by Default? | Mounted on Compute Nodes? |
---|---|---|---|---|---|---|
Home Space | 50 GB and 500K files per user[1] | Yes | Yes | No | Yes | Yes |
Scratch Space | 20 TB and 1M files per user | Yes | No | Files older than 60 days are purged.[2] | Yes | Yes |
Project Space | 1 TB and 500K files per group[3] | Yes | Yes | No | Yes | Yes |
Nearline Space | 2 TB and 5000 files per group | Yes | Yes | No | Yes | No |
[1] This quota is fixed and cannot be changed.
[2] See Scratch purging policy for more information.
[3] Project space can be increased to 10 TB per group by a RAS request. The group's sponsoring PI should write to technical support to make the request.
Filesystem | Default Quota | Lustre-based? | Backed up? | Purged? | Available by Default? | Mounted on Compute Nodes? |
---|---|---|---|---|---|---|
Home Space | 50 GB and 500K files per user[1] | No | Yes | No | Yes | Yes |
Scratch Space | 20 TB and 1M files per user | Yes | No | Files older than 60 days are purged.[2] | Yes | Yes |
Project Space | 1 TB and 500K files per group[3] | Yes | Yes | No | Yes | Yes |
Nearline Space | 2 TB and 5000 files per group | Yes | Yes | No | Yes | No |
[1] This quota is fixed and cannot be changed.
[2] See Scratch purging policy for more information.
[3] Project space can be increased to 10 TB per group by a RAS request. The group's sponsoring PI should write to technical support to make the request.
Filesystem | Default Quota | Lustre-based? | Backed up? | Purged? | Available by Default? | Mounted on Compute Nodes? |
---|---|---|---|---|---|---|
Home Space | 50 GB and 500K files per user[1] | Yes | Yes | No | Yes | Yes |
Scratch Space | 20 TB and 1M files per user | Yes | No | Files older than 60 days are purged.[2] | Yes | Yes |
Project Space | 1 TB and 500K files per group[3] | Yes | Yes | No | Yes | Yes |
Nearline Space | 1 TB and 500K files per group | Yes | Yes | No | Yes | No |
[1] This quota is fixed and cannot be changed.
[2] See Scratch purging policy for more information.
[3] Project space can be increased to 10 TB per group by a RAS request. The group's sponsoring PI should write to technical support to make the request.
location | quota | block size | expiration time | backed up | on login nodes | on compute nodes |
---|---|---|---|---|---|---|
$HOME | 100 GB per user | 1 MB | | yes | yes | read-only |
$SCRATCH | 25 TB per user (dynamic per group) | 16 MB | 2 months | no | yes | yes |
 | up to 4 users per group: 50 TB | | | | | |
 | up to 11 users per group: 125 TB | | | | | |
 | up to 28 users per group: 250 TB | | | | | |
 | up to 60 users per group: 400 TB | | | | | |
 | above 60 users per group: 500 TB | | | | | |
$PROJECT | by group allocation (RRG or RPP) | 16 MB | | yes | yes | yes |
$ARCHIVE | by group allocation | | | dual-copy | no | no |
$BBUFFER | 10 TB per user | 1 MB | very short | no | yes | yes |
- Inode vs. Space quota (PROJECT and SCRATCH)
- dynamic quota per group (SCRATCH)
- Compute nodes do not have local storage.
- Archive (a.k.a. nearline) space is on HPSS.
- Backup means a recent snapshot, not an archive of all data that ever was.
- $BBUFFER stands for Burst Buffer, a faster parallel storage tier for temporary data.
The backup policy on the home and project spaces is nightly backups, which are retained for 30 days, while deleted files are retained for a further 60 days. Note that this is entirely distinct from the age limit for purging files from the scratch space. If you wish to recover a previous version of a file or directory, you should contact technical support with the full path for the file(s) and desired version (by date).
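Because scratch is not backed up and its older files are purged, it can help to list files that are approaching the age limit before they disappear. The following is a rough sketch only: the function name list_old_files is hypothetical, and it assumes that age is judged by modification time, which may not match the criteria the actual purging policy uses.

```shell
# Hypothetical helper: list regular files not modified for more than N days.
# Usage: list_old_files <directory> [days]   (days defaults to 60)
list_old_files() {
    # Assumption: age is judged by modification time; the real purging
    # policy may use other criteria, so treat the output as a rough guide.
    find "$1" -type f -mtime +"${2:-60}"
}
```

For example, list_old_files ~/scratch 50 would show files that have not been modified in more than 50 days, leaving some time to copy anything important to project or another storage system.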