Storage and file management

From Alliance Doc
Revision as of 18:35, 10 July 2018 by Pinto (talk | contribs)

Overview

Compute Canada provides a wide range of storage options to cover the needs of our very diverse users. These storage solutions range from high-speed temporary local storage to different kinds of long-term storage, so you can choose the storage medium that best corresponds to your needs and usage patterns. In most cases the filesystems on Compute Canada systems are a shared resource and for this reason should be used responsibly: unwise behaviour can negatively affect dozens or hundreds of other users. These filesystems are also designed to store a limited number of very large files, which are typically binary rather than text files, i.e. not directly human-readable. You should therefore avoid storing thousands of small files, where small means less than a few megabytes, particularly in the same directory. A better approach is to use commands like tar or zip to convert a directory containing many small files into a single very large archive file.
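
As a minimal sketch of the archiving approach described above, assuming GNU tar is available: the helper name archive_dir and the directory name results are hypothetical and do not come from this page.

```shell
#!/bin/bash
# archive_dir is a hypothetical helper, not a Compute Canada tool.
# It packs a directory of many small files into one compressed archive.
archive_dir() {
    local src="${1%/}"                   # directory to archive (trailing slash removed)
    local archive="${2:-$src.tar.gz}"    # output archive name
    # -c create, -z gzip-compress, -f write to the named file;
    # -C changes into the parent so paths inside the archive are relative.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
}

# Example usage (paths are placeholders):
#   archive_dir results            # writes results.tar.gz
#   tar -tzf results.tar.gz        # list the contents without extracting
#   tar -xzf results.tar.gz        # extract again when needed
```

Once the archive has been verified (for example by listing it), the original directory of small files can be removed to reduce the file count.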

It is also your responsibility to manage the age of your stored data: most of the filesystems are not intended to provide an indefinite archiving service, so when a given file or directory is no longer needed, you need to move it to a more appropriate filesystem, which may well mean your personal workstation or some other storage system under your control. Moving significant amounts of data between your workstation and a Compute Canada system or between two Compute Canada systems should generally be done using Globus.

Note that Compute Canada storage systems are not for personal use and should only be used to store research data.

When your account is created on Cedar and Graham, your home directory will not be entirely empty. It will contain references to your scratch and project spaces through the mechanism of a symbolic link, a kind of shortcut that allows easy access to these other filesystems from your home directory. Note that these symbolic links may appear up to a few hours after you first connect to the cluster. While your home and scratch spaces are unique to you as an individual user, the project space is shared by a research group. This group may consist of those individuals with a Compute Canada account sponsored by a particular faculty member, or of the members of a RAC allocation. Every account has one or more projects, so a given individual may have access to several different project spaces, associated with one or more faculty members. The directory projects in your home directory contains a symbolic link to each project space you have access to. For users with a single active sponsored role, the default project is the default project of your sponsor, while users with more than one active sponsored role will have a default project that corresponds to the default project of the faculty member with the most sponsored accounts.
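
To see where the symbolic links in a directory point, something like the following sketch could be used. The helper name list_links is hypothetical; only the directory name projects comes from the text above.

```shell
#!/bin/bash
# list_links is a hypothetical helper: print each symbolic link in a
# directory together with the path it points to.
list_links() {
    local dir="$1"
    local link
    for link in "$dir"/*; do
        # -L is true only for symbolic links
        if [ -L "$link" ]; then
            printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
        fi
    done
}

# Example usage on a cluster:
#   list_links "$HOME/projects"
```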

All users can check the available disk space and the current disk utilization for the project, home and scratch filesystems with the command line utility diskusage_report, available on both Cedar and Graham. To use this utility, log in to Cedar or Graham using SSH, type diskusage_report at the command prompt, and press the Enter key. The following is typical output from this utility:

# diskusage_report
                   Description                Space           # of files
                 Home (username)         280 kB/47 GB              25/500k
              Scratch (username)         4096 B/18 TB              1/1000k
       Project (def-username-ab)       4096 B/9536 GB              2/500k
          Project (def-username)       4096 B/9536 GB              2/500k
 

Storage Types

Unlike your personal computer, a Compute Canada system will typically have several storage spaces or filesystems and you should ensure that you are using the right space for the right task. In this section we will discuss the principal filesystems available on most Compute Canada systems and the intended use of each one along with its characteristics. Storage options are distinguished by the available hardware, access mode and write system. Typically, most Compute Canada systems offer the following storage types:

Network Filesystem (NFS)
This type of storage is generally equally visible on both login and compute nodes. This is the appropriate place to put small but important files that are regularly used: source code, programs, job scripts and parameter files. This type of storage offers performance comparable to a conventional hard disk.
Parallel Filesystem (Lustre, GPFS)
This type of storage is generally equally visible on both login and compute nodes. Combining multiple disk arrays and fast servers, it offers excellent performance for large files and large input/output operations. Often two types of storage are distinguished on such systems: long term storage and temporary storage (scratch). Performance is subject to variations caused by other users.
Local Filesystem
This type of storage consists of a local hard drive attached to each compute node. Its advantage is that its performance is high because it is very rarely shared: typically, only one user will access a local drive at a time. However, you must copy your files back to another storage medium like the scratch space or project space before your job ends because everything will be cleaned after each job.
RAM (memory) Filesystem
This is a filesystem that exists within a compute node's RAM, so its use reduces available memory for computations. Such filesystems are very fast for small files and particularly faster than other systems when file access is random. A RAM disk is always cleaned at the end of a job.

The following table summarizes the properties of these storage types.

Description of storage type
Type                             Accessibility       Throughput   Latency    Longevity
Network Filesystem (NFS)         All nodes           Poor         High       Long term
Long-Term Parallel Filesystem    All nodes           Fair         High       Long term
Short-Term Parallel Filesystem   All nodes           Fair         High       Short term (periodically cleaned)
Local Filesystem                 Local to the node   Fair         Medium     Very short term
Memory (RAM) Filesystem          Local to the node   Good         Very low   Very short term, cleaned after every job

Throughput describes the efficiency of the file system for large operations, such as those involving a megabyte or more per read or write.

Latency describes the efficiency of the file system for multiple small operations. Low latency is good; however, if one has a choice between a small number of large operations and a large number of small ones, it is almost always better to use a small number of large operations.
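
The difference between a few large operations and many small ones can be illustrated with dd, which issues one write per block. This is only a sketch: the helper name write_in_blocks and the file names are hypothetical, and actual timings will depend on the filesystem and on its load.

```shell
#!/bin/bash
# write_in_blocks is a hypothetical helper: write total_bytes of zeros to a
# file using a fixed block size, so the number of write operations is
# total_bytes / block_size. It prints that number of operations.
write_in_blocks() {
    local out="$1" block_size="$2" total_bytes="$3"
    local count=$(( total_bytes / block_size ))
    # dd reports its statistics on stderr; discard them here.
    dd if=/dev/zero of="$out" bs="$block_size" count="$count" 2>/dev/null
    echo "$count"
}

# Example usage: the same 16 MiB written two ways.
#   time write_in_blocks big.dat   1048576 16777216   # 16 writes of 1 MiB
#   time write_in_blocks small.dat 512     16777216   # 32768 writes of 512 B
```

Both calls produce files of identical size; the second simply needs far more operations to get there, which is where latency dominates.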

Best practices

  • Only use text format for files that are smaller than a few megabytes.
  • As far as possible, use local storage for temporary files. It is best to use the temporary directory created by the job scheduler for this, named $SLURM_TMPDIR.
  • If your program must search within a file, it is fastest to do it by first reading it completely before searching, or to use a RAM disk.
  • Regularly clean up your data in the scratch and project spaces, because those filesystems are used for huge data collections.
  • If you no longer use certain files but they must be retained, archive and compress them, and if possible copy them elsewhere.
  • If your needs are not well served by the available storage options please contact technical support.
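
The recommendation to use local storage for temporary files can be sketched as a stage-in, compute, stage-out pattern. The helper name run_with_local_scratch, the file name result.txt and the use of sort as a stand-in computation are all hypothetical; the fallback to a mktemp directory when $SLURM_TMPDIR is unset is an assumption made so the sketch also runs outside a job.

```shell
#!/bin/bash
# run_with_local_scratch is a hypothetical helper showing the staging pattern:
# copy input to node-local storage, work there, copy results back.
# $SLURM_TMPDIR is expected to be set inside a job; the mktemp fallback is
# an assumption for running this sketch elsewhere.
run_with_local_scratch() {
    local input="$1" output_dir="$2"
    local workdir="${SLURM_TMPDIR:-$(mktemp -d)}"

    cp "$input" "$workdir/" || return 1           # stage in
    (
        cd "$workdir" || exit 1
        # Placeholder for the real computation: here, sort the input file.
        sort "$(basename "$input")" > result.txt
    ) || return 1
    mkdir -p "$output_dir"
    cp "$workdir/result.txt" "$output_dir/"       # stage out before the job ends
}

# Example usage inside a job script (paths are placeholders):
#   run_with_local_scratch /path/to/input.txt /path/to/results
```

The stage-out step matters because local storage is cleaned after each job.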

Filesystem Quotas and Policies

In order to ensure that there is adequate space for all Compute Canada users, there are a variety of quotas and policy restrictions concerning back-ups and automatic purging of certain filesystems. By default on our clusters each user has access to the home and scratch spaces, and each group has access to 1 TB of project space. Small increases in project and scratch spaces are available through our Rapid Access Service (RAS). Larger increases in project spaces are available through the annual Resource Allocation Competitions (RAC). You can see your current quota usage for various filesystems on Cedar and Graham using the command diskusage_report.

Filesystem Characteristics
Filesystem      Default Quota                      Lustre-based?                        Backed up?   Purged?                                   Available by Default?   Mounted on Compute Nodes?
Home Space      50 GB and 500k files per user[1]   Yes for Cedar, No for Graham (NFS)   Yes          No                                        Yes                     Yes
Scratch Space   20 TB and 1M files per user[2]     Yes                                  No           Files older than 60 days are purged.[3]   Yes                     Yes
Project Space   1 TB and 500k files per group[4]   Yes                                  Yes          No                                        Yes                     Yes
The home and project spaces are backed up nightly. Backups are retained for 30 days, while deleted files are retained for a further 60 days. Note that this is entirely distinct from the age limit for purging files from the scratch space. If you wish to recover a previous version of a file or directory, you should contact technical support with the full path for the file(s) and desired version (by date).

  1. This quota is fixed and cannot be changed.
  2. Currently scratch space cannot be increased, and an increased quota of 100 TB per user is temporarily applied on Graham. We plan to move it back down to 20 TB when a technical solution has been deployed.
  3. See Scratch purging policy for more information.
  4. Project space can be increased to 10 TB per group by a RAS request. The group's sponsoring PI should write to technical support to make the request.
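
To get a rough idea of which files in a directory have not been modified for more than 60 days, a sketch like the following could be used. The helper name list_old_files is hypothetical, and using modification time as the measure of age is an assumption: the actual purge criteria are those given in the scratch purging policy.

```shell
#!/bin/bash
# list_old_files is a hypothetical helper: print regular files under a
# directory that were last modified more than N days ago (default 60).
# Assumption: modification time stands in for file "age"; the real purge
# criteria are defined by the scratch purging policy, not by this sketch.
list_old_files() {
    local dir="$1" days="${2:-60}"
    # find counts age in whole 24-hour periods, so +60 means
    # at least 61 full days since the last modification.
    find "$dir" -type f -mtime +"$days"
}

# Example usage (path is a placeholder):
#   list_old_files /path/to/your/scratch
#   list_old_files /path/to/your/scratch 30
```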

See also