Storage and file management: Difference between revisions

no edit summary
(Fix links to RAS and RAC)
No edit summary
Line 4: Line 4:


<!--T:2-->
<!--T:2-->
Compute Canada provides a wide range of storage options to cover the needs of our very diverse users. These storage solutions range from high-speed temporary local storage to different kinds of long-term storage, so you can choose the storage medium that best corresponds to your needs and usage patterns. In most cases the [https://en.wikipedia.org/wiki/File_system filesystems] on Compute Canada systems are a ''shared'' resource and for this reason should be used responsibly - unwise behaviour can negatively affect dozens or hundreds of other users. These filesystems are also designed to store a limited number of very large files, which are typically binary since very large (hundreds of MB or more) text files lose most of their interest in being human-readable. You should therefore avoid storing tens of thousands of small files, where small means less than a few megabytes, particularly in the same directory. A better approach is to use commands like [[Archiving and compressing files|<tt>tar</tt>]] or <tt>zip</tt> to convert a directory containing many small files into a single very large archive file.  
We provide a wide range of storage options to cover the needs of our very diverse users. These storage solutions range from high-speed temporary local storage to different kinds of long-term storage, so you can choose the storage medium that best corresponds to your needs and usage patterns. In most cases the [https://en.wikipedia.org/wiki/File_system filesystems] on our systems are a ''shared'' resource and should therefore be used responsibly, because unwise behaviour can negatively affect dozens or hundreds of other users. These filesystems are also designed to store a limited number of very large files, which are typically binary, since very large text files (hundreds of MB or more) lose most of the benefit of being human-readable. You should therefore avoid storing tens of thousands of small files, where small means less than a few megabytes, particularly in the same directory. A better approach is to use commands like [[Archiving and compressing files|<tt>tar</tt>]] or <tt>zip</tt> to convert a directory containing many small files into a single very large archive file.  
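
As a minimal sketch of the archiving approach described above, the following commands pack a directory of small files into one compressed archive with <tt>tar</tt>. The directory name <tt>many_small_files</tt>, the file names and the temporary working location are hypothetical and exist only to make the example self-contained; on a real system you would point <tt>tar</tt> at your own directory.

```shell
# Hypothetical setup: create a directory of small files in a temporary
# location so that the sketch can be run anywhere.
workdir=$(mktemp -d)
mkdir -p "$workdir/many_small_files"
for i in 1 2 3 4 5; do
    echo "sample $i" > "$workdir/many_small_files/part_$i.txt"
done

# Pack the whole directory into a single gzip-compressed archive.
# -C changes into $workdir first so the archive stores relative paths.
tar -czf "$workdir/many_small_files.tar.gz" -C "$workdir" many_small_files

# List the archive contents to check it before removing the originals.
tar -tzf "$workdir/many_small_files.tar.gz"

# Count the archived files, then remove the original directory.
archived_count=$(tar -tzf "$workdir/many_small_files.tar.gz" | grep -c 'part_')
rm -r "$workdir/many_small_files"
```

Listing the archive before deleting the originals is a cheap safeguard; the files can later be restored with <tt>tar -xzf</tt>.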


<!--T:3-->
<!--T:3-->
It is also your responsibility to manage the age of your stored data: most of the filesystems are not intended to provide an indefinite archiving service so when a given file or directory is no longer needed, you need to move it to a more appropriate filesystem which may well mean your personal workstation or some other storage system under your control. Moving significant amounts of data between your workstation and a Compute Canada system or between two Compute Canada systems should generally be done using [[Globus]].  
It is also your responsibility to manage the age of your stored data. Most of the filesystems are not intended to provide an indefinite archiving service, so when a given file or directory is no longer needed, you need to move it to a more appropriate filesystem, which may well mean your personal workstation or some other storage system under your control. Moving significant amounts of data between your workstation and one of our systems, or between two of our systems, should generally be done using [[Globus]].  


<!--T:4-->
<!--T:4-->
Note that Compute Canada storage systems are not for personal use and should only be used to store research data.
Note that our storage systems are not for personal use and should only be used to store research data.


<!--T:17-->
<!--T:17-->
When your account is created on a Compute Canada cluster, your home directory will not be entirely empty. It will contain references to your scratch and [[Project layout|project]] spaces through the mechanism of a [https://en.wikipedia.org/wiki/Symbolic_link symbolic link], a kind of shortcut that allows easy access to these other filesystems from your home directory. Note that these symbolic links may appear up to a few hours after you first connect to the cluster. While your home and scratch spaces are unique to you as an individual user, the project space is a shared by a research group. This group may consist of those individuals with a Compute Canada account sponsored by a particular faculty member or members of a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ RAC allocation]. A given individual may thus have access to several different project spaces, associated with one or more faculty members, with symbolic links to these different project spaces in the directory projects of your home. Every account has one or many projects. In the folder <tt>projects</tt> within their home directory, each user has a link to each of the projects they have access to. For users with a single active sponsored role is the default project of your sponsor while users with more than one active sponsored role will have a default project that corresponds to the default project of the faculty member with the most sponsored accounts.
When your account is created on a cluster, your home directory will not be entirely empty. It will contain references to your scratch and [[Project layout|project]] spaces through the mechanism of a [https://en.wikipedia.org/wiki/Symbolic_link symbolic link], a kind of shortcut that allows easy access to these other filesystems from your home directory. Note that these symbolic links may appear up to a few hours after you first connect to the cluster. While your home and scratch spaces are unique to you as an individual user, the project space is shared by a research group. This group may consist of those individuals with an account sponsored by a particular faculty member or members of a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ RAC allocation]. A given individual may thus have access to several different project spaces, associated with one or more faculty members. Every account has one or more projects, and the <tt>projects</tt> folder within your home directory contains a symbolic link to each project you have access to. For users with a single active sponsored role, the default project is the default project of their sponsor, while users with more than one active sponsored role will have a default project that corresponds to the default project of the faculty member with the most sponsored accounts.
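
To illustrate how the symbolic links in <tt>projects</tt> behave, here is a small sketch that builds a mock home directory and prints where each link points. Everything in it is an assumption made for illustration: the group name <tt>def-someprof</tt>, the <tt>project_fs</tt> directory and the temporary location are hypothetical stand-ins, not the actual layout of any cluster.

```shell
# Hypothetical mock-up: a temporary "home" containing a projects folder
# whose entries are symbolic links to project spaces stored elsewhere.
home=$(mktemp -d)
mkdir -p "$home/project_fs/def-someprof" "$home/projects"
ln -s "$home/project_fs/def-someprof" "$home/projects/def-someprof"

# For each entry in projects, print the link name and its target.
for link in "$home"/projects/*; do
    if [ -L "$link" ]; then
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
    fi
done
```

On a real cluster, running <tt>ls -l ~/projects</tt> shows the same kind of information for the project spaces you have access to.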


<!--T:16-->
<!--T:16-->
All users can check the available disk space and the current disk utilization for the ''project'', ''home'' and ''scratch'' file systems with the command line utility '''''diskusage_report''''', available on Compute Canada clusters. To use this utility, log into the cluster using SSH, at the command prompt type diskusage_report, and press the Enter key. Following is a typical output of this utility:
All users can check the available disk space and the current disk utilization for the ''project'', ''home'' and ''scratch'' file systems with the command line utility '''''diskusage_report''''', available on our clusters. To use this utility, log into the cluster using SSH, type <tt>diskusage_report</tt> at the command prompt, and press the Enter key. The following is typical output from this utility:
<pre>
<pre>
# diskusage_report
# diskusage_report
Line 27: Line 27:


== Storage types == <!--T:5-->
== Storage types == <!--T:5-->
Unlike your personal computer, a Compute Canada system will typically have several storage spaces or filesystems and you should ensure that you are using the right space for the right task. In this section we will discuss the principal filesystems available on most Compute Canada systems and the intended use of each one along with some of its characteristics.  
Unlike your personal computer, our systems will typically have several storage spaces or filesystems and you should ensure that you are using the right space for the right task. In this section we will discuss the principal filesystems available on most of our systems and the intended use of each one along with some of its characteristics.  
* '''HOME:''' While your home directory may seem like the logical place to store all your files and do all your work, in general this isn't the case - your home normally has a relatively small quota and doesn't have especially good performance for the writing and reading of large amounts of data. The most logical use of your home directory is typically source code, small parameter files and job submission scripts.  
* '''HOME:''' While your home directory may seem like the logical place to store all your files and do all your work, in general this isn't the case; your home normally has a relatively small quota and doesn't have especially good performance for writing and reading large amounts of data. The most logical use of your home directory is typically source code, small parameter files and job submission scripts.  
* '''PROJECT:''' The project space has a significantly larger quota and is well-adapted to [[Sharing data | sharing data]] among members of a research group since it, unlike the home or scratch, is linked to a professor's account rather than an individual user. The data stored in the project space should be fairly static, that is to say the data are not likely to be changed many times in a month. Otherwise, frequently changing data - including just moving and renaming directories - in project can become a heavy burden on the tape-based backup system.  
* '''PROJECT:''' The project space has a significantly larger quota and is well-adapted to [[Sharing data | sharing data]] among members of a research group since it, unlike the home or scratch, is linked to a professor's account rather than an individual user. The data stored in the project space should be fairly static, that is to say the data are not likely to be changed many times in a month. Frequently changing data in project, including simply moving or renaming directories, can become a heavy burden on the tape-based backup system.  
* '''SCRATCH''': For intensive read/write operations on large files (> 100 MB per file), scratch is the best choice. Remember however that important files must be copied off scratch since they are not backed up there, and older files are subject to [[Scratch purging policy|purging]]. The scratch storage should therefore be used for temporary files: checkpoint files, output from jobs and other data that can easily be recreated.
* '''SCRATCH''': For intensive read/write operations on large files (> 100 MB per file), scratch is the best choice. Remember however that important files must be copied off scratch since they are not backed up there, and older files are subject to [[Scratch purging policy|purging]]. The scratch storage should therefore be used for temporary files: checkpoint files, output from jobs and other data that can easily be recreated.
* '''SLURM_TMPDIR''': While a job is running, <code>$SLURM_TMPDIR</code> is a unique path to a temporary folder on a local fast filesystem on each compute node reserved for the job. This is the best location to temporarily store large collections of small files (< 1 MB per file). Note: this space is shared between jobs on each node, and the total available space depends on the node specifications. Finally, when the job ends, this folder is deleted. A more detailed discussion of using <code>$SLURM_TMPDIR</code> is available at [[Using_$SLURM_TMPDIR | this page]].
* '''SLURM_TMPDIR''': While a job is running, <code>$SLURM_TMPDIR</code> is a unique path to a temporary folder on a local fast filesystem on each compute node reserved for the job. This is the best location to temporarily store large collections of small files (< 1 MB per file). Note that this space is shared between jobs on each node and the total available space depends on the node specifications. Finally, when the job ends, this folder is deleted. A more detailed discussion of using <code>$SLURM_TMPDIR</code> is available at [[Using_$SLURM_TMPDIR | this page]].
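
The workflow suggested for <tt>$SLURM_TMPDIR</tt> above can be sketched as follows: unpack an archive of small files onto node-local storage, work on the files there, then pack the results and copy them back before the job ends. This is only an illustration under stated assumptions: the dataset, the file names and the trivial line-counting step are hypothetical, and the fallback to a temporary directory exists only so that the sketch also runs outside a Slurm job, where <tt>SLURM_TMPDIR</tt> is not set.

```shell
#!/bin/bash
# Inside a job, SLURM_TMPDIR points to node-local storage. Outside a job it
# is unset, so fall back to a temporary directory (assumption for testing).
tmpdir="${SLURM_TMPDIR:-$(mktemp -d)}"

# Hypothetical input: an archive of small files standing in for a dataset
# that would normally live in project or scratch.
src=$(mktemp -d)
mkdir -p "$src/dataset"
for i in 1 2 3; do
    echo "record $i" > "$src/dataset/file_$i.txt"
done
tar -cf "$src/dataset.tar" -C "$src" dataset

# 1. Unpack the archive onto the node-local storage.
tar -xf "$src/dataset.tar" -C "$tmpdir"

# 2. Work on the small files locally (placeholder for the real computation).
mkdir -p "$tmpdir/results"
cat "$tmpdir"/dataset/file_*.txt | wc -l > "$tmpdir/results/line_count.txt"

# 3. Pack the results into one archive and copy it back to shared storage,
#    because the local folder is deleted when the job ends.
tar -cf "$src/results.tar" -C "$tmpdir" results
```

Transferring one archive in each direction keeps the many small reads and writes on the local disk rather than on the shared filesystems.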


==Project space consumption per user== <!--T:23-->                                                             
==Project space consumption per user== <!--T:23-->                                                             