Storage and file management

From Alliance Doc
Revision as of 19:09, 14 February 2017


This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.



Overview

With just a few exceptions, the filesystems on Compute Canada systems are a shared resource and should be used responsibly: unwise behaviour can negatively affect dozens or hundreds of other users. These filesystems are also designed to store a limited number of very large files, which are typically binary rather than text files, i.e. not directly human-readable. You should therefore avoid storing thousands of small files, where small means less than a few megabytes, particularly in the same directory. A better approach is to use commands like tar or zip to convert a directory containing many small files into a single large archive file.
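As a minimal sketch of the archiving approach described above, the following commands bundle a directory of small files into a single compressed archive with tar. The directory and file names (results, run_1.txt and so on) are hypothetical, and a temporary directory stands in for a real storage location.

```shell
set -eu

# Hypothetical working area; on a cluster this might live under $SCRATCH.
workdir=$(mktemp -d)
mkdir -p "$workdir/results"

# Create a few small sample files standing in for real output.
for i in 1 2 3; do
    echo "sample $i" > "$workdir/results/run_$i.txt"
done

# Bundle the whole directory into one compressed archive.
archive="$workdir/results.tar.gz"
tar -czf "$archive" -C "$workdir" results

# List the archive contents to check it before deleting the originals.
tar -tzf "$archive"

# Remove the originals only after the archive was written successfully.
rm -r "$workdir/results"
```

The archive can later be unpacked with `tar -xzf results.tar.gz`. Listing the contents before removing the originals is a cautious habit, not a requirement.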

It is also your responsibility to manage the age of your stored data. Most of the filesystems are not intended to provide an indefinite archiving service, so when a given file or directory is no longer needed, you should move it to a more appropriate filesystem, which may well mean your personal workstation or some other storage system under your control. Moving significant amounts of data between your workstation and a Compute Canada system, or between two Compute Canada systems, should generally be done using Globus.
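One possible way to spot data that may no longer be needed is to list files that have not been modified recently. The sketch below uses find with a 60-day threshold. The threshold, the demo directory and the file names are all assumptions chosen for illustration, and modification time may not be the criterion a given system actually uses.

```shell
set -eu

# Demo directory standing in for a real storage location (hypothetical contents).
demo=$(mktemp -d)
touch "$demo/recent.dat"
# Backdate one file's modification time to 1 January 2016 (POSIX touch -t).
touch -t 201601010000 "$demo/old.dat"

# List regular files whose last modification is more than 60 days old.
old_files=$(find "$demo" -type f -mtime +60)
echo "$old_files"
```

On a real system, the demo directory would be replaced with the directory to inspect, for example "$SCRATCH".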

Note that Compute Canada storage systems are not for personal use and should only be used to store research data.

Filesystem Layout

Unlike your personal computer, a Compute Canada system will typically have several filesystems, and you should ensure that you are using the right filesystem for the right task. This section discusses the principal filesystems available on most Compute Canada systems, along with the intended use, characteristics and restrictions of each one.

When you log in to a Compute Canada system you will normally begin in your home directory ($HOME), which is intended to store relatively small text files such as job submission scripts, parameter files and source code. The scratch filesystem ($SCRATCH) is, as the name suggests, intended for temporary storage of intermediate results, checkpoint files and so on. The other filesystems available on Compute Canada servers are allocated through the annual resource competition.

Filesystem Characteristics

Filesystem | Quotas | Backed up? | Purged?
$HOME | 50 GB and 500K files per user | Yes | No
$SCRATCH | 20 TB and 1M files per user, 100 TB and 10M files per group | No | Yes, all files older than 90 days
$PROJECT | 10 TB and 5M files per group, 500K files per user | Yes | No
$NEARLINE | TBD | No | No
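To compare your own usage against the quotas listed above, a generic sketch such as the following counts files and measures disk usage with standard tools. The demo directory and file names are hypothetical. A given system may well provide its own quota-reporting command, which would be preferable to this approach.

```shell
set -eu

# Demo directory standing in for $HOME or $PROJECT (hypothetical contents).
target=$(mktemp -d)
mkdir -p "$target/src"
printf 'x\n' > "$target/src/a.c"
printf 'y\n' > "$target/job.sh"

# Count regular files, to compare against a file-count quota.
file_count=$(find "$target" -type f | wc -l | tr -d ' ')

# Total disk usage in kilobytes (du -sk is POSIX).
usage_kb=$(du -sk "$target" | cut -f1)

echo "files: $file_count, usage: ${usage_kb} KB"
```

Note that walking a very large directory tree with find and du puts load on a shared filesystem, so checks like this are best run sparingly.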

See also