Storage and file management

Revision as of 18:54, 4 May 2017 by Diane27

Introduction

Compute Canada offers many storage options to meet the needs of users working in extremely varied fields. Depending on your needs and your particular usage, you can choose among solutions ranging from long-term storage to high-speed temporary local storage. In most cases, Compute Canada filesystems are shared resources and should be used responsibly: dozens or even hundreds of users can be affected by a single user who behaves carelessly. These filesystems are designed to store a limited number of very large files, usually binary files (which are not directly human-readable) rather than text files. For this reason, you should avoid storing thousands of small files of only a few megabytes, particularly in the same directory. A better approach is to use commands such as tar or zip to convert a directory of many small files into a single very large archive file; see Archiving and compressing files.
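As a minimal sketch of the tar approach, the Python function below packs every file under a directory into one gzip-compressed archive using the standard-library `tarfile` module. The function name `archive_directory` and the example paths are hypothetical; the `tar` or `zip` command-line tools accomplish the same thing.

```python
import tarfile
from pathlib import Path


def archive_directory(src_dir: str, archive_path: str) -> int:
    """Pack every file under src_dir into one gzip-compressed tar archive.

    Hypothetical helper. Returns the number of files stored, so the caller
    can check that nothing was skipped before deleting the originals.
    """
    src = Path(src_dir)
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        # Sorted for a deterministic member order; paths are stored relative
        # to the parent of src_dir so the archive unpacks into one directory.
        for path in sorted(src.rglob("*")):
            if path.is_file():
                tar.add(str(path), arcname=str(path.relative_to(src.parent)))
                count += 1
    return count
```

For example, `archive_directory("results", "results.tar.gz")` would replace thousands of small files in `results/` with a single archive, which is much easier on a parallel filesystem than many small files.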

It is your responsibility to keep track of how long your data has been stored. Most filesystems are not intended to provide long-term archiving, so you should move files and directories you no longer use to another location, either your personal computer or another storage resource that you control. Large amounts of data are generally transferred with Globus.

Note that Compute Canada storage systems are not for personal use and should only be used to store research data.

Storage Types

Unlike your personal computer, a Compute Canada system will typically have several storage spaces or filesystems, and you should make sure you are using the right space for the right task. In this section we discuss the principal filesystems available on most Compute Canada systems, along with the intended use and characteristics of each one. Storage options differ in their hardware, in how they can be accessed, and in how data is written to them. Most Compute Canada systems offer the following storage types:

Network Filesystem (NFS)
This type of storage is generally equally visible on both login and compute nodes. This is the appropriate place to put small but important files that are regularly used: source code, programs, job scripts and parameter files. This type of storage offers performance comparable to a conventional hard disk.
Parallel Filesystem (Lustre, GPFS)
This type of storage is generally equally visible on both login and compute nodes. Combining multiple disk arrays and fast servers, it offers excellent performance for large files and large input/output operations. Often two types of storage are distinguished on such systems: long term storage and temporary storage (scratch). Performance is subject to variations caused by other users.
Local Filesystem
This type of storage consists of a local hard drive attached to each compute node. Its advantage is high performance, because it is very rarely shared: typically, only one user accesses a local drive at a time. However, you must copy your files back to another storage medium, such as the scratch or project space, before your job ends, because everything on the local drive is deleted after each job.
RAM (memory) Filesystem
This is a filesystem that exists within a compute node's RAM, so using it reduces the memory available for computations. Such filesystems are very fast for small files, and their advantage over other filesystems is greatest when file access is random. A RAM disk is always cleaned at the end of a job.
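The usual pattern for node-local storage (copy inputs in, compute there, copy results back before the job ends) could be sketched in Python as follows. This is an illustration under stated assumptions: the local directory is read from the `SLURM_TMPDIR` environment variable, which is an assumption about the scheduler setup and not something this page specifies, and `run_on_local_disk` and its `work` callback are hypothetical names.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def run_on_local_disk(inputs: List[str], results_dir: str,
                      work: Callable[[Path], None]) -> List[Path]:
    """Stage inputs on node-local storage, run `work` there, copy results back.

    ASSUMPTION: the node-local directory comes from SLURM_TMPDIR; if that
    variable is unset, the system's default temporary directory is used.
    """
    base = os.environ.get("SLURM_TMPDIR")
    local = Path(tempfile.mkdtemp(dir=base))  # private working directory
    try:
        for f in inputs:
            shutil.copy2(f, local)  # stage input files onto the local disk
        work(local)                 # heavy I/O happens on local storage
        out = Path(results_dir)
        out.mkdir(parents=True, exist_ok=True)
        # Copy back only files the computation created (names not staged in),
        # because the local disk is wiped when the job ends.
        staged = {Path(f).name for f in inputs}
        copied = []
        for p in sorted(local.iterdir()):
            if p.is_file() and p.name not in staged:
                copied.append(Path(shutil.copy2(p, out)))
        return copied
    finally:
        shutil.rmtree(local, ignore_errors=True)
```

Here `results_dir` would normally point to scratch or project space, so the results survive after the job's local disk is cleaned.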

The following table summarizes the properties of these storage types.

Description of storage types

| Type | Accessibility | Throughput | Latency | Longevity |
|---|---|---|---|---|
| Network Filesystem (NFS) | All nodes | Poor | High | Long term |
| Long-Term Parallel Filesystem | All nodes | Fair | High | Long term |
| Short-Term Parallel Filesystem | All nodes | Fair | High | Short term (periodically cleaned) |
| Local Filesystem | Local to the node | Fair | Medium | Very short term |
| Memory (RAM) Filesystem | Local to the node | Good | Very low | Very short term, cleaned after every job |

Throughput describes the efficiency of the file system for large operations, such as those involving a megabyte or more per read or write.

Latency describes the efficiency of the file system for multiple small operations. Low latency is good; however, if one has a choice between a small number of large operations and a large number of small ones, it is almost always better to use a small number of large operations.
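To illustrate the advice to prefer a small number of large operations, the Python sketch below collects many small records in memory and writes them in large chunks instead of issuing one write per record. The function name `write_records` and the 4 MiB default chunk size are illustrative assumptions, not values recommended by this page.

```python
def _write_all(f, data: bytes) -> None:
    """Raw (unbuffered) writes may be partial, so loop until all bytes are out."""
    view = memoryview(data)
    while view:
        view = view[f.write(view):]


def write_records(path: str, records: list[bytes],
                  chunk_size: int = 4 * 1024 * 1024) -> int:
    """Write records with a few large writes instead of one write per record.

    Records are accumulated in memory and flushed whenever the buffer reaches
    chunk_size bytes. Returns the number of flushes (large writes) performed.
    """
    flushes = 0
    buf = bytearray()
    # buffering=0 opens a raw file, so Python adds no buffering of its own.
    with open(path, "wb", buffering=0) as f:
        for rec in records:
            buf += rec
            if len(buf) >= chunk_size:
                _write_all(f, bytes(buf))
                flushes += 1
                buf.clear()
        if buf:  # flush whatever is left over
            _write_all(f, bytes(buf))
            flushes += 1
    return flushes
```

With 1000 records of 4 bytes and `chunk_size=1024`, the data goes out in 4 large writes rather than 1000 small ones; the file contents are identical either way.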

Best practices

  • Only use text format for files that are smaller than a few megabytes.
  • As far as possible, use local storage for temporary files.
  • If your program must search within a file, it is fastest to do it by first reading it completely before searching, or to use a RAM disk.
  • Regularly clean up your data in the scratch and project spaces, because those filesystems are used for huge data collections.
  • If you no longer use certain files but they must be retained, archive and compress them, and if possible copy them elsewhere.
  • If your needs are not well served by the available storage options please contact us by sending an e-mail to Compute Canada support.
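The advice to read a file completely before searching it might look like the following Python sketch: one large read, then an in-memory search, instead of many small reads or seeks against the filesystem. The function name `find_all` is hypothetical, and the approach assumes the file fits comfortably in the job's memory.

```python
def find_all(path: str, pattern: bytes) -> list[int]:
    """Return the byte offset of every occurrence of pattern in the file.

    The whole file is read in a single call and searched in memory.
    ASSUMPTION: the file is small enough to hold in RAM.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    with open(path, "rb") as f:
        data = f.read()  # one large read instead of many small ones
    offsets = []
    i = data.find(pattern)
    while i != -1:
        offsets.append(i)
        i = data.find(pattern, i + 1)  # i + 1 also reports overlapping matches
    return offsets
```

For a file containing `abcabcab`, `find_all(path, b"ab")` returns `[0, 3, 6]`.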

Filesystem Quotas and Policies

To ensure that there is adequate space for all Compute Canada users, a variety of quotas and policies govern backups and the automatic purging of certain filesystems. By default, every user has access to the home and scratch spaces as well as a certain amount of project space. To obtain the full 10 TB project-space quota, users must submit a request. Nearline space is allocated through the annual RAC (resource allocation) process, which can also increase a group's quota for the project and scratch spaces.
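Because some filesystems are purged automatically, it can help to list which of your files are at risk before moving them elsewhere. The Python sketch below lists files not modified within a given number of days. The function name is hypothetical, the number of days is a parameter because the actual purge threshold is not given here, and judging age by modification time (`st_mtime`) is an assumption: the real purge policy may use a different timestamp.

```python
import os
import time
from typing import List, Optional


def files_older_than(root: str, days: float,
                     now: Optional[float] = None) -> List[str]:
    """List files under root whose modification time is more than `days` days ago.

    ASSUMPTION: age is measured by st_mtime; the purge policy may differ.
    Symbolic links are not followed (os.walk default, plus os.lstat).
    """
    cutoff = (time.time() if now is None else now) - days * 86400
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_mtime < cutoff:
                old.append(path)
    return sorted(old)
```

The resulting list is a natural input for archiving the files (for example with tar) and transferring them off the purged space.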

Filesystem characteristics

| Filesystem | Quotas | Backed up? | Purged? | Available by default? | Mounted on compute nodes? |
|---|---|---|---|---|---|
| Home Space | 50 GB, 500K files | Yes | No | Yes | Yes |
| Scratch Space | 20 TB and 1000K files per user; 100 TB and 10M files per group | No | Yes, all files older than a certain number of days | Yes | Yes |
| Project Space | Up to 10 TB and 5M files per group; 500K files per user | Yes | No | Yes | Yes |
| Nearline Space | 5 TB per group | No | No | No | No |
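Since most spaces limit both total size and number of files, a rough local check of a directory against those two limits might look like the Python sketch below. The function names are hypothetical, and the count is only an approximation: the filesystem's own quota accounting (for example, whether directories count as files) may differ, so the official quota reports remain authoritative.

```python
import os
from typing import NamedTuple


class Usage(NamedTuple):
    files: int
    size_bytes: int


def directory_usage(root: str) -> Usage:
    """Count non-directory entries under root and their total size.

    Symbolic links are counted but not followed (os.walk default, os.lstat).
    """
    files = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            files += 1
            total += st.st_size
    return Usage(files, total)


def within_quota(usage: Usage, max_bytes: int, max_files: int) -> bool:
    """True if usage is within both the size limit and the file-count limit."""
    return usage.size_bytes <= max_bytes and usage.files <= max_files
```

For the home space limits in the table, one could call `within_quota(directory_usage(os.path.expanduser("~")), 50 * 10**9, 500_000)`, treating 50 GB as 50 × 10^9 bytes (an assumption; the quota may be defined in binary units).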

See also