Storage and file management

From Alliance Doc
Revision as of 15:30, 2 February 2017


This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.



Overview

With just a few exceptions, the filesystems on Compute Canada systems are a shared resource and should therefore be used responsibly: unwise behaviour can negatively affect dozens or hundreds of other users. These filesystems are also designed to store a limited number of very large files, typically binary files (i.e. files that are not directly human-readable) rather than text files. You should therefore avoid storing thousands of small files, where "small" means less than a few megabytes, particularly in the same directory. A better approach is to use a command such as tar or zip to convert a directory containing many small files into a single large archive file.
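As a minimal sketch of the tar approach, the commands below create a scratch directory of 100 hypothetical small files (results/sample_*.txt), pack them into one compressed archive, check that the archive can be read, and only then delete the originals. The names are placeholders for your own data; zip -r would work in a similar way.

```shell
#!/bin/bash
# Sketch: replace a directory of many small files with one archive.
# results/ and sample_*.txt are hypothetical stand-ins for your own data.
workdir=$(mktemp -d)
cd "$workdir"

# Create 100 small files to play the role of the problem directory.
mkdir results
for i in $(seq 1 100); do
    echo "sample $i" > "results/sample_$i.txt"
done

# Pack the whole directory into a single compressed archive.
tar -czf results.tar.gz results

# Sanity check: count the archived .txt files (prints 100 here).
tar -tzf results.tar.gz | grep -c '\.txt$'

# Delete the small files only if the archive can be read back.
tar -tzf results.tar.gz > /dev/null && rm -r results

# Later, to restore the files:  tar -xzf results.tar.gz
```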

It is also your responsibility to manage the age of your stored data: most of the filesystems are not intended to provide indefinite archival storage, so when a file or directory is no longer needed, you should move it to a more appropriate location, which may well be your personal workstation or some other storage system under your control. Significant amounts of data should generally be transferred between your workstation and a Compute Canada system, or between two Compute Canada systems, using Globus.

Note that Compute Canada storage systems are not for personal use and should only be used to store research data.

Filesystem Layout

Unlike your personal computer, a Compute Canada system will typically have several filesystems, and you should make sure that you use the right filesystem for each task. In this section we discuss the principal filesystems available on most Compute Canada systems, the intended use of each one, and its characteristics and restrictions.

When you log in to a Compute Canada system, you will normally start in your home directory ($HOME), which is intended to store relatively small text files such as job submission scripts, parameter files and source code.

Quotas

Backup

Permissions

File Permissions

Like most modern filesystems, those used on Compute Canada servers include the idea of permissions to read, write and execute files and directories. When you attempt to read, modify or delete a file, or to descend into a directory, the Linux kernel first verifies that you have the right to do so; if not, you'll see the error message "Permission denied". For every filesystem object there are three classes of users: the object's owner (normally the user who created the file or directory), the members of the object's group, and finally everyone else. Each of these classes can have the right to read, write or execute a filesystem object, so there are nine permissions in total associated with each object.

You can see the current permissions of a filesystem object with the command

[name@server ~]$ ls -l name_of_object

which will print the permissions for the owner, the group and everyone else, for example -rw-r--r-- for a file that the owner can read and write but not execute, and that the group and everyone else can only read. The output also shows the names of the object's owner and group. It's common to use "octal notation" when referring to Unix filesystem permissions. In this notation, three bits represent the permissions of each class of users, and these three bits are interpreted as a number from 0 to 7 using the formula (read_bit)*2^2 + (write_bit)*2^1 + (execute_bit)*2^0. The file permissions in the example above have the octal representation 1*2^2 + 1*2^1 + 0*2^0 = 6 for the owner and 1*2^2 + 0*2^1 + 0*2^0 = 4 for the group and everyone else, so 644 overall. Note that to exercise your rights on a file, you must also be able to reach the directory in which it resides: accessing anything inside a directory requires execute permission on it (and on every directory above it in the path), and listing its contents also requires read permission. Read and execute together correspond to "5" in octal notation.
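The octal arithmetic above can be checked with a short bash sketch. The hypothetical function perm_to_octal applies the formula to each three-character group of a string such as rw-r--r--; it assumes the string contains only r, w, x and - (special bits such as s or t are not handled). The result is then compared with GNU stat, whose %a format prints a file's octal permissions and %A its symbolic form.

```shell
#!/bin/bash
# Sketch: convert a symbolic permission string (e.g. "rw-r--r--") to octal
# using read*2^2 + write*2^1 + execute*2^0 for each class of users.
# perm_to_octal is a hypothetical helper; only r, w, x and - are handled.
perm_to_octal() {
    local perms=$1 result="" i r w x
    for i in 0 3 6; do   # owner, group, everyone else
        r=0; w=0; x=0
        [ "${perms:i:1}" = "r" ] && r=1
        [ "${perms:i+1:1}" = "w" ] && w=1
        [ "${perms:i+2:1}" = "x" ] && x=1
        result+=$(( r*4 + w*2 + x*1 ))   # 4, 2, 1 = 2^2, 2^1, 2^0
    done
    echo "$result"
}

perm_to_octal "rw-r--r--"   # prints 644

# Cross-check against a real file in a scratch directory.
workdir=$(mktemp -d)
touch "$workdir/example.txt"
chmod 644 "$workdir/example.txt"
stat -c '%A %a' "$workdir/example.txt"   # prints -rw-r--r-- 644
```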

You can change these permissions with the command chmod and the octal notation discussed above; for example

[name@server ~]$ chmod 777 name_of_file

gives everyone on the machine the right to read, write and execute this file. Naturally, you can only modify the permissions of a file or directory that you own. You can change an object's group with the command chgrp, provided that you are a member of the new group, whereas changing its owner with chown is normally restricted to system administrators.
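Since mode 777 also lets every other user on the system modify the file, it is rarely what you want on a shared machine. The sketch below shows some more restrictive alternatives on a hypothetical file and directory, including chmod's symbolic form (u, g and o for the three classes of users, + or - to add or remove a permission), which changes individual bits without restating the whole mode.

```shell
#!/bin/bash
# Sketch: more restrictive alternatives to "chmod 777" on a hypothetical
# file (analysis.py) and directory (shared_data) in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
touch analysis.py
mkdir shared_data

# 640 = rw-r-----: owner reads/writes, group reads, everyone else nothing.
chmod 640 analysis.py

# Symbolic form: add execute for the owner only, leaving other bits alone
# (the mode becomes 740 = rwxr-----).
chmod u+x analysis.py

# 750 = rwxr-x---: group members may enter and list the directory,
# but not create or delete files in it.
chmod 750 shared_data

stat -c '%a %A %n' analysis.py shared_data
# 740 -rwxr----- analysis.py
# 750 drwxr-x--- shared_data

# Changing the group (my_group is hypothetical; you must belong to it):
# chgrp my_group analysis.py
```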

The file permissions discussed above have been available in Unix-like operating systems for decades, but they are very coarse-grained, since the whole set of users is divided into just three categories: the owner, a group and everyone else. What if I want to allow a single user who isn't in my group to read a file? Do I really need to make the file readable by everyone in that case? Fortunately, Compute Canada machines offer what are called "access control lists" (ACLs), which provide extended permissions that are much more fine-grained and can be set on a user-by-user basis if desired. The two commands used to manipulate these extended permissions are getfacl, to view them, and setfacl, to change them. If I want to give a single person with username smithj read and execute permission on the file my_script.py, I can do so with the command

[name@server ~]$ setfacl -m u:smithj:rx my_script.py

File Ageing

Lustre filesystem

Lustre is a high-performance distributed filesystem that allows Compute Canada users to achieve high bandwidth for input/output operations. There are, however, some caveats to keep in mind if you want to get the best bandwidth.

Stripe count and stripe size

For each file or directory, it is possible to change the stripe size and stripe count parameters. The stripe size is the amount of contiguous data stored on one storage target (OST) before the next portion of the file goes to the next OST. The stripe count is the number of OSTs across which the file's data is spread.

You can display the values of these parameters for a given file or directory with the command

[name@server ~]$ lfs getstripe path/to/file

You can also change these parameters for a given directory with the command

[name@server ~]$ lfs setstripe -c count -s size /path/to/dir

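For example, with count=8 and size=4m, the data of files subsequently created in that directory will be spread over 8 OSTs in 4 MB chunks: the first 4 MB of a file goes to the first OST, the next 4 MB to the second, and so on, wrapping around to the first OST after the eighth.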
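This round-robin distribution can be illustrated with a little bash arithmetic. Assuming Lustre's classic layout, in which consecutive stripe-size chunks of a file go to consecutive OSTs (composite or progressive layouts are not covered), byte offset o of a file lands in stripe floor(o / stripe_size) mod stripe_count, where the stripe number indexes the file's OSTs in order. stripe_of is a hypothetical helper for illustration, not part of lfs.

```shell
#!/bin/bash
# Sketch, assuming Lustre's classic round-robin (RAID-0) layout: consecutive
# stripe-size chunks of a file go to consecutive OSTs, wrapping around after
# the last one. stripe_of is a hypothetical helper, not part of lfs.
stripe_of() {
    local offset=$1 size=$2 count=$3
    echo $(( (offset / size) % count ))   # 0-based stripe (OST slot) index
}

MB=$((1024 * 1024))
size=$((4 * MB))   # corresponds to -s 4m
count=8            # corresponds to -c 8

for offset_mb in 0 3 4 28 32 36; do
    echo "offset ${offset_mb} MB -> stripe $(stripe_of $((offset_mb * MB)) "$size" "$count")"
done
# offset 0 MB -> stripe 0
# offset 3 MB -> stripe 0
# offset 4 MB -> stripe 1
# offset 28 MB -> stripe 7
# offset 32 MB -> stripe 0
# offset 36 MB -> stripe 1
```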

It is not possible to change the stripe count or the stripe size of an existing file. To change these parameters, the file must be copied (not moved) to a directory with different parameters. To create a file with given parameters without changing those of its directory, you can run lfs setstripe on the name of the file to be created; it will be created as an empty file with the given parameters.

Increasing the stripe count may improve performance, but it also makes the file more vulnerable to hardware failures, since the failure of any one of its storage targets affects the whole file.

When a parallel program needs to read a small file (< 1 MB), such as a configuration file, it is best to store this file on a single storage target (stripe count = 1), read it on the master rank only, and send its contents to the other ranks with MPI_Bcast or MPI_Scatter.

When working with large files, it is usually best to use a stripe count as large as the number of MPI ranks. The stripe size should ideally be the same as the size of the buffer that each rank reads or writes at a time: for example, if each rank reads 1 MB of data at a time, the ideal stripe size will likely be 1 MB. If you don't know what size to use, your best bet is to keep the default value, which has been optimized for large files. Note that the stripe size must always be a multiple of 1 MB.
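To see why the stripe size should match each rank's buffer size, consider a simplified model: 8 ranks each read one 1 MB buffer at offset rank * 1 MB from a file with stripe count 8, assuming the same round-robin striping. The bash sketch below, built around a hypothetical helper stripes_hit, prints which stripe (and hence which OST) each rank's request falls on for stripe sizes of 1 MB and 4 MB.

```shell
#!/bin/bash
# Simplified model: 8 ranks, each reading one 1 MB buffer at offset
# rank * 1 MB, from a file with stripe count 8 and round-robin striping.
# stripes_hit is a hypothetical helper for illustration only.
MB=$((1024 * 1024))
buffer=$MB
count=8

# Print the stripe index hit by ranks 0-7 for a stripe size given in MB.
stripes_hit() {
    local size=$(( $1 * MB )) line="" rank
    for rank in 0 1 2 3 4 5 6 7; do
        line+="$(( (rank * buffer / size) % count )) "
    done
    echo "${line% }"   # drop the trailing space
}

echo "stripe size 1 MB: $(stripes_hit 1)"   # 0 1 2 3 4 5 6 7
echo "stripe size 4 MB: $(stripes_hit 4)"   # 0 0 0 0 1 1 1 1
```

In this model, a 1 MB stripe size lets every rank work with a different OST, while with 4 MB, four ranks at a time compete for the same OST and the other six OSTs sit idle.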

In general, you want to reduce the number of open and close operations on the filesystem. It is therefore best to gather all your data in a single file rather than writing many small files, and to open that file once at the beginning of the program and close it once at the end, rather than opening and closing it each time you want to add new data.
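In a shell script, the difference between opening a file once and reopening it for every write often comes down to where the redirection sits. In the sketch below (file names are hypothetical), the first loop appends with >> inside the loop body, so the file is opened and closed 1000 times; the second redirects the output of the whole loop, so the file is opened once before the loop starts and closed once after it ends. Both produce the same contents.

```shell
#!/bin/bash
# Sketch: the same 1000 lines written two ways in a scratch directory.
# output_many.txt and output_once.txt are hypothetical file names.
workdir=$(mktemp -d)
cd "$workdir"

# Avoid: ">>" inside the loop opens and closes the file on every iteration.
for step in $(seq 1 1000); do
    echo "step $step" >> output_many.txt
done

# Prefer: the redirection applies to the whole loop, so the file is
# opened once before the first iteration and closed after the last.
for step in $(seq 1 1000); do
    echo "step $step"
done > output_once.txt

cmp output_many.txt output_once.txt && echo "identical contents"
```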

See also