Data management at Niagara

Understanding the Niagara file systems and how to use them properly is critical to optimizing your workflow and being a good SciNet citizen. This page describes each file system and its intended use.

Performance

The file systems on SciNet, with the exception of archive, are GPFS, a high-performance file system that provides rapid reads and writes to large data sets in parallel from many nodes. As a consequence of this design, however, the file system performs quite poorly when accessing data sets that consist of many small files. For instance, you will find that reading data from one 16MB file is enormously faster than reading it from 400 40KB files. Such small files are also quite wasteful of space, as the block size for the scratch and project file systems is 16MB. Keep this in mind when planning your input/output strategy for runs on SciNet.
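To gauge whether a data set falls into the many-small-files pattern described above, a quick count can help. The following is a minimal sketch, not a SciNet-provided tool: the function name count_small_files and the default threshold of 1024 KB are illustrative assumptions, and it assumes a find that accepts the k size suffix (as GNU find does).

```shell
# Count regular files under DIR that are smaller than LIMIT_KB kilobytes.
# Hypothetical helper; the default threshold of 1024 KB is an arbitrary choice.
count_small_files() {
    dir=$1
    limit_kb=${2:-1024}
    # -size -Nk matches files whose size, rounded up to whole KB, is below N.
    find "$dir" -type f -size "-${limit_kb}k" | wc -l | tr -d ' '
}
```

For example, count_small_files "$SCRATCH/run01" (run01 being a placeholder directory) prints how many files there are below 1 MB. A large number suggests bundling them into fewer, larger files before the next run.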

For instance, if you run multi-process jobs, having each process write to a file of its own is not a scalable I/O solution. A directory gets locked by the first process accessing it, so all other processes have to wait for it. Not only does the code become considerably less parallel, but chances are the file system will time out while waiting for your other processes, leading your program to crash mysteriously. Consider using MPI-IO (part of the MPI-2 standard), which allows files to be opened simultaneously by different processes, or using a dedicated I/O process to which all other processes send their data and which then writes this data to a single file.

Purpose of each file system

Niagara accesses several different file systems. Note that not all of these file systems are available to all users.

/home

/home is intended primarily for individual user files, common software or small datasets used by others in the same group, provided it does not exceed individual quotas. Otherwise you may consider /scratch or /project. /home is read-only on the compute nodes.

/scratch

/scratch is to be used primarily for temporary or transient files, for all the results of your computations and simulations, or for any material that can be easily recreated or reacquired. You may also use scratch for any intermediate step in your workflow, provided it does not induce too much I/O or too many small files on this disk-based storage pool; otherwise, you should consider the burst buffer (/bb). Once you have your final results, those that you want to keep for the long term, you may migrate them to /project or /archive. /scratch is purged on a regular basis and has no backups.

/project

/project is intended for common group software, large static datasets, or any material that would be very costly for the group to reacquire or regenerate. Material on /project is expected to be relatively immutable over time. Temporary or transient files should be kept on scratch, not project. High data turnover consumes a lot of tapes on the TSM backup system, long after this material has been deleted, because of backup retention policies and the extra versions kept of the same file. Users abusing the project file system and using it as scratch will be flagged and contacted. Note that on Niagara /project is only available to groups with a RAC allocation.

/bb (burst buffer)

/bb is a very fast, high-performance alternative to /scratch, made of solid-state drives (SSD). You may request this resource instead if you anticipate a lot of I/O or IOPS (too much for scratch), or when you notice that your job is not performing well on scratch or project because of I/O bottlenecks. Keep in mind that we can only offer 232TB for all Niagara users at any given time. Once you have your results, you may bundle them into a tarball and move them to scratch, project or archive. /bb is purged very frequently.
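The bundling step mentioned above can be sketched as follows. This is only an illustration under stated assumptions: bundle_results is a hypothetical helper name, the archive is a plain uncompressed tar file, and the directories you pass in are your own.

```shell
# Pack RESULTS_DIR into a single tar archive under DEST_DIR.
# Hypothetical helper; prints the path of the archive it created.
bundle_results() {
    results_dir=$1
    dest_dir=$2
    archive="$dest_dir/$(basename "$results_dir").tar"
    # -C keeps the paths inside the archive relative to the parent directory.
    tar -cf "$archive" -C "$(dirname "$results_dir")" "$(basename "$results_dir")" &&
        echo "$archive"
}
```

A call such as bundle_results "$BBUFFER/run01" "$SCRATCH" (run01 being a placeholder) would leave a single run01.tar on scratch, which is friendlier to the file system than many small result files.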

/archive

/archive is a nearline storage pool for temporarily offloading semi-active material from any of the above file systems. In practice, users offload and recall material as part of their regular workflow, or when they hit their quotas on scratch or project. That material can remain on HPSS for a few months to a few years. Note that on Niagara /archive is only available to groups with a RAC allocation.

Quotas and purging

You should familiarize yourself with the various file systems, what purpose they serve, and how to properly use them. This table summarizes the various file systems.

location  quota                                         block size  expiration time  backed up  on login nodes  on compute nodes
$HOME     100 GB per user                               1 MB        -                yes        yes             read-only
$SCRATCH  25 TB per user if group quota is not reached  16 MB       2 months         no         yes             yes
            groups of up to 4 users: 50 TB for the group
            groups of up to 11 users: 125 TB for the group
            groups of up to 28 users: 250 TB for the group
            groups of up to 60 users: 400 TB for the group
            groups with over 60 users: 500 TB for the group
$PROJECT  by group allocation                           16 MB       -                yes        yes             yes
$ARCHIVE  by group allocation                           -           -                dual-copy  no              no
$BBUFFER  10 TB per user                                1 MB        very short       no         yes             yes

Inode vs. Space quota (PROJECT and SCRATCH)
dynamic quota per group (SCRATCH)
Compute nodes do not have local storage.
Archive space is on HPSS.
Backup means a recent snapshot, not an archive of all data that ever was.
$BBUFFER stands for Burst Buffer, a faster parallel storage tier for temporary data.
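
The scratch group quota tiers in the table above amount to a simple lookup by group size. As a small sketch (the function name scratch_group_quota_tb is hypothetical, and the thresholds are copied from the table):

```shell
# Print the scratch group quota in TB for a group with the given number
# of users, following the tiers listed in the quota table.
scratch_group_quota_tb() {
    users=$1
    if   [ "$users" -le 4 ];  then echo 50
    elif [ "$users" -le 11 ]; then echo 125
    elif [ "$users" -le 28 ]; then echo 250
    elif [ "$users" -le 60 ]; then echo 400
    else                           echo 500
    fi
}
```

For a group of 12 users, scratch_group_quota_tb 12 prints 250, so the per-user 25 TB limit applies only while the group as a whole stays under 250 TB.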

Moving data

Using rsync/scp

Move amounts less than 10GB through the login nodes.

Niagara login nodes and datamovers are visible from outside SciNet.
Use scp or rsync to niagara.scinet.utoronto.ca or niagara.computecanada.ca (no difference).
This will time out for amounts larger than about 10GB.

Move amounts larger than 10GB through the datamover nodes.

From a Niagara login node, ssh to nia-datamover1 or nia-datamover2.
Transfers must originate from this datamover.
The other side (e.g. your machine) must be reachable from the outside.
You may also log in, scp or rsync directly to the datamovers from the outside:

 nia-datamover1.scinet.utoronto.ca
 nia-datamover2.scinet.utoronto.ca

If you do this often, consider using Globus, a web-based tool for data transfer.
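
The 10GB rule above can be captured in a small helper that chooses the host for a transfer. This is a sketch under assumptions: pick_transfer_host is a hypothetical name, the size is given as a whole number of gigabytes, and a transfer of exactly 10GB is sent to a datamover because the text only distinguishes "less than" and "larger than".

```shell
# Print the host to use for a transfer of SIZE_GB gigabytes (whole number).
# Assumption: exactly 10 GB is routed to a datamover.
pick_transfer_host() {
    size_gb=$1
    if [ "$size_gb" -lt 10 ]; then
        echo niagara.scinet.utoronto.ca
    else
        echo nia-datamover1.scinet.utoronto.ca
    fi
}

# Example (not executed here; USER and both paths are placeholders):
#   rsync -av results/ "USER@$(pick_transfer_host 25):/path/on/niagara/"
```

Either datamover could be used for the large case; nia-datamover1 is chosen here only to keep the sketch short.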

Moving data to HPSS/Archive/Nearline using the scheduler

HPSS is a tape-based storage solution, and is SciNet's nearline a.k.a. archive facility.
Storage space on HPSS is allocated through the annual Compute Canada RAC allocation.

Using Globus

If you regularly move more than 10GB, consider using Globus, a web-based data transfer tool.

Please consult the comprehensive Globus documentation for details.

The Niagara "endpoint" for Globus is "computecanada#niagara".

File/Ownership Management (ACL)

By default, at SciNet, users within the same group already have read permission (but not write permission) to each other's files.
You may use an access control list (ACL) to allow your supervisor (or another user within your group) to manage files for you (i.e., create, move, rename, delete), while still retaining your access and permission as the original owner of the files/directories. You may also let users in other groups, or whole other groups, access (read, execute) your files using this same mechanism.

To allow [supervisor] to manage files in /project/g/group/[owner] using the setfacl and getfacl commands, follow the three steps below as the [owner] account from a shell:

1) $ /scinet/gpc/bin/setfacl -d -m user:[supervisor]:rwx /project/g/group/[owner]
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default from now on)

2) $ /scinet/gpc/bin/setfacl -d -m user:[owner]:rwx /project/g/group/[owner]
   (but will also inherit [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])

3) $ /scinet/gpc/bin/setfacl -Rm user:[supervisor]:rwx /project/g/group/[owner]
   (recursively modify all *existing* files/directories inside [owner] to also be rwx by [supervisor])

$ /scinet/gpc/bin/getfacl /project/g/group/[owner]
   (to determine the current ACL attributes)

$ /scinet/gpc/bin/setfacl -b /project/g/group/[owner]
   (to remove any previously set ACL)

PS: on the datamovers, getfacl, setfacl and chacl will be on your path.

For more information on using setfacl or getfacl see their man pages.
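
To avoid retyping the three steps for each directory, they can be generated from the supervisor name, owner name and directory. The sketch below only prints the commands so they can be reviewed before being run; print_setfacl_steps and the SETFACL override variable are hypothetical names, and the default path is the one used in the steps above.

```shell
# Print (without running) the three setfacl commands from the steps above.
# SETFACL defaults to the path used in the text; override it where setfacl
# is already on your PATH (hypothetical variable, for illustration only).
print_setfacl_steps() {
    supervisor=$1
    owner=$2
    dir=$3
    setfacl_cmd=${SETFACL:-/scinet/gpc/bin/setfacl}
    echo "$setfacl_cmd -d -m user:$supervisor:rwx $dir"   # default ACL for supervisor
    echo "$setfacl_cmd -d -m user:$owner:rwx $dir"        # default ACL for owner
    echo "$setfacl_cmd -Rm user:$supervisor:rwx $dir"     # existing files, recursively
}
```

Printing first and running second is a deliberate choice here: ACL changes applied recursively are tedious to undo, so a quick visual check is cheap insurance.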


Using mmputacl/mmgetacl

You may use GPFS's native mmputacl and mmgetacl commands. The advantages are that you can set "control" permission and that both POSIX and NFS v4 style ACLs are supported. You will first need to create a /tmp/supervisor.acl file with the following contents:

user::rwxc
group::----
other::----
mask::rwxc
user:[owner]:rwxc
user:[supervisor]:rwxc
group:[othergroup]:r-xc

Then issue the following 2 commands:

1) $ mmputacl -i /tmp/supervisor.acl /project/g/group/[owner]
2) $ mmputacl -d -i /tmp/supervisor.acl /project/g/group/[owner]
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default as well as 
   [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])

$ mmgetacl /project/g/group/[owner]
   (to determine the current ACL attributes)

$ mmdelacl -d /project/g/group/[owner]
   (to remove any previously set ACL)

$ mmeditacl /project/g/group/[owner]
   (to create or change a GPFS access control list)
   (for this command to work set the EDITOR environment variable: export EDITOR=/usr/bin/vi)
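
The ACL file described above can also be generated rather than typed by hand. The following is a minimal sketch, assuming the same seven entries as in the listing; write_acl_file is a hypothetical helper, and the owner, supervisor and group names are whatever you pass in.

```shell
# Write a GPFS ACL file with the seven entries listed above to OUTFILE.
# Hypothetical helper; arguments are OWNER SUPERVISOR OTHERGROUP OUTFILE.
write_acl_file() {
    owner=$1
    supervisor=$2
    othergroup=$3
    outfile=$4
    cat > "$outfile" <<EOF
user::rwxc
group::----
other::----
mask::rwxc
user:$owner:rwxc
user:$supervisor:rwxc
group:$othergroup:r-xc
EOF
}
```

The resulting file can then be passed to mmputacl with the -i option, as in the two commands shown earlier.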

NOTES:

There is no option to recursively add or remove ACL attributes on existing files using a GPFS built-in command. You'll need to use the -i option as above for each file or directory individually. A sample bash script for that purpose is mentioned under "Recursive ACL script" below.

mmputacl will not overwrite the original Linux group permissions for a directory when it is copied to another directory that already has ACLs, hence the "#effective:r-x" note you may see from time to time with mmgetacl. If you want to give rwx permissions to everyone in your group, you should simply rely on the plain unix 'chmod g+rwx' command. You may do that before or after copying the original material to another folder with the ACLs.

In the case of PROJECT, your group's supervisor will need to set the proper ACL at the /project/G/GROUP level in order to let users from other groups access your files.

ACLs won't let you give away permissions to files or directories that do not belong to you.

We highly recommend that you never give write permission to other users on the top level of your home directory (/home/G/GROUP/[owner]), since that would seriously compromise your privacy, in addition to disabling ssh key authentication, among other things. If necessary, make specific sub-directories under your home directory so that other users can manipulate/access files from those.

For more information on using mmputacl or mmgetacl see their man pages.

Recursive ACL script

You may use or adapt this sample bash script to recursively add or remove ACL attributes using GPFS built-in commands.

Courtesy of Agata Disks (http://csngwinfo.in2p3.fr/mediawiki/index.php/GPFS_ACL)
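
The script credited above is not reproduced on this page. As an independent, minimal sketch of the same idea (not the credited author's script), the function below walks a directory tree and applies an ACL file to each entry with the mmputacl options shown earlier. The names apply_acl_recursive and ACL_CMD are hypothetical; setting ACL_CMD=echo gives a dry run that only prints the arguments each call would receive.

```shell
# Apply ACL_FILE to every file and directory under TOP.
# ACL_CMD defaults to mmputacl; set ACL_CMD=echo for a dry run
# (ACL_CMD is a hypothetical override, for illustration only).
apply_acl_recursive() {
    acl_file=$1
    top=$2
    acl_cmd=${ACL_CMD:-mmputacl}
    # Access ACL on every existing file and directory.
    find "$top" -exec "$acl_cmd" -i "$acl_file" {} \;
    # Default ACL on directories only, so new entries inherit it.
    find "$top" -type d -exec "$acl_cmd" -d -i "$acl_file" {} \;
}
```

Running it first as ACL_CMD=echo apply_acl_recursive /tmp/supervisor.acl /project/g/group/[owner] shows which paths would be touched. Because one command is started per entry, this can be slow on trees with very many files.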