Project layout

Revision as of 19:52, 19 September 2017 by Diane27

The project filesystem on Cedar and Graham is organized by group, but it is accessed through a simple per-user interface. The usual way to reach the project space is through symbolic links in your home directory. There is one link of the form $HOME/projects/group_name for each group you belong to, plus a link $HOME/project that points to the project directory of your default group (this distinction only matters for users who belong to more than one group).
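
To see where your links point, you can list them. The sketch below is a hypothetical helper (not a Compute Canada tool) that prints each project link in a home directory together with its target; it assumes only the layout described above.

```shell
# Hypothetical helper: list the project symlinks in a home directory.
# Usage: list_project_links [home_dir]   (defaults to $HOME)
list_project_links() {
  local home=${1:-$HOME}
  local link
  # $home/project first, then every entry under $home/projects
  for link in "$home/project" "$home/projects"/*; do
    # Skip anything that is not a symbolic link (including an unmatched glob)
    [ -L "$link" ] || continue
    # Print the path relative to the home directory and the link's immediate target
    printf '%s -> %s\n' "${link#"$home"/}" "$(readlink "$link")"
  done
}
```

Running `list_project_links` with no argument inspects your own $HOME; a plain `ls -l $HOME/projects` shows much the same information.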

The group's project directory is owned by the group's principal investigator (PI), and group members have read and write permission on it. However, by default a newly created file is only readable, not writable, by other group members. If the group wants files that all members can modify, the best approach is to create a special directory for that purpose, for example:

[name@server ~]$ mkdir $HOME/project/group_writable

followed by

[name@server ~]$ setfacl -d -m g::rwx $HOME/project/group_writable
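
The two commands above can be wrapped in a small function. This is a sketch under assumptions: the name `make_group_writable` is hypothetical, and the `chmod g+rwx` line is our addition, included because a default ACL only applies to files and directories created afterwards and does not change the permissions of the directory itself. The final `getfacl` call lets you confirm that a `default:group::rwx` entry is present.

```shell
# Hypothetical helper: create a directory whose new contents are group-writable.
# Usage: make_group_writable "$HOME/project/group_writable"
make_group_writable() {
  local dir=$1
  mkdir -p "$dir"
  # Our addition (assumption): give the group rwx on the directory itself,
  # in case the umask left group write off; the default ACL below does not do this.
  chmod g+rwx "$dir"
  # Command from the text: files and directories created inside later get
  # group rwx, limited by the mode the creating program requests (rw- for most files).
  setfacl -d -m g::rwx "$dir"
  # Show the resulting ACL, including the default:group::rwx entry
  getfacl "$dir"
}
```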

For more on sharing data, file ownership, and access control lists (ACLs), see Sharing data.

The project space has a default quota of 1 TB and five million files per group, which can be increased to up to 10 TB on request to Compute Canada support. Some groups have been awarded significantly larger quotas through the annual resource allocation competition (RAC); if yours is one of them, you will already have been notified of your group's quota for the coming year. Note that this storage allocation is specific to a particular cluster and cannot normally be transferred to another cluster.

To check current usage and available disk space on all file systems, run:

[name@server ~]$ diskusage_report
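
`diskusage_report` is the tool to rely on for quota figures. For a rough idea of how much a single directory tree contributes, a sketch like the following can help. The helper name is hypothetical, it relies on GNU `find`, and it assumes that the file-count limit counts directories as well as files. The numbers the quota system reports may differ from this estimate.

```shell
# Hypothetical helper: rough count of entries and bytes under one directory tree.
# Usage: tree_usage "$HOME/project/some_dir"
tree_usage() {
  local dir=$1
  local entries bytes
  # Print one '.' per file or subdirectory below $dir, so unusual
  # file names (e.g. containing newlines) cannot skew the count.
  entries=$(find "$dir" -mindepth 1 -printf '.' | wc -c)
  # Sum the apparent sizes (GNU find's %s) of regular files, in bytes.
  bytes=$(find "$dir" -type f -printf '%s\n' | awk '{ s += $1 } END { printf "%.0f\n", s }')
  printf 'entries: %d\n' "$entries"
  printf 'bytes: %d\n' "$bytes"
}
```

The entry count can be compared with the five-million-file limit, and the byte total with the group's space quota.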

An explanatory example

Imagine that we have a PI (“Sue”) who has a sponsored user under her (“Bob”). Both Sue and Bob start with a directory structure that on the surface looks similar:

  • /home/sue/scratch (symbolic link)
  • /home/sue/projects (directory)
  • /home/sue/project (symbolic link)
  • /home/bob/scratch (symbolic link)
  • /home/bob/projects (directory)
  • /home/bob/project (symbolic link)

The scratch link points to a different location for Sue (/scratch/sue) and Bob (/scratch/bob).

If Bob's only role were the one sponsored by Sue, then Bob's project link would point to the same place as Sue's, and Bob's projects directory would have the same contents as Sue's. Further, if neither Sue nor Bob has any other roles or projects with Compute Canada, then each of their projects directories would contain just one entry, def-sue, pointing to the same place as their project link.

Each of /home/sue/project, /home/bob/project, /home/sue/projects/def-sue, and /home/bob/projects/def-sue would point to the same location, /project/<some random number>. This project directory is the best place for Sue and Bob to share data: both of them can create directories in it, read from it, and write to it. For instance, Sue could run

$ cd ~/project
$ mkdir foo

and Bob could then copy a file into the directory ~/project/foo, where it will be visible to both of them.
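
To check that two of these paths really lead to the same place, you can compare them with the shell's `-ef` test, which follows symbolic links and compares device and inode numbers. The function name below is hypothetical.

```shell
# Hypothetical helper: report whether two paths lead to the same directory.
# Usage: same_location ~/project ~/projects/def-sue
same_location() {
  local a=$1 b=$2
  # -ef follows symbolic links and is true when both paths have the
  # same device and inode numbers.
  if [ "$a" -ef "$b" ]; then
    printf 'same: %s\n' "$(readlink -f "$a")"
  else
    printf 'different: %s and %s\n' "$(readlink -f "$a")" "$(readlink -f "$b")"
    return 1
  fi
}
```

For example, Bob could run `same_location ~/project ~/projects/def-sue`; in the situation described above it would report a single /project path.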

If Sue were to receive a RAC award that includes storage (as is often the case these days), both she and Bob would find a new entry in their respective projects directories, something like:

~/projects/rrg-sue-ab

They should use this directory to store and share data related to the research carried out under the RAC award.

For sharing data with someone who doesn't have a role sponsored by Sue (let's say Heather), the simplest approach is to change the file permissions so that Heather can read a particular directory or file; see Sharing data for more details. Usually the best option is to use ACLs to give Heather read access to a directory. These filesystem permissions can be changed on almost any directory or file, not just those in project: you could also share a directory in your scratch space, or in one of the subdirectories of projects if you have several (a default one, one for a RAC, etc.). (Exception: ACLs are not available in Graham's /home. Best practice is to restrict file sharing to /project and /scratch.)

When sharing a directory, keep in mind that Heather needs to be able to descend the entire filesystem hierarchy down to that directory, so she needs read and execute permission on each directory between /project and the directory containing the file(s) to be shared. This assumes that Heather has an account on the cluster; you can even share data with researchers who don't have a Compute Canada account by using a Globus shared endpoint.
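
The commands needed for this can be generated mechanically. The sketch below is a hypothetical helper that only prints `setfacl` commands (a dry run) so you can review them before running anything; the user name `heather` and the example paths are placeholders. It grants execute (`x`) on the group's directory and on each intermediate directory, which is what is needed to pass through a directory (add `r` if Heather should also be able to list them, as the paragraph above suggests). On the shared directory itself it grants `rX` recursively, where capital `X` adds execute permission only to directories and to files that are already executable. Pass real paths (for example from `readlink -f ~/project`) rather than paths through your home directory, since Heather presumably cannot traverse your home.

```shell
# Hypothetical helper (dry run): print the setfacl commands that would let
# USER reach and read TARGET, a directory somewhere below ROOT.
# Usage: acl_commands ROOT TARGET USER
acl_commands() {
  local root=${1%/} target=${2%/} user=$3
  # Refuse targets that are not inside ROOT
  case $target in
    "$root"/*) ;;
    *) echo "error: $target is not under $root" >&2; return 1 ;;
  esac
  local rest=${target#"$root"/}
  local -a parts
  # Split the remaining path into its components
  IFS=/ read -r -a parts <<< "$rest"
  local dir=$root i
  # Traverse permission on ROOT and every directory above TARGET
  printf 'setfacl -m u:%s:x %q\n' "$user" "$dir"
  for ((i = 0; i < ${#parts[@]} - 1; i++)); do
    dir=$dir/${parts[i]}
    printf 'setfacl -m u:%s:x %q\n' "$user" "$dir"
  done
  # Read access (and traversal of subdirectories) on the shared tree itself
  printf 'setfacl -R -m u:%s:rX %q\n' "$user" "$target"
}
```

For example, `acl_commands "$(readlink -f ~/project)" "$(readlink -f ~/project/foo)" heather` prints the commands; after checking them you could pipe the output to `bash`.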

If Heather is pursuing a serious and ongoing collaboration with Sue, it may make sense for Sue to sponsor a role for Heather, giving Heather access similar to Bob's, as described earlier.

To summarize:

  • Scratch space is for (private) temporary files.
  • Home space is normally for small amounts of relatively private data (e.g. job scripts).
  • Shared data for a research group should normally go in that group's project space, which is persistent, backed up, and fairly large (up to 10 TB, or more with a RAC).