Project layout
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
The project filesystem on Cedar and Graham is organized by group, with a convenient per-user interface. The normal way to access the project space is through symbolic links in your home directory. These have the form $HOME/projects/group_name, together with another symbolic link, $HOME/project, that points to the project directory of your default group (relevant if you belong to more than one group).
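As a quick check, the links described above can be listed together with the locations they resolve to. This is a minimal sketch: the function name list_project_links is hypothetical, and it assumes only that the links follow the $HOME/project and $HOME/projects/group_name pattern described here.

```shell
# List the project symbolic links under a home directory together with
# the locations they resolve to. The function name is illustrative only.
list_project_links() {
    home_dir=${1:-$HOME}
    for link in "$home_dir/project" "$home_dir"/projects/*; do
        # Skip anything that is not a symbolic link (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
    done
}

list_project_links
```

On a system without these links the function simply prints nothing.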
The group space is owned by the group's principal investigator (PI), and group members have read and write permission on this directory. By default, however, a newly created file can be read, but not modified, by other group members. If the group wishes to have writable files, the best approach is to create a special directory for that purpose, for example
[name@server ~]$ mkdir $HOME/project/group_writable
followed by
[name@server ~]$ setfacl -d -m g::rwx $HOME/project/group_writable
For more on sharing data, file ownership, and access control lists (ACLs), see Sharing data.
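To see what the setfacl command above does, the same default ACL can be applied to a throwaway directory and then inspected with getfacl. This is only a sketch: the temporary directory and the acl_status variable are illustrative, and the commands are guarded because the ACL tools or filesystem support may be missing on a given system.

```shell
# Create a throwaway directory so the real project space is untouched.
demo_dir=$(mktemp -d)

# Apply the same default ACL as above, then list the default entries.
# Both steps are guarded because some systems lack the ACL tools or use
# a filesystem without ACL support.
if command -v setfacl >/dev/null 2>&1 &&
   command -v getfacl >/dev/null 2>&1 &&
   setfacl -d -m g::rwx "$demo_dir" 2>/dev/null; then
    acl_status=applied
    getfacl -p "$demo_dir" | grep '^default:'
else
    acl_status=unavailable
    echo "ACLs could not be set on $demo_dir"
fi

rm -rf "$demo_dir"
```

The lines beginning with default: are the entries that newly created files and subdirectories inherit.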
The project space is subject to a default quota of 1 TB and five million files per group, which can be increased to as much as 10 TB of space upon request to Compute Canada support. Certain groups may have been awarded significantly higher quotas through the annual resource allocation competition (RAC). In this case, you will already have been notified of your group's quota for the coming year. Note that this storage allocation is specific to a particular cluster and cannot normally be transferred to another cluster.
To check current usage and available disk space on all file systems, run:
[name@server ~]$ diskusage_report
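Where a rough per-directory figure is wanted, standard tools can approximate it. The sketch below is not a cluster utility: report_usage is a hypothetical helper, and it is an assumption that the file quota counts directories as well as regular files.

```shell
# Report the number of entries and the total size under a directory.
# The function name is illustrative; it is not a cluster utility.
report_usage() {
    dir=${1:-.}
    # Count every file and subdirectory below dir (one line per entry, so
    # names containing newlines would be over-counted).
    entries=$(find "$dir" -mindepth 1 | wc -l)
    # Total size in kilobytes, as reported by du.
    size_kb=$(du -sk "$dir" | cut -f1)
    printf '%s: %d entries, %d KB\n' "$dir" "$entries" "$size_kb"
}

# Only run against the project link if it exists on this system.
if [ -d "$HOME/project" ]; then
    report_usage "$HOME/project"
fi
```

Walking a large project tree this way can be slow, so the figures from diskusage_report remain the ones to rely on.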
An explanatory example
Imagine a PI (“Sue”) who sponsors a user (“Bob”). Both Sue and Bob start with directory structures that look similar on the surface:
/home/sue/scratch (symbolic link)
/home/sue/projects (directory)
/home/sue/project (symbolic link)
/home/bob/scratch (symbolic link)
/home/bob/projects (directory)
/home/bob/project (symbolic link)
The scratch link points to a different location for Sue (/scratch/sue) and Bob (/scratch/bob).
If Bob's only role were the one sponsored by Sue, the project directory and projects directory would be the same for Sue and for Bob. Further, if neither Sue nor Bob has any other roles or projects with Compute Canada, then each one's projects directory would contain just one subdirectory, def-sue:
/home/sue/projects/def-sue
/home/bob/projects/def-sue
Each of /home/sue/project, /home/bob/project, /home/sue/projects/def-sue, and /home/bob/projects/def-sue would point to the same location, /project/<some random number>. This project directory is the best place for Sue and Bob to share data. They can both create directories in it, read it, and write to it. Sue, for instance, could run
$ cd ~/project
$ mkdir foo
and Bob could then copy a file into the directory ~/project/foo, where it will be visible to both of them.
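The scenario can be tried out safely in a sandbox that imitates this layout. Everything in the sketch below is made up for illustration: the sandbox lives in a temporary directory, 12345 stands in for the unspecified project number, and the links are assumed to point directly at the shared directory.

```shell
# Build a throwaway copy of the layout; none of these paths are real.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/project/12345" \
         "$sandbox/home/sue/projects" "$sandbox/home/bob/projects"

for user in sue bob; do
    # projects/def-sue and project both lead to the one shared directory.
    ln -s "$sandbox/project/12345" "$sandbox/home/$user/projects/def-sue"
    ln -s "$sandbox/project/12345" "$sandbox/home/$user/project"
done

# Sue creates a directory through her link; Bob copies a file into it.
mkdir "$sandbox/home/sue/project/foo"
echo "results" > "$sandbox/notes.txt"
cp "$sandbox/notes.txt" "$sandbox/home/bob/project/foo/"

# All four names resolve to the same location.
for path in home/sue/project home/bob/project \
            home/sue/projects/def-sue home/bob/projects/def-sue; do
    readlink -f "$sandbox/$path"
done

# The file Bob copied is visible through Sue's link.
ls "$sandbox/home/sue/project/foo"

# When finished, the sandbox can be removed with: rm -rf "$sandbox"
```

Because every name is a link to one directory, there is only a single copy of the data no matter which path is used to reach it.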
If Sue were to get a RAC award with storage (as is often the case these days), both she and Bob would find a new entry in their respective projects directories, something like
~/projects/rrg-sue-ab
They should use this directory to store and share data related to the research carried out under the RAC award.
For sharing data with someone who doesn't have a role sponsored by Sue (let's say Heather), the simplest thing to do is to change the file permissions so that Heather can read a particular directory or file. See Sharing data for more details. The best idea is usually to use ACLs to let Heather read a directory. Note that these filesystem permissions can be changed for any directory or file, not just those in project: you could share a directory in your scratch or home with Heather if you wanted to, or in one of the subdirectories of projects, if you have several (a default one, one for a RAC, etc.).
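As an illustration of the ACL approach, the sketch below prints, without running them, setfacl commands of the kind that would grant one user read-only access to a directory. The helper print_share_commands, the username heather, and the directory name are all hypothetical; the Sharing data page remains the reference for the recommended procedure.

```shell
# Print (without running) setfacl commands that would give one user
# read-only access to a directory. Function and user names are hypothetical.
# Paths containing spaces would need quoting before the output is reused.
print_share_commands() {
    user=$1
    dir=$2
    # Existing contents: read everywhere, execute (X) only on directories
    # and on files that are already executable.
    printf 'setfacl -R -m u:%s:rX %s\n' "$user" "$dir"
    # Default ACL on the top directory so that files created in it later
    # inherit the same access.
    printf 'setfacl -d -m u:%s:rX %s\n' "$user" "$dir"
}

print_share_commands heather "$HOME/project/shared_with_heather"
```

Printing the commands first makes it possible to review them before changing any permissions.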
One thing to keep in mind when sharing a directory is that Heather will need to be able to descend the entire filesystem structure down to this directory, so she must have execute permission on each directory along the path, plus read and execute permission on the shared directory itself. This assumes that Heather has an account on the cluster, but you can even share with researchers who don't have a Compute Canada account by using a Globus shared endpoint.
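One way to check the path down to a shared directory is to list the permissions of the directory and each of its ancestors. The function below is a minimal sketch with a hypothetical name; it shows only the basic permission bits reported by ls, so any ACL entries would still need to be inspected separately with getfacl.

```shell
# Print the permissions of a directory and of every ancestor up to /.
# The function name is illustrative only.
show_path_permissions() {
    # Resolve symbolic links first so the real path is walked.
    path=$(readlink -f "$1") || return 1
    while :; do
        ls -ld "$path"
        [ "$path" = "/" ] && break
        path=$(dirname "$path")
    done
}

show_path_permissions "$HOME"
```

Any directory in the output that the other user cannot traverse will block access to everything beneath it.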
Of course, if Heather is pursuing a serious collaboration with Sue, then it may make sense for Sue to sponsor a role for Heather, thereby giving Heather access similar to Bob's, described earlier.
To summarize:
- scratch space is for (private) temporary files,
- home space is normally for small amounts of relatively private data (e.g. a job script),
- Shared data for a research group should normally go in that group's project space, as it is persistent, backed-up, and fairly large (up to 10 TB, or more with a RAC).