Project layout


Latest revision as of 21:10, 27 April 2023

Parent page: Storage and file management
See also: Disk quota exceeded error on /project filesystems

The project filesystem on Béluga, Cedar, Graham, and Narval is organized on the basis of groups. The normal method to access the project space is by means of symbolic links which exist in your home directory. These will have the form $HOME/projects/group_name.

The group space is owned by the group's principal investigator (PI), and group members have read and write permission on this directory. However, by default, a newly created file is readable but not writable by the other group members. If the group wishes to have files that any member can modify, the best approach is to create a special directory for that purpose, for example

[name@server ~]$ mkdir $HOME/projects/def-profname/group_writable

followed by

[name@server ~]$ setfacl -d -m g::rwx $HOME/projects/def-profname/group_writable
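To confirm that the default ACL took effect, the directory can be inspected with getfacl. The following is a minimal sketch that uses a temporary directory as a stand-in for the real project path; the variable names are made up for this illustration, and the fallback message covers systems where ACLs are not available.

```shell
# Hypothetical stand-in for $HOME/projects/def-profname, so the sketch
# can be tried without touching real project space.
demo=$(mktemp -d)
mkdir "$demo/group_writable"

# Same default-ACL command as above; it can fail where ACLs are unsupported.
if setfacl -d -m g::rwx "$demo/group_writable" 2>/dev/null; then
    # Default entries are the ones that new files and subdirectories inherit.
    acl_status=$(getfacl -p "$demo/group_writable" 2>/dev/null | grep '^default:group::' || true)
else
    acl_status="ACLs not available on this filesystem"
fi
echo "$acl_status"
```

Where ACLs are supported, the line printed should begin with default:group:: and show the rwx permission that was requested.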

For more on sharing data, file ownership, and access control lists (ACLs), see Sharing data.

The project space is subject to a default quota of 1 TB and 500,000 files per group, which can be increased to as much as 10 TB of space upon request to Technical support. Certain groups may have been awarded significantly higher quotas through the annual Resource Allocation Competition. In this case, you will already have been notified of your group's quota for the coming year. Note that this storage allocation is specific to a particular cluster and cannot normally be transferred to another cluster.

To check current usage and available disk space, use

[name@server ~]$ diskusage_report
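If the file count is approaching the quota, a sketch like the following may help locate the subdirectories that hold the most entries. It relies only on the standard find, sort and wc tools; the function name count_by_subdir and the example path are hypothetical.

```shell
# Count the entries (files, directories, links) under each immediate
# subdirectory of the given directory, largest count first.
# Each count includes the subdirectory itself.
count_by_subdir() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue
        printf '%s\t%s\n' "$(find "$d" | wc -l | tr -d ' ')" "$d"
    done | sort -rn
}

# Example call (hypothetical path, adjust to your own group):
# count_by_subdir "$HOME/projects/def-profname" | head
```

Walking a very large directory tree this way can take a while and puts load on the filesystem, so it is best run sparingly.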

To ensure that files copied to a given project space acquire the appropriate group membership, and are thus counted against the expected quota, it can be useful to set the setgid bit on the directory in question. With this bit set, every new file and subdirectory created below the directory inherits the group of its parent directory, and new subdirectories also inherit the setgid bit itself. However, existing files and subdirectories do not have their group membership changed (this should be done with the chgrp command), and any files moved into the directory also retain their existing group membership. You can set the setgid bit on a directory with the command

[name@server ~]$ chmod g+s <directory name>

If you want to apply this command to the existing subdirectories of a directory, you can use the command

[name@server ~]$ find <directory name> -type d -print0 | xargs -0 chmod g+s
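The difference between new and existing subdirectories can be seen in a small experiment. This is only a sketch, assuming a Linux filesystem and a temporary directory in place of real project space; the directory names are invented for the illustration.

```shell
demo=$(mktemp -d)
mkdir "$demo/shared" "$demo/shared/old_sub"   # old_sub exists before the bit is set

chmod g+s "$demo/shared"                      # set the bit on the top directory only
mkdir "$demo/shared/new_sub"                  # created afterwards, so it inherits the bit

# List the directories that carry the setgid bit (octal 2000).
with_setgid=$(find "$demo/shared" -type d -perm -2000 | sort)
echo "$with_setgid"
```

The listing should contain shared and new_sub but not old_sub, which is why the find and xargs command is needed for subdirectories that already exist.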

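More information on the setgid bit is available from the Wikipedia article on setuid and setgid (https://en.wikipedia.org/wiki/Setuid#setuid_and_setgid_on_directories).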

You can also use the command newgrp to modify your default group during an interactive session, for example

[name@server ~]$ newgrp rrg-profname-ab

and then copy any data to the appropriate project directory. However, this only changes your default group for the current session; at your next login you will need to run newgrp again if you wish to change the default group.

Note that if you are getting disk quota exceeded error messages (see Disk quota exceeded error on /project filesystems), this may well be due to files being associated with the wrong group, notably your personal group, i.e. the group that has the same name as your username and a quota of only 2 MB. To find such files and fix their group membership, you can use the command

find <directory name> -group $USER -print0 | xargs -0 chgrp -h <group>

where <group> is something like def-profname, thus a group with a reasonable quota of a terabyte or more.
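Before changing any group ownership, it can be helpful to see which groups currently own the entries in a directory tree. The following is a minimal sketch, assuming GNU find (for the -printf option); the function name groups_in_tree and the example path are made up for this illustration.

```shell
# Summarize the entries under a directory by owning group,
# printing a count next to each group name (largest count first).
groups_in_tree() {
    find "$1" -printf '%g\n' | sort | uniq -c | sort -rn
}

# Example call (hypothetical path, adjust to your own group):
# groups_in_tree "$HOME/projects/def-profname"
```

If a large share of the entries belongs to your personal group rather than to a group such as def-profname, the chgrp command above is likely to be the fix.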

An explanatory example

Imagine that we have a PI (“Sue”) who has a sponsored user under her (“Bob”). Both Sue and Bob start with a directory structure that on the surface looks similar:

  • /home/sue/scratch (symbolic link)
  • /home/sue/projects (directory)
  • /home/bob/scratch (symbolic link)
  • /home/bob/projects (directory)

The scratch link points to a different location for Sue (/scratch/sue) and Bob (/scratch/bob).

If Bob's only role is the one sponsored by Sue, then Bob's projects directory has the same contents as Sue's projects directory. Further, if neither Sue nor Bob has any other roles or projects with the Alliance, then each one's projects directory contains just one subdirectory, def-sue.

Each of /home/sue/projects/def-sue and /home/bob/projects/def-sue would point to the same location, /project/<some random number>. This project directory is the best place for Sue and Bob to share data. They can both create directories in it, read it, and write to it. Sue for instance could do

$ cd ~/projects/def-sue
$ mkdir foo

and Bob could then copy a file into the directory ~/projects/def-sue/foo, where it will be visible to both of them.
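The arrangement can be imitated in a temporary directory to see why both users end up in the same place. This is only a sketch: the directory project/1234567 is a hypothetical stand-in for /project/<some random number>, and both "users" are simulated by a single account.

```shell
demo=$(mktemp -d)
# Stand-ins for the shared project directory and the two home directories.
mkdir -p "$demo/project/1234567" "$demo/home/sue/projects" "$demo/home/bob/projects"
ln -s "$demo/project/1234567" "$demo/home/sue/projects/def-sue"
ln -s "$demo/project/1234567" "$demo/home/bob/projects/def-sue"

# Sue creates a directory through her link; Bob copies a file in through his.
mkdir "$demo/home/sue/projects/def-sue/foo"
echo "results" > "$demo/results.txt"
cp "$demo/results.txt" "$demo/home/bob/projects/def-sue/foo/"

# Both links resolve to the same physical directory, so Sue sees Bob's file.
sue_target=$(readlink -f "$demo/home/sue/projects/def-sue")
bob_target=$(readlink -f "$demo/home/bob/projects/def-sue")
ls "$demo/home/sue/projects/def-sue/foo"
```

On a real cluster the same check is simply readlink -f ~/projects/def-sue, run by each user.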

If Sue were to get a RAC award with storage (as is often the case these days), both she and Bob would find that there is a new entry in their respective projects directories, something like

~/projects/rrg-sue-ab

They should use this directory to store and share data related to the research carried out under the RAC award.

For sharing data with someone who doesn't have a role sponsored by Sue (let's say Heather), the simplest thing to do is to change the file permissions so that Heather can read a particular directory or file. See Sharing data for more details. The best idea is usually to use ACLs to let Heather read a directory. Note that these filesystem permissions can be changed for almost any directory or file, not just those in your project space: you could share a directory in your scratch too, or just a particular subdirectory of projects, if you have several (a default one, one for a RAC, etc.). Best practice is to restrict file sharing to /project and /scratch.

One thing to keep in mind when sharing a directory is that Heather will need to be able to descend the entire filesystem structure down to this directory, so she will need read and execute permission on each of the directories between ~/projects/def-sue and the directory containing the file(s) to be shared. We have implicitly assumed here that Heather has an account on the cluster, but you can even share with researchers who don't have an Alliance account by using a Globus shared endpoint.
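To review the permissions along such a path, each directory from the shared one up to the project directory can be listed in turn. The following is a minimal sketch; the function name show_path_permissions and the example paths are hypothetical.

```shell
# Print the permissions of a directory and of each of its ancestors,
# stopping once the given top-level directory has been printed.
show_path_permissions() {
    dir=$1
    stop=$2
    while :; do
        ls -ldL "$dir"   # -L reports the target when the path is a symbolic link
        if [ "$dir" = "$stop" ] || [ "$dir" = "/" ] || [ "$dir" = "." ]; then
            break
        fi
        dir=$(dirname "$dir")
    done
}

# Example call (hypothetical paths):
# show_path_permissions "$HOME/projects/def-sue/foo/data" "$HOME/projects/def-sue"
```

Each line of output is one level of the path, so a directory that Heather cannot enter stands out as the level lacking read or execute permission for her.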

If Heather is pursuing a serious and ongoing collaboration with Sue then it may naturally make sense for Sue to sponsor a role for Heather, thereby giving Heather access similar to Bob's, described earlier.

To summarize:

  • scratch space is for (private) temporary files
  • home space is normally for small amounts of relatively private data (e.g. a job script)
  • Shared data for a research group should normally go in that group's project space, as it is persistent, backed up, and fairly large (up to 10 TB, or more with a RAC).