Project layout: Difference between revisions
No edit summary |
No edit summary |
||
(32 intermediate revisions by 8 users not shown) | |||
Line 1: | Line 1: | ||
<languages /> | <languages /> | ||
<translate> | <translate> | ||
<!--T:18--> | |||
: <i>Parent page: [[Storage and file management]]</i> | |||
: <i>See also: [[Frequently_Asked_Questions#Disk_quota_exceeded_error_on_.2Fproject_filesystems | Disk quota exceeded error on /project filesystems]]</i> | |||
<!--T:1--> | <!--T:1--> | ||
The project filesystem on [[Cedar]] and [[ | The project filesystem on [[Béluga]], [[Cedar]], [[Graham]], and [[Narval]] is organized on the basis of <i>groups</i>. The normal method to access the project space is by means of symbolic links which exist in your home directory. These will have the form <tt>$HOME/projects/group_name</tt>. | ||
<!--T:2--> | <!--T:2--> | ||
The permissions on the group space are such that it is owned by the principal investigator (PI) for this group and members have read and write permission on this directory. However by default a newly created file will only be readable by group members. If the group wishes to have writeable files, the best approach is to create a special directory for that, for example | The permissions on the group space are such that it is owned by the principal investigator (PI) for this group and members have read and write permission on this directory. However by default a newly created file will only be readable by group members. If the group wishes to have writeable files, the best approach is to create a special directory for that, for example | ||
{{Command|mkdir $HOME/ | {{Command|mkdir $HOME/projects/def-profname/group_writable}} | ||
followed by | followed by | ||
{{Command|setfacl -d -m g::rwx $HOME/ | {{Command|setfacl -d -m g::rwx $HOME/projects/def-profname/group_writable}} | ||
<!--T:3--> | <!--T:3--> | ||
Line 14: | Line 18: | ||
<!--T:4--> | <!--T:4--> | ||
The project space is subject to a default quota of 1 TB and | The project space is subject to a default quota of 1 TB and 500,000 files per group and which can be increased up to 10 TB of space upon request to [[Technical support]]. Certain groups may have been awarded significantly higher quotas through the annual [https://alliancecan.ca/en/services/advanced-research-computing/accessing-resources/resource-allocation-competition Resource Allocation Competition]. In this case, you will already have been notified of your group's quota for the coming year. Note that this storage allocation is specific to a particular cluster and cannot normally be transferred to another cluster. | ||
<!--T:5--> | <!--T:5--> | ||
To check current usage and available disk space | To check current usage and available disk space, use | ||
{{Command|diskusage_report}} | {{Command|diskusage_report}} | ||
<!--T:16--> | |||
In order to ensure that files which are copied or moved to a given project space acquire the appropriate group membership - and thus are counted against the expected quota - it can be useful to set the <code>setgid</code> bit on the directory in question. This will have the effect of ensuring that every new file and subdirectory created below the directory will inherit the same group as the ambient directory; equally so, new subdirectories will also possess this same <code>setgid</code> bit. However, existing files and subdirectories will not have their group membership changed - this should be done with the <code>chgrp</code> command - and any files moved to the directory will also continue to retain their existing group membership. You can set the <code>setgid</code> bit on a directory with the command | |||
{{Command|chmod g+s <directory name>}} | |||
If you want to apply this command to the existing subdirectories of a directory, you can use the command | |||
{{Command|find <directory name> -type d -print0 {{!}} xargs -0 chmod g+s}} | |||
More information on the <code>setgid</code> is available from this [https://en.wikipedia.org/wiki/Setuid#setuid_and_setgid_on_directories page]. | |||
<!--T:17--> | |||
You can also use the command <code>newgrp</code> to modify your default group during an interactive session, for example | |||
{{Command|newgrp rrg-profname-ab}} | |||
and then to copy any data to the appropriate project directory. This will only change your default group for this particular session however - at your next login you will need to reuse the <code>newgrp</code> command if you wish to change the default group again. | |||
<!--T:19--> | |||
Note that if you are getting <i>disk quota exceeded</i> error messages (see [[Frequently_Asked_Questions#Disk_quota_exceeded_error_on_.2Fproject_filesystems | Disk quota exceeded error on /project filesystems]]), this may well be due to files being associated with the wrong group, notably your personal group, i.e. the one with the same name as your username and which has a quota of only 2 MB. To find and fix the group membership of such files you can use the command | |||
find <directory name> -group $USER -print0 | xargs -0 chgrp -h <group> | |||
where <code><group></code> is something like <code>def-profname</code>, thus a group with a reasonable quota of a terabyte or more. | |||
=== An explanatory example === <!--T:6--> | === An explanatory example === <!--T:6--> | ||
Line 29: | Line 50: | ||
* <code>/home/sue/scratch</code> (symbolic link) | * <code>/home/sue/scratch</code> (symbolic link) | ||
* <code>/home/sue/projects</code> (directory) | * <code>/home/sue/projects</code> (directory) | ||
* <code>/home/bob/scratch</code> (symbolic link) | * <code>/home/bob/scratch</code> (symbolic link) | ||
* <code>/home/bob/projects</code> (directory) | * <code>/home/bob/projects</code> (directory) | ||
</div> | </div> | ||
Line 39: | Line 58: | ||
<!--T:10--> | <!--T:10--> | ||
If Bob's only role was the one sponsored by Sue, then | If Bob's only role was the one sponsored by Sue, then Bob's <code>projects</code> directory would have the same contents as Sue's <code>projects</code> directory. Further, if neither Sue nor Bob have any other roles or projects with Alliance, then each one's <code>projects</code> directory would just contain one subdirectory, <code>def-sue</code>. | ||
<!--T:11--> | <!--T:11--> | ||
Each of | Each of <code>/home/sue/projects/def-sue</code> and <code>/home/bob/projects/def-sue</code> would point to the same location, <code>/project/<some random number></code>. This project directory is the best place for Sue and Bob to share data. They can both create directories in it, read it, and write to it. Sue for instance could do | ||
$ cd ~/ | $ cd ~/projects/def-sue | ||
$ mkdir foo | $ mkdir foo | ||
and Bob could then copy a file into the directory <code>~/ | and Bob could then copy a file into the directory <code>~/projects/def-sue/foo</code>, where it will be visible to both of them. | ||
<!--T:12--> | <!--T:12--> | ||
If Sue were to get a RAC award with storage (as is often the case these days), both she and Bob would find that there is a new entry in their respective <code>projects</code> | If Sue were to get a RAC award with storage (as is often the case these days), both she and Bob would find that there is a new entry in their respective <code>projects</code> directory, something like | ||
~/projects/rrg-sue-ab | ~/projects/rrg-sue-ab | ||
They should use this directory to store and share data related to the research carried out under the RAC award. | They should use this directory to store and share data related to the research carried out under the RAC award. | ||
For sharing data with someone who doesn't have a role sponsored by Sue--- let's say Heather--- the simplest thing to do is to change the file permissions so that Heather can read a particular directory or file. See [ | For sharing data with someone who doesn't have a role sponsored by Sue--- let's say Heather--- the simplest thing to do is to change the file permissions so that Heather can read a particular directory or file. See [[Sharing data]] for more details. The best idea is usually to use ACLs to let Heather read a directory. Note that these filesystem permissions can be changed for almost any directory or file, not just those in your <code>project</code> space --- you could share a directory in your <code>scratch</code> too, or just a particular subdirectory of <code>projects</code>, if you have several (a default one, one for a RAC, <i>etc.</i>). Best practice is to restrict file sharing to <code>/project</code> and <code>/scratch</code>.) | ||
<!--T:13--> | <!--T:13--> | ||
One thing to keep in mind when sharing a directory is that Heather will need to be able to descend the entire filesystem structure down to this directory and so she will need to have read and execute permission on each of the directories between <code>/ | One thing to keep in mind when sharing a directory is that Heather will need to be able to descend the entire filesystem structure down to this directory and so she will need to have read and execute permission on each of the directories between <code>~/projects/def-sue</code> and the directory containing the file(s) to be shared. We have implicitly assumed here that Heather has an account on the cluster but you can even share with researchers who don't have a Alliance account using a [[Globus#Globus_Sharing | Globus shared endpoint]]. | ||
<!--T:14--> | <!--T:14--> | ||
Line 64: | Line 83: | ||
* <code>scratch</code> space is for (private) temporary files | * <code>scratch</code> space is for (private) temporary files | ||
* <code>home</code> space is normally for small amounts of relatively private data (e.g. a job script), | * <code>home</code> space is normally for small amounts of relatively private data (e.g. a job script), | ||
* Shared data for a research group should normally go in that group's <code>project</code> space, as it is persistent, backed | * Shared data for a research group should normally go in that group's <code>project</code> space, as it is persistent, backed up, and fairly large (up to 10 TB, or more with a RAC). | ||
</translate> | </translate> |
Latest revision as of 21:10, 27 April 2023
- Parent page: Storage and file management
- See also: Disk quota exceeded error on /project filesystems
The project filesystem on Béluga, Cedar, Graham, and Narval is organized on the basis of groups. The normal method to access the project space is by means of symbolic links which exist in your home directory. These will have the form $HOME/projects/group_name.
The permissions on the group space are such that it is owned by the principal investigator (PI) for this group and members have read and write permission on this directory. However by default a newly created file will only be readable by group members. If the group wishes to have writeable files, the best approach is to create a special directory for that, for example
[name@server ~]$ mkdir $HOME/projects/def-profname/group_writable
followed by
[name@server ~]$ setfacl -d -m g::rwx $HOME/projects/def-profname/group_writable
For more on sharing data, file ownership, and access control lists (ACLs), see Sharing data.
The project space is subject to a default quota of 1 TB and 500,000 files per group and which can be increased up to 10 TB of space upon request to Technical support. Certain groups may have been awarded significantly higher quotas through the annual Resource Allocation Competition. In this case, you will already have been notified of your group's quota for the coming year. Note that this storage allocation is specific to a particular cluster and cannot normally be transferred to another cluster.
To check current usage and available disk space, use
[name@server ~]$ diskusage_report
In order to ensure that files which are copied or moved to a given project space acquire the appropriate group membership - and thus are counted against the expected quota - it can be useful to set the setgid
bit on the directory in question. This will have the effect of ensuring that every new file and subdirectory created below the directory will inherit the same group as the ambient directory; equally so, new subdirectories will also possess this same setgid
bit. However, existing files and subdirectories will not have their group membership changed - this should be done with the chgrp
command - and any files moved to the directory will also continue to retain their existing group membership. You can set the setgid
bit on a directory with the command
[name@server ~]$ chmod g+s <directory name>
If you want to apply this command to the existing subdirectories of a directory, you can use the command
[name@server ~]$ find <directory name> -type d -print0 | xargs -0 chmod g+s
More information on the setgid
is available from this page.
You can also use the command newgrp
to modify your default group during an interactive session, for example
[name@server ~]$ newgrp rrg-profname-ab
and then to copy any data to the appropriate project directory. This will only change your default group for this particular session however - at your next login you will need to reuse the newgrp
command if you wish to change the default group again.
Note that if you are getting disk quota exceeded error messages (see Disk quota exceeded error on /project filesystems), this may well be due to files being associated with the wrong group, notably your personal group, i.e. the one with the same name as your username and which has a quota of only 2 MB. To find and fix the group membership of such files you can use the command
find <directory name> -group $USER -print0 | xargs -0 chgrp -h <group>
where <group>
is something like def-profname
, thus a group with a reasonable quota of a terabyte or more.
An explanatory example[edit]
Imagine that we have a PI (“Sue”) who has a sponsored user under her (“Bob”). Both Sue and Bob start with a directory structure that on the surface looks similar:
/home/sue/scratch
(symbolic link)/home/sue/projects
(directory)/home/bob/scratch
(symbolic link)/home/bob/projects
(directory)
The scratch link points to a different location for Sue (/scratch/sue
) and Bob (/scratch/bob
).
If Bob's only role was the one sponsored by Sue, then Bob's projects
directory would have the same contents as Sue's projects
directory. Further, if neither Sue nor Bob have any other roles or projects with Alliance, then each one's projects
directory would just contain one subdirectory, def-sue
.
Each of /home/sue/projects/def-sue
and /home/bob/projects/def-sue
would point to the same location, /project/<some random number>
. This project directory is the best place for Sue and Bob to share data. They can both create directories in it, read it, and write to it. Sue for instance could do
$ cd ~/projects/def-sue $ mkdir foo
and Bob could then copy a file into the directory ~/projects/def-sue/foo
, where it will be visible to both of them.
If Sue were to get a RAC award with storage (as is often the case these days), both she and Bob would find that there is a new entry in their respective projects
directory, something like
~/projects/rrg-sue-ab
They should use this directory to store and share data related to the research carried out under the RAC award.
For sharing data with someone who doesn't have a role sponsored by Sue--- let's say Heather--- the simplest thing to do is to change the file permissions so that Heather can read a particular directory or file. See Sharing data for more details. The best idea is usually to use ACLs to let Heather read a directory. Note that these filesystem permissions can be changed for almost any directory or file, not just those in your project
space --- you could share a directory in your scratch
too, or just a particular subdirectory of projects
, if you have several (a default one, one for a RAC, etc.). Best practice is to restrict file sharing to /project
and /scratch
.)
One thing to keep in mind when sharing a directory is that Heather will need to be able to descend the entire filesystem structure down to this directory and so she will need to have read and execute permission on each of the directories between ~/projects/def-sue
and the directory containing the file(s) to be shared. We have implicitly assumed here that Heather has an account on the cluster but you can even share with researchers who don't have a Alliance account using a Globus shared endpoint.
If Heather is pursuing a serious and ongoing collaboration with Sue then it may naturally make sense for Sue to sponsor a role for Heather, thereby giving Heather access similar to Bob's, described earlier.
To summarize:
scratch
space is for (private) temporary fileshome
space is normally for small amounts of relatively private data (e.g. a job script),- Shared data for a research group should normally go in that group's
project
space, as it is persistent, backed up, and fairly large (up to 10 TB, or more with a RAC).