<!--T:38-->
When writing your scripts, use the environment variables (<tt>$HOME</tt>, <tt>$SCRATCH</tt>, <tt>$PROJECT</tt>, <tt>$ARCHIVE</tt>) instead of the actual paths! The paths may change in the future.
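For example, a minimal job-script sketch (the file and program names are placeholders) that relies on the variables rather than on hard-coded paths:
<pre>
#!/bin/bash
# stage input from $PROJECT, run in $SCRATCH, copy results back
cd $SCRATCH
cp $PROJECT/inputs/config.dat .       # placeholder input file
./my_simulation config.dat            # placeholder executable
cp output.dat $PROJECT/results/       # placeholder output file
</pre>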
== Storage and quotas == <!--T:39-->
[[Data_Management | Data Management]]

<!--T:40-->
{| class="wikitable"
! location
!colspan="2"| quota
!align="right"| block size
! expiration time
! backed up
! on login nodes
! on compute nodes
|-
| $HOME
|colspan="2"| 100 GB per user
|align="right"| 1 MB
|
| yes
| yes
| read-only
|-
|rowspan="6"| $SCRATCH
|colspan="2"| 25 TB per user (dynamic per group)
|align="right" rowspan="6" | 16 MB
|rowspan="6"| 2 months
|rowspan="6"| no
|rowspan="6"| yes
|rowspan="6"| yes
|-
|align="right"|up to 4 users per group
|align="right"|50 TB
|-
|align="right"|up to 11 users per group
|align="right"|125 TB
|-
|align="right"|up to 28 users per group
|align="right"|250 TB
|-
|align="right"|up to 60 users per group
|align="right"|400 TB
|-
|align="right"|above 60 users per group
|align="right"|500 TB
|-
| $PROJECT
|colspan="2"| by group allocation
|align="right"| 16 MB
|
| yes
| yes
| yes
|-
| $ARCHIVE
|colspan="2"| by group allocation
|align="right"|
|
| dual-copy
| no
| no
|-
| $BBUFFER
|colspan="2"| 10 TB per user
|align="right"| 1 MB
| very short
| no
| yes
| yes
|}
<!--T:41-->
<ul>
<li>[https://docs.scinet.utoronto.ca/images/9/9a/Inode_vs._Space_quota_-_v2x.pdf Inode vs. Space quota (PROJECT and SCRATCH)]</li>
<li>[https://docs.scinet.utoronto.ca/images/0/0e/Scratch-quota.pdf dynamic quota per group (SCRATCH)]</li>
<li>Compute nodes do not have local storage.</li>
<li>Archive space is on [https://docs.scinet.utoronto.ca/index.php/HPSS HPSS].</li>
<li>Backup means a recent snapshot, not an archive of all data that ever was.</li>
<li><p><code>$BBUFFER</code> stands for [https://docs.scinet.utoronto.ca/index.php/Burst_Buffer Burst Buffer], a faster parallel storage tier for temporary data.</p></li></ul>
==File/Ownership Management (ACL)== <!--T:124-->
* By default, at SciNet, users within the same group already have read permission to each other's files (not write).
* You may use an access control list ('''ACL''') to allow your supervisor (or another user within your group) to manage files for you (i.e., create, move, rename, delete), while still retaining your access and permission as the original owner of the files/directories. You may also let users in other groups, or whole other groups, access (read, execute) your files using this same mechanism.
<!--
===Using setfacl/getfacl=== <!--T:125-->
* To allow [supervisor] to manage files in /project/g/group/[owner] using '''setfacl''' and '''getfacl''' commands, follow the 3 steps below as the [owner] account from a shell:
<pre>
1) $ /scinet/gpc/bin/setfacl -d -m user:[supervisor]:rwx /project/g/group/[owner]
(every *new* file/directory inside [owner] will inherit [supervisor] ownership by default from now on)
<!--T:126-->
2) $ /scinet/gpc/bin/setfacl -d -m user:[owner]:rwx /project/g/group/[owner]
(but will also inherit [owner] ownership, i.e., ownership of both by default, for files/directories created by [supervisor])
<!--T:127-->
3) $ /scinet/gpc/bin/setfacl -Rm user:[supervisor]:rwx /project/g/group/[owner]
(recursively modify all *existing* files/directories inside [owner] to also be rwx by [supervisor])
<!--T:128-->
$ /scinet/gpc/bin/getfacl /project/g/group/[owner]
(to determine the current ACL attributes)
<!--T:129-->
$ /scinet/gpc/bin/setfacl -b /project/g/group/[owner]
(to remove any previously set ACL)
<!--T:130-->
PS: on the datamovers getfacl, setfacl and chacl will be on your path
</pre>
For more information on using [http://linux.die.net/man/1/setfacl <tt>setfacl</tt>] or [http://linux.die.net/man/1/getfacl <tt>getfacl</tt>] see their man pages.
-->
===Using mmputacl/mmgetacl=== <!--T:131-->
* You may use GPFS' native '''mmputacl''' and '''mmgetacl''' commands. The advantages are that you can set "control" permission and that [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm1160.html POSIX or NFS v4 style ACLs] are supported. You will first need to create a /tmp/supervisor.acl file with the following contents:
<pre>
user::rwxc
group::----
other::----
mask::rwxc
user:[owner]:rwxc
user:[supervisor]:rwxc
group:[othergroup]:r-xc
</pre>
<!--T:132-->
Then issue the following 2 commands:
<pre>
1) $ mmputacl -i /tmp/supervisor.acl /project/g/group/[owner]
2) $ mmputacl -d -i /tmp/supervisor.acl /project/g/group/[owner]
(every *new* file/directory inside [owner] will inherit [supervisor] ownership by default, as well as
[owner] ownership, i.e., ownership of both by default, for files/directories created by [supervisor])
<!--T:133-->
$ mmgetacl /project/g/group/[owner]
(to determine the current ACL attributes)
<!--T:134-->
$ mmdelacl -d /project/g/group/[owner]
(to remove any previously set ACL)
<!--T:135-->
$ mmeditacl /project/g/group/[owner]
(to create or change a GPFS access control list)
(for this command to work, set the EDITOR environment variable: export EDITOR=/usr/bin/vi)
</pre>
<!--T:136-->
NOTES:
* There is no GPFS built-in command to recursively add or remove ACL attributes on existing files. You will need to use the -i option as above for each file or directory individually. [https://docs.scinet.utoronto.ca/index.php/Recursive_ACL_script Here is a sample bash script you may use for that purpose]
<!--T:137-->
* mmputacl will not overwrite the original Linux group permissions of a directory when it is copied to another directory that already has ACLs, hence the "#effective:r-x" note you may see from time to time in the mmgetacl output. If you want to give rwx permissions to everyone in your group, simply rely on the plain Unix 'chmod g+rwx' command. You may do that before or after copying the original material to another folder with the ACLs.
<!--T:138-->
* In the case of PROJECT, your group's supervisor will need to set the proper ACL at the /project/G/GROUP level in order to let users from other groups access your files.
<!--T:139-->
* ACLs won't let you give away permissions to files or directories that do not belong to you.
<!--T:140-->
* We highly recommend that you never give write permission to other users on the top level of your home directory (/home/G/GROUP/[owner]), since that would seriously compromise your privacy, in addition to disabling ssh key authentication, among other things. If necessary, create specific sub-directories under your home directory so that other users can manipulate/access files there, as sketched below.
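For instance, a minimal sketch of that sub-directory approach (the directory name is a placeholder; the ACL file is the /tmp/supervisor.acl example from above):
<pre>
$ mkdir /home/G/GROUP/[owner]/shared                              # hypothetical sub-directory
$ mmputacl -i /tmp/supervisor.acl /home/G/GROUP/[owner]/shared    # set the ACL on the sub-directory
$ mmputacl -d -i /tmp/supervisor.acl /home/G/GROUP/[owner]/shared # and the default ACL for new files inside it
</pre>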
<!--T:141-->
For more information on using [https://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs100.doc/bl1adm_mmputacl.htm <tt>mmputacl</tt>] or [https://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs100.doc/bl1adm_mmgetacl.htm <tt>mmgetacl</tt>] see their man pages.
===Recursive ACL script=== <!--T:142-->
You may use/adapt '''[https://docs.scinet.utoronto.ca/index.php/Recursive_ACL_script this sample bash script]''' to recursively add or remove ACL attributes using GPFS built-in commands.
<!--T:143-->
Courtesy of Agata Disks (http://csngwinfo.in2p3.fr/mediawiki/index.php/GPFS_ACL).
==Scratch Disk Purging Policy== <!--T:144-->
<!--T:145-->
In order to ensure that there is always significant space available for running jobs, '''we automatically delete files in /scratch that have not been accessed or modified for more than 2 months by the actual deletion day on the 15th of each month'''. Note that we recently changed the cut-off reference to the ''MostRecentOf(atime,ctime)''. This policy is subject to revision depending on its effectiveness. More details about the purging process, and how users can check whether their files will be deleted, follow below. If you have files scheduled for deletion, you should move them to a more permanent location such as your departmental server, your /project space, or HPSS (for PIs who have been allocated storage space by the RAC on either project or HPSS).
<!--T:146-->
On the '''first''' of each month, a list of files scheduled for purging is produced, and an email notification is sent to each user on that list. You also get a notification on the shell every time you log in to Niagara. Furthermore, at or about the '''12th''' of each month a second scan produces a more current assessment and another email notification is sent. This way users can double-check that they have indeed taken care of all the files they needed to relocate before the purging deadline. Those files will be automatically deleted on the '''15th''' of the same month unless they have been accessed or relocated in the interim. If you have files scheduled for deletion, they will be listed in a file in /scratch/t/todelete/current, which has your userid and groupid in the filename. For example, if user xxyz wants to check whether they have files scheduled for deletion, they can issue the following command on a system which mounts /scratch (e.g. a SciNet login node): '''ls -1 /scratch/t/todelete/current |grep xxyz'''. In the example below, the name of this file indicates that user xxyz is part of group abc, has 9,560 files scheduled for deletion and they take up 1.0 TB of space:
<!--T:147-->
<pre>
[xxyz@nia-login03 ~]$ ls -1 /scratch/t/todelete/current |grep xxyz
-rw-r----- 1 xxyz root 1733059 Jan 17 11:46 3110001___xxyz_______abc_________1.00T_____9560files
</pre>
<!--T:148-->
The file itself contains a list of all files scheduled for deletion (in the last column) and can be viewed with standard commands like more/less/cat - e.g. '''more /scratch/t/todelete/current/3110001___xxyz_______abc_________1.00T_____9560files'''
<!--T:149-->
Similarly, you can also check on the other users in your group by using the ls command with grep on your group name. For example: '''ls -1 /scratch/t/todelete/current |grep abc'''. That will list all other users in the same group as xxyz who have files to be purged on the 15th. Members of the same group have access to each other's contents.
<!--T:150-->
'''NOTE:''' Preparing these assessments takes several hours. If you change the access/modification time of a file in the interim, that will not be detected until the next cycle. A way for you to get immediate feedback is to use the '''ls -lu''' command on the file to verify the atime and '''ls -lc''' for the ctime. If the file's atime/ctime has been updated in the meantime, come the purging date on the 15th it will no longer be deleted.
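For example (the file name is a placeholder):
<pre>
$ ls -lu $SCRATCH/output.dat     # lists the file with its access time (atime)
$ ls -lc $SCRATCH/output.dat     # lists the file with its change time (ctime)
</pre>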
==How much Disk Space Do I have left?== <!--T:151-->
<!--T:152-->
The <tt>'''/scinet/niagara/bin/diskUsage'''</tt> command, available on the login nodes and datamovers, provides information in a number of ways on the home, scratch, project and archive file systems. For instance, it can report how much disk space is being used by yourself and your group (with the -a option), how much your usage has changed over a certain period ("delta information"), or it can generate plots of your usage over time. Please see the usage help below for more details.
<pre>
Usage: diskUsage [-h|-?] [-a] [-u <user>]
-h|-?: help
-a: list usages of all members of the group
-u <user>: as another user in your group
</pre>
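A couple of hypothetical invocations, for illustration:
<pre>
$ /scinet/niagara/bin/diskUsage          # your own usage on home, scratch, project and archive
$ /scinet/niagara/bin/diskUsage -a       # usage of all members of your group
</pre>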
<!--T:153-->
Did you know that you can check which of your directories have more than 1000 files with the <tt>'''/scinet/niagara/bin/topUserDirOver1000list'''</tt> command and which have more than 1 GB of material with the <tt>'''/scinet/niagara/bin/topUserDirOver1GBlist'''</tt> command?
<!--T:154-->
Note:
* Information on usage and quota is only updated every 3 hours!
== I/O Tips == <!--T:95-->
<!--T:96-->
* $HOME, $SCRATCH, and $PROJECT all use the parallel file system called GPFS.
* Your files can be seen on all Niagara login and compute nodes.
* GPFS is a high-performance file system which provides rapid reads and writes to large data sets in parallel from many nodes.
* However, accessing data sets which consist of many small files leads to poor performance.
* Avoid reading and writing lots of small amounts of data to disk.<br />
<!--T:97-->
* Many small files on the system waste space and are slower to access, read and write; consider bundling them into a single archive, as sketched below.
* Write data out in binary; it is faster and takes less space.
* The [https://docs.scinet.utoronto.ca/index.php/Burst_Buffer Burst Buffer] is better for I/O-heavy jobs and for speeding up checkpoints.
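For instance, a minimal sketch (the directory and archive names are placeholders) of bundling a directory of many small output files into a single tar archive:
<pre>
# combine a directory of many small files into one large archive
$ cd $SCRATCH
$ tar -czf results_bundle.tar.gz results/       # one large file instead of thousands of small ones
$ tar -tzf results_bundle.tar.gz > /dev/null    # verify the archive before removing the originals
</pre>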
= Loading software modules = <!--T:48-->