Data management at Niagara

<!--T:143-->
Courtesy of Agata Disks (http://csngwinfo.in2p3.fr/mediawiki/index.php/GPFS_ACL)
==Scratch Disk Purging Policy== <!--T:144-->
<!--T:145-->
To ensure that there is always significant space available for running jobs, '''we automatically delete files in /scratch that have not been accessed or modified for more than 2 months as of the actual deletion day, the 15th of each month'''. Note that we recently changed the cut-off reference to ''MostRecentOf(atime,ctime)''. This policy is subject to revision depending on its effectiveness. More details about the purging process, and about how users can check whether their files will be deleted, follow below. If you have files scheduled for deletion, you should move them to a more permanent location, such as your departmental server, your /project space, or HPSS (for PIs who have been allocated storage space by the RAC on project or HPSS).
<!--T:146-->
On the '''first''' of each month, a list of files scheduled for purging is produced, and an email notification is sent to each user on that list. You also get a notification on the shell every time you log in to Niagara. Furthermore, on or about the '''12th''' of each month a second scan produces a more current assessment and another email notification is sent. This way users can double-check that they have indeed taken care of all the files they needed to relocate before the purging deadline. Those files will be automatically deleted on the '''15th''' of the same month unless they have been accessed or relocated in the interim. If you have files scheduled for deletion, they will be listed in a file in /scratch/t/todelete/current, which has your userid and groupid in the filename. For example, if user xxyz wants to check whether they have files scheduled for deletion, they can issue the following command on a system which mounts /scratch (e.g. a SciNet login node): '''ls -1 /scratch/t/todelete/current |grep xxyz'''. In the example below, the name of this file indicates that user xxyz is part of group abc and has 9,560 files scheduled for deletion, which take up 1.0 TB of space:
<!--T:147-->
<pre>
[xxyz@nia-login03 ~]$ ls -l /scratch/t/todelete/current |grep xxyz
-rw-r----- 1 xxyz    root      1733059 Jan 17 11:46 3110001___xxyz_______abc_________1.00T_____9560files
</pre>
<!--T:148-->
The file itself contains a list of all files scheduled for deletion (in the last column) and can be viewed with standard commands like more, less or cat, e.g. '''more /scratch/t/todelete/current/3110001___xxyz_______abc_________1.00T_____9560files'''
<!--T:149-->
Similarly, you can also check all other users in your group by using the ls command with grep on your group name. For example, '''ls -1 /scratch/t/todelete/current |grep abc''' will list all other users in the same group as xxyz who have files to be purged on the 15th. Members of the same group have access to each other's contents.
<!--T:150-->
'''NOTE:''' Preparing these assessments takes several hours. If you change the access/modification time of a file in the interim, that will not be detected until the next cycle. A way for you to get immediate feedback is to use the '''ls -lu''' command on the file to verify the atime and '''ls -lc''' for the ctime. If the file's atime/ctime has been updated in the meantime, come the purging date on the 15th it will no longer be deleted.
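For instance, a quick check of these timestamps could look like the following; the filename results.dat is only a placeholder, and stat is shown as an alternative that reports all three timestamps at once:
<pre>
# show the last access time (atime) of the file
ls -lu results.dat

# show the last change time (ctime) of the file
ls -lc results.dat

# stat reports Access, Modify and Change times together
stat results.dat
</pre>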
==How much Disk Space Do I have left?== <!--T:151-->
<!--T:152-->
The <tt>'''/scinet/niagara/bin/diskUsage'''</tt> command, available on the login nodes and datamovers, provides information in a number of ways on the home, scratch, project and archive file systems. For instance, it can report how much disk space is being used by you and your group (with the -a option), how much your usage has changed over a certain period ("delta information"), or generate plots of your usage over time. Please see the usage help below for more details.
<pre>
Usage: diskUsage [-h|-?] [-a] [-u <user>]
      -h|-?: help
      -a: list usages of all members on the group
      -u <user>: as another user on your group
</pre>
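For example, the following invocations on a login node correspond to the options above; the user name xxyz is reused from the earlier example and is only a placeholder:
<pre>
# report your own usage on the home, scratch, project and archive file systems
/scinet/niagara/bin/diskUsage

# include the usage of all members of your group
/scinet/niagara/bin/diskUsage -a

# report the usage of another member of your group
/scinet/niagara/bin/diskUsage -u xxyz
</pre>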
<!--T:153-->
Did you know that you can check which of your directories have more than 1000 files with the <tt>'''/scinet/niagara/bin/topUserDirOver1000list'''</tt> command and which have more than 1GB of material with the <tt>'''/scinet/niagara/bin/topUserDirOver1GBlist'''</tt> command?
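For example (assuming, as here, that both commands are simply run without arguments on a login node):
<pre>
# list your directories containing more than 1000 files
/scinet/niagara/bin/topUserDirOver1000list

# list your directories containing more than 1 GB of data
/scinet/niagara/bin/topUserDirOver1GBlist
</pre>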
<!--T:154-->
Note:
* information on usage and quota is only updated every 3 hours!
== I/O Tips == <!--T:95-->
<!--T:96-->
* $HOME, $SCRATCH, and $PROJECT all use the parallel file system called GPFS.
* Your files can be seen on all Niagara login and compute nodes.
* GPFS is a high-performance file system which provides rapid reads and writes to large data sets in parallel from many nodes.
* However, accessing data sets which consist of many small files leads to poor performance.
* Avoid reading and writing many small amounts of data to disk.
<!--T:97-->
* Many small files on the system waste space and are slower to access, read and write; consider combining them into a single archive (see the sketch after this list).
* Write data out in binary; it is faster and takes less space.
* The [https://docs.scinet.utoronto.ca/index.php/Burst_Buffer Burst Buffer] is better for I/O-heavy jobs and for speeding up checkpoints.
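As a sketch of one way to cut down the number of small files, the many outputs of a run can be packed into a single archive before being stored on GPFS; the directory and file names below are only placeholders:
<pre>
# pack a directory tree of many small output files into a single compressed archive
tar -czf run_output.tar.gz run_output/

# list the contents of the archive without extracting it
tar -tzf run_output.tar.gz

# extract only the file you need
tar -xzf run_output.tar.gz run_output/summary.dat
</pre>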