Data management at Niagara

Understanding the various file systems, and how to use them properly, is critical to optimizing your workflow and being a good SciNet citizen.  This page describes the various Niagara file systems, and how to properly use them.


==Performance== <!--T:2-->
The file systems on SciNet, with the exception of archive, are [http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS], a high-performance file system which provides rapid reads and writes to large datasets in parallel from many nodes.  As a consequence of this design, however, '''the file system performs quite ''poorly'' at accessing data sets which consist of many small files.'''  For instance, you will find that reading data in from one 16MB file is enormously faster than from 400 40KB files. Such small files are also quite wasteful of space, as the [https://en.wikipedia.org/wiki/Block_(data_storage) blocksize] for the scratch and project filesystems is 16MB. This is something you should keep in mind when planning your input/output strategy for runs on SciNet.
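
For example, if your workflow produces or consumes thousands of small files, packing them into a single archive avoids the small-file penalty. A minimal sketch, assuming the usual $SCRATCH environment variable and hypothetical directory names:

<pre>
# Combine a directory of many small files into a single large archive
# (hypothetical paths; adjust to your own directory layout).
cd $SCRATCH/myrun
tar -czf outputs.tar.gz outputs/

# Verify the archive before removing the original small files
tar -tzf outputs.tar.gz > /dev/null && rm -r outputs/

# Extract the individual files again only when and where you need them
tar -xzf outputs.tar.gz
</pre>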


=== /home ===
/home is intended primarily for individual user files, common software or small datasets used by others in the same group, provided it does not exceed individual quotas. Otherwise you may consider /scratch or /project. /home is read-only on the compute nodes.


=== /scratch === <!--T:6-->
/scratch is to be used primarily for temporary or transient files, for all the results of your computations and simulations, or any material that can be easily recreated or reacquired. You may use scratch as well for any intermediate step in your workflow, provided it does not generate too much I/O or too many small files on this disk-based storage pool; otherwise you should consider the burst buffer (/bb). Once you have your final results, those that you want to keep for the long term, you may migrate them to /project or /archive. /scratch is purged on a regular basis and has no backups.
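
A minimal sketch of migrating final results from scratch to project with rsync (this assumes your group has a /project allocation; the directory names are hypothetical, and $SCRATCH and $PROJECT are the usual environment variables pointing to your directories):

<pre>
# Copy final results from scratch to project (hypothetical directory names)
rsync -av --progress $SCRATCH/final_results/ $PROJECT/final_results/

# Verify the copy before removing the scratch copy
diff -r $SCRATCH/final_results $PROJECT/final_results && rm -r $SCRATCH/final_results
</pre>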


=== /project ===
/project is intended for common group software, large static datasets, or any material very costly for the group to reacquire or regenerate. <font color=red>Material on /project is expected to remain relatively immutable over time.</font> Temporary or transient files should be kept on scratch, not project. High data turnover induces stress and unnecessary consumption of tapes on the TSM backup system, long after this material has been deleted, due to backup retention policies and the extra versions kept of the same file. Even renaming top directories is enough to trick the system into assuming a completely new directory tree has been created and the old one deleted, so think carefully about your naming convention ahead of time, and stick with it. Users abusing the project file system and using it as scratch will be flagged and contacted. Note that on Niagara /project is only available to groups with a RAC allocation.


=== /bb (burst buffer) === <!--T:8-->
/bb, the [https://docs.scinet.utoronto.ca/index.php/Burst_Buffer burst buffer], is a very fast, high-performance alternative to /scratch, made of solid-state drives (SSD). You may request this resource if you anticipate a high rate of I/O operations (IOPS), or when you notice your job is not performing well running on scratch or project because of I/O bottlenecks. See [https://docs.scinet.utoronto.ca/index.php/Burst_Buffer here] for more details.


=== /dev/shm (ramdisk) ===
On the Niagara nodes a [https://docs.scinet.utoronto.ca/index.php/User_Ramdisk ramdisk] is available. [https://docs.scinet.utoronto.ca/index.php/User_Ramdisk Ramdisk] is much faster than real disk, and faster than the Burst Buffer. Up to 70 percent of a node's 202GB of RAM may be used as a temporary '''local''' file system. This is particularly useful in the early stages of migrating desktop-computing codes to an HPC platform such as Niagara, especially those that perform a lot of file I/O. Heavy I/O is a bottleneck in large-scale computing, especially on parallel file systems (such as the GPFS used on Niagara), since files are synchronized across the whole network.
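
A minimal sketch of a job that stages data on the ramdisk (the program and file names are hypothetical): copy input to /dev/shm, run against the local copy, and copy results back to the shared file system before the job ends, since anything left on the ramdisk disappears with the job.

<pre>
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=01:00:00

# Stage input onto the node-local ramdisk (hypothetical file names)
cp $SCRATCH/input.dat /dev/shm/

# Run the application against the local copy
cd /dev/shm
$SCRATCH/bin/myprogram input.dat output.dat

# Copy results back to the shared file system before the job ends;
# anything left in /dev/shm is lost when the job terminates.
cp /dev/shm/output.dat $SCRATCH/results/
</pre>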


=== /bb/SLURM_TMPDIR (burst buffer) === <!--T:47-->
For every job on Niagara, the scheduler creates a temporary directory on the [https://docs.scinet.utoronto.ca/index.php/Burst_Buffer burst buffer] called $SLURM_TMPDIR. The $SLURM_TMPDIR directory will be empty when your job starts, and its contents are deleted after the job has finished. $SLURM_TMPDIR is intended as a place for temporary files that do not fit in ramdisk (/dev/shm) and would suffer performance issues on the general /scratch file system. It is similar to the $SLURM_TMPDIR variable used on the general purpose Compute Canada systems Cedar and Graham, where this storage lives on node-local SSDs (which are not present on Niagara nodes).
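
A minimal sketch of a job script using $SLURM_TMPDIR (the application name and its --tmpdir option are hypothetical): write large temporary files there during the run, and copy anything you need to keep back to $SCRATCH before the job ends.

<pre>
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=01:00:00

# $SLURM_TMPDIR points to a per-job directory on the burst buffer;
# it exists only for the duration of the job.
cd $SLURM_TMPDIR

# Hypothetical application writing large temporary/checkpoint files
$HOME/bin/mysimulation --input $SCRATCH/config.in --tmpdir $SLURM_TMPDIR

# Save anything you need to keep before the job ends
cp $SLURM_TMPDIR/restart_file $SCRATCH/
</pre>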


'''NOTE:''' Preparing these assessments takes several hours. If you change the access/modification time of a file in the interim, that will not be detected until the next cycle. A way to get immediate feedback is to use the ''''ls -lu'''' command on the file to verify the atime and ''''ls -lc'''' for the ctime. If the file's atime/ctime has been updated in the meantime, come the purging date on the 15th it will no longer be deleted.
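
For example, with standard commands (the file name is hypothetical):

<pre>
# List the file with its last access time (atime)
ls -lu myfile.dat

# List the file with its last status-change time (ctime)
ls -lc myfile.dat

# 'stat' prints the access, modification, and change times together
stat myfile.dat
</pre>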


= Moving data = <!--T:23-->
Data for analysis and final results need to be moved to and from Niagara. There are several ways to accomplish this.
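
For modest amounts of data, rsync or scp through a Niagara login node is a typical starting point. A minimal sketch, run from your own computer (the user name and paths are placeholders; substitute your own user name and actual scratch directory):

<pre>
# Push a directory from your own computer to your Niagara scratch space
rsync -avz --progress mydata/ myusername@niagara.scinet.utoronto.ca:/scratch/path/to/your/dir/mydata/

# Pull results back from Niagara to the current directory
scp -r myusername@niagara.scinet.utoronto.ca:/scratch/path/to/your/dir/results ./
</pre>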

