Data management at Niagara: Difference between revisions

Jump to navigation Jump to search
no edit summary
No edit summary
No edit summary
Line 1: Line 1:


= Data Management = <!--T:110-->
= Data Management = <!--T:110-->
Understanding the various file systems, and how to use them properly, is critical to optimizing your workflow and being a good SciNet citizen.  This page describes the various Niagara file systems, and how to properly use them.
==Performance== <!--T:115-->
[http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] is a high-performance filesystem which provides rapid reads and writes to large datasets in parallel from many nodes.  As a consequence of this design, however, '''the file system performs quite ''poorly'' at accessing data sets which consist of many, small files.'''  For instance, you will find that reading data in from one 16MB file is enormously faster than from 400 40KB files. Such small files are also quite wasteful of space, as the blocksize for the scratch and project filesystems is 16MB. This is something you should keep in mind when planning your input/output strategy for runs on SciNet.
<!--T:116-->
For instance, if you run multi-process jobs, having each process write to a file of its own is not an scalable I/O solution. A directory gets locked by the first process accessing it, so all other processes have to wait for it. Not only has the code just become considerably less parallel, chances are the file system will have a time-out while waiting for your other processes, leading your program to crash mysteriously.
Consider using MPI-IO (part of the MPI-2 standard), which allows files to be opened simultaneously by different processes, or using a dedicated process for I/O to which all other processes send their data, and which subsequently writes this data to a single file.
== Purpose of each file system ==
== Purpose of each file system ==
Niagara accesses several different file systems.  Note that not all of these file systems are available to all users.
=== /home ===
=== /home ===
/home is intended primarily for individual user files, common software or small datasets used by others in the same group, provided it does not exceed individual quotas. Otherwise you may consider /scratch or /project. /home is read-only on the compute nodes.
/home is intended primarily for individual user files, common software or small datasets used by others in the same group, provided it does not exceed individual quotas. Otherwise you may consider /scratch or /project. /home is read-only on the compute nodes.
Line 17: Line 28:
/archive is a nearline storage pool, if you want to temporarily offload semi-active material from any of the above file systems. In practice users will offload/recall material as part of their regular workflow, or when they hit their quotas on scratch or project. That material can remain on HPSS for a few months to a few years. Note that on niagara /archive is only available to groups with RAC allocation.
/archive is a nearline storage pool, if you want to temporarily offload semi-active material from any of the above file systems. In practice users will offload/recall material as part of their regular workflow, or when they hit their quotas on scratch or project. That material can remain on HPSS for a few months to a few years. Note that on niagara /archive is only available to groups with RAC allocation.


==Performance== <!--T:115-->
== Quotas and purging ==
[http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] is a high-performance filesystem which provides rapid reads and writes to large datasets in parallel from many nodesAs a consequence of this design, however, '''the file system performs quite ''poorly'' at accessing data sets which consist of many, small files.''' For instance, you will find that reading data in from one 16MB file is enormously faster than from 400 40KB files. Such small files are also quite wasteful of space, as the blocksize for the scratch and project filesystems is 16MB. This is something you should keep in mind when planning your input/output strategy for runs on SciNet.
You should familiarize yourself with the [[Data_Management#Purpose_of_each_file_system | various file systems]], what purpose they serve, and how to properly use themThis table summarizes the various file systems.   


<!--T:116-->
{| class="wikitable"
For instance, if you run multi-process jobs, having each process write to a file of its own is not an scalable I/O solution. A directory gets locked by the first process accessing it, so all other processes have to wait for it. Not only has the code just become considerably less parallel, chances are the file system will have a time-out while waiting for your other processes, leading your program to crash mysteriously.
! location
Consider using MPI-IO (part of the MPI-2 standard), which allows files to be opened simultaneously by different processes, or using a dedicated process for I/O to which all other processes send their data, and which subsequently writes this data to a single file.
!colspan="2"| quota
!align="right"| block size
! expiration time
! backed up
! on login nodes
! on compute nodes
|-
| $HOME
|colspan="2"| 100 GB per user
|align="right"| 1 MB
|
| yes
| yes
| read-only
|-
|rowspan="6"| $SCRATCH
|colspan="2"| 25 TB per user provided group quota is not reached
|align="right" rowspan="6" | 16 MB
|rowspan="6"| 2 months
|rowspan="6"| no
|rowspan="6"| yes
|rowspan="6"| yes
|-
|align="right"|groups of up to 4 users
|align="right"|50TB for the group
|-
|align="right"|groups of up to 11 users
|align="right"|125TB for the group
|-
|align="right"|groups of up to 28 users
|align="right"|250TB for the group
|-
|align="right"|groups of up to 60 users
|align="right"|400TB for the group
|-
|align="right"|groups with over 60 users
|align="right"|500TB for the group
|-
| $PROJECT
|colspan="2"| by group allocation
|align="right"| 16 MB
|
| yes
| yes
| yes
|-
| $ARCHIVE
|colspan="2"| by group allocation
|align="right"|
|
| dual-copy
| no
| no
|-
| $BBUFFER
|colspan="2"| 10 TB per user
|align="right"| 1 MB
| very short
| no
| yes
| yes
|}


<!--T:41-->
<!--T:41-->
Line 32: Line 104:
<li>Backup means a recent snapshot, not an achive of all data that ever was.</li>
<li>Backup means a recent snapshot, not an achive of all data that ever was.</li>
<li><p><code>$BBUFFER</code> stands for [https://docs.scinet.utoronto.ca/index.php/Burst_Buffer Burst Buffer], a faster parallel storage tier for temporary data.</p></li></ul>
<li><p><code>$BBUFFER</code> stands for [https://docs.scinet.utoronto.ca/index.php/Burst_Buffer Burst Buffer], a faster parallel storage tier for temporary data.</p></li></ul>


== Moving data == <!--T:42-->
== Moving data == <!--T:42-->
cc_staff
290

edits

Navigation menu