Using nearline storage: Difference between revisions

Line 3: Line 3:


<!--T:30-->
<!--T:30-->
Nearline is a tape-based file system intended for *inactive data*.  Datasets which you do not expect to access for months are good candidates to be stored in nearline.  
Nearline is a tape-based filesystem intended for *inactive data*.  Datasets which you do not expect to access for months are good candidates to be stored in /nearline.  


= Best practices, and restrictions = <!--T:33-->
= Restrictions and best practices = <!--T:33-->


==== Size of files ==== <!--T:34-->
==== Size of files ==== <!--T:34-->


<!--T:35-->
<!--T:35-->
Retrieving small files from tape is inefficient, while extremely large files pose other problems.  Please observe these guidelines about the size of files to store in nearline:
Retrieving small files from tape is inefficient, while extremely large files pose other problems.  Please observe these guidelines about the size of files to store in /nearline:


<!--T:9-->
<!--T:9-->
Line 19: Line 19:


<!--T:37-->
<!--T:37-->
Use [[A tutorial on 'tar'|tar]] or [[dar]] to create an archive file directly on nearline.  There is no advantage to creating the archive on a different filesystem and then copying it to nearline once complete.
Use [[A tutorial on 'tar'|tar]] or [[dar]] to create an archive file directly on /nearline.  There is no advantage to creating the archive on a different filesystem and then copying it to /nearline once complete.
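
As an illustration of writing the archive straight to its destination, here is a minimal sketch. The function name <tt>archive_dir</tt> and the example paths are hypothetical, not part of the original page; only standard <tt>tar</tt> options (<tt>-c</tt>, <tt>-f</tt>, <tt>-C</tt>, <tt>-t</tt>) are used.

```shell
# Sketch only: archive_dir is a hypothetical helper, not a site-provided tool.
# $1 = directory to archive, $2 = archive file to create (e.g. under ~/nearline/PROJECT)
archive_dir() {
    # Write the archive directly to its final location; -C keeps the
    # stored paths relative to the parent of the source directory.
    tar -cf "$2" -C "$(dirname "$1")" "$(basename "$1")" || return 1
    # Read the archive back once to confirm that it is a readable tar file.
    tar -tf "$2" > /dev/null
}
```

A call might look like <tt>archive_dir ~/project/results_2020 ~/nearline/PROJECT/results_2020.tar</tt>, where both paths are placeholders for your own.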


<!--T:38-->
<!--T:38-->
Line 30: Line 30:


<!--T:41-->
<!--T:41-->
Because data retrieval from nearline may take an uncertain amount of time (see "How it works" below), we do not permit reading from nearline in a job context.  Nearline is not mounted on compute nodes.
Because data retrieval from /nearline may take an uncertain amount of time (see "How it works" below), we do not permit reading from /nearline in a job context.  /nearline is not mounted on compute nodes.


==== Use a data-transfer node if available ==== <!--T:42-->
==== Use a data-transfer node if available ==== <!--T:42-->
Line 37: Line 37:
Creating a tar or dar file for a large volume of data can be resource-intensive.  Please do this on a data-transfer node (DTN) instead of a login node if login to a DTN is supported at the cluster you are using.
Creating a tar or dar file for a large volume of data can be resource-intensive.  Please do this on a data-transfer node (DTN) instead of a login node if login to a DTN is supported at the cluster you are using.


= Why nearline? = <!--T:43-->
= Why /nearline? = <!--T:43-->


<!--T:44-->
<!--T:44-->
Line 46: Line 46:


<!--T:45-->
<!--T:45-->
Consequently we can offer much greater volumes of storage on nearline than we can on project.  Also, keeping inactive data ''off'' of project reduces the load and improves its performance.
Consequently we can offer much greater volumes of storage on /nearline than we can on /project.  Also, keeping inactive data ''off'' of /project reduces the load and improves its performance.


= How it works = <!--T:46-->
= How it works = <!--T:46-->


<!--T:22-->
<!--T:22-->
# When a file is first copied to (or created on) nearline, the file exists only on disk, not tape.
# When a file is first copied to (or created on) /nearline, the file exists only on disk, not tape.
# After a period (on the order of a day), and if the file meets certain criteria, the system will copy the file to tape. At this stage, the file will be on both disk and tape.
# After a period (on the order of a day), and if the file meets certain criteria, the system will copy the file to tape. At this stage, the file will be on both disk and tape.
# After a further period the disk copy may be deleted, and the file will only be on tape.
# After a further period the disk copy may be deleted, and the file will only be on tape.
Line 60: Line 60:


<!--T:24-->
<!--T:24-->
You can determine whether or not a given file has been moved to tape or is still on disk using the `lfs hsm_state` command.  The "hsm" stands for "hierarchical storage manager".
You can determine whether a given file has been moved to tape or is still on disk using the <tt>lfs hsm_state</tt> command, where "hsm" stands for "hierarchical storage manager".
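
The sketch below shows one way to turn a line of <tt>lfs hsm_state</tt> output into the three stages described under "How it works". It assumes the usual Lustre output shape, in which the flags <tt>archived</tt> and <tt>released</tt> follow the file name; the function name <tt>classify_hsm_state</tt> is hypothetical, and the exact output on a given cluster may differ.

```shell
# Sketch only: classify_hsm_state is a hypothetical helper.
# $1 = one line of output from: lfs hsm_state <file>
classify_hsm_state() {
    # Drop the leading "<path>: (" so that a file name containing the
    # word "released" or "archived" cannot be mistaken for a flag.
    flags="${1##*: (}"
    case "$flags" in
        *released*) echo "tape only" ;;      # disk copy has been removed
        *archived*) echo "disk and tape" ;;  # copied to tape, still on disk
        *)          echo "disk only" ;;      # not yet copied to tape
    esac
}
```

On a node where /nearline is mounted, it could be used as <tt>classify_hsm_state "$(lfs hsm_state ~/nearline/PROJECT/myfile.tar)"</tt>, with the file name standing in for one of your own.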


<!--T:47-->
<!--T:47-->
Line 83: Line 83:


<!--T:29-->
<!--T:29-->
Note that as of October 2020, the output of the command <code>diskusage_report</code>, also known as <code>quota</code>, does not report on nearline space consumption.
Note that as of October 2020, the output of the command <code>diskusage_report</code>, also known as <code>quota</code>, does not report on /nearline space consumption.


== Site-specific information == <!--T:6-->
== Site-specific information == <!--T:6-->
Line 90: Line 90:
<tabs>
<tabs>
<tab name="Graham">
<tab name="Graham">
Nearline is only accessible as a directory on login nodes and on DTNs (''Data Transfer Nodes'').
/nearline is only accessible as a directory on login nodes and on DTNs (''Data Transfer Nodes'').


<!--T:11-->
<!--T:11-->
To use nearline, just put files into your <tt>~/nearline/PROJECT</tt> directory. After a period of time (24 hours as of February 2019), they will be copied onto tape. If the file remains unchanged for another period (24 hours as of February 2019), the copy on disk will be removed, making the file virtualized on tape.  
To use /nearline, just put files into your <tt>~/nearline/PROJECT</tt> directory. After a period of time (24 hours as of February 2019), they will be copied onto tape. If the file remains unchanged for another period (24 hours as of February 2019), the copy on disk will be removed, making the file virtualized on tape.  


<!--T:8-->
<!--T:8-->
If you accidentally (or deliberately) delete a file from <tt>~/nearline</tt>, the tape copy will be retained for up to 60 days. To restore such a file contact [[technical support]] with the full path for the file(s) and desired version (by date), just as you would for restoring a [[Storage and file management#Filesystem quotas and policies|backup]]. Note that since you will need the full path for the file, it is important for you to retain a copy of the complete directory structure of your nearline space. For example, you can run the command <tt>ls -R > ~/nearline_contents.txt</tt> from the <tt>~/nearline/PROJECT</tt> directory so that you have a copy of the location of all the files.
If you accidentally (or deliberately) delete a file from <tt>~/nearline</tt>, the tape copy will be retained for up to 60 days. To restore such a file contact [[technical support]] with the full path for the file(s) and desired version (by date), just as you would for restoring a [[Storage and file management#Filesystem quotas and policies|backup]]. Note that since you will need the full path for the file, it is important for you to retain a copy of the complete directory structure of your /nearline space. For example, you can run the command <tt>ls -R > ~/nearline_contents.txt</tt> from the <tt>~/nearline/PROJECT</tt> directory so that you have a copy of the location of all the files.
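
Since a restore request needs full paths, a variant of the <tt>ls -R</tt> suggestion is to record one absolute path per file. The following is a sketch under that assumption; the function name <tt>snapshot_listing</tt> and the output file name are hypothetical.

```shell
# Sketch only: snapshot_listing is a hypothetical helper.
# $1 = directory to record (e.g. ~/nearline/PROJECT), $2 = output text file
snapshot_listing() {
    # Run in a subshell so the caller's working directory is unchanged;
    # after cd, $PWD holds the absolute path of the directory.
    ( cd "$1" && find "$PWD" -type f | sort ) > "$2"
}
```

For example, <tt>snapshot_listing ~/nearline/PROJECT ~/nearline_contents.txt</tt> writes the listing to your home directory, outside the directory being recorded.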
</tab>
</tab>


<!--T:16-->
<!--T:16-->
<tab name="Cedar">
<tab name="Cedar">
Nearline service similar to that on Graham.
The /nearline service is similar to that on Graham.
</tab>
</tab>


<!--T:17-->
<!--T:17-->
<tab name="Niagara">
<tab name="Niagara">
HPSS is the nearline service on Niagara.<br/>
HPSS is the /nearline service on Niagara.<br/>
There are three methods to access the service:
There are three methods to access the service:


<!--T:12-->
<!--T:12-->
1. By submitting HPSS-specific commands <tt>htar</tt> or <tt>hsi</tt> to the Slurm scheduler as a job in one of the archive partitions; see [https://docs.scinet.utoronto.ca/index.php/HPSS the HPSS documentation] for detailed examples. Using job scripts offers the benefit of automating nearline transfers and is the best method if you use HPSS regularly. Your HPSS files can be found in the $ARCHIVE directory, which is like $PROJECT but with ''/project'' replaced by ''/archive''.  
1. By submitting HPSS-specific commands <tt>htar</tt> or <tt>hsi</tt> to the Slurm scheduler as a job in one of the archive partitions; see [https://docs.scinet.utoronto.ca/index.php/HPSS the HPSS documentation] for detailed examples. Using job scripts offers the benefit of automating /nearline transfers and is the best method if you use HPSS regularly. Your HPSS files can be found in the $ARCHIVE directory, which is like $PROJECT but with ''/project'' replaced by ''/archive''.  


<!--T:13-->
<!--T:13-->
Line 123: Line 123:
<!--T:20-->
<!--T:20-->
<tab name="Béluga">
<tab name="Béluga">
Nearline service similar to that on Graham.
The /nearline service is similar to that on Graham.
</tab>
</tab>
</tabs>
</tabs>


</translate>
</translate>