Using nearline storage: Difference between revisions

No edit summary
No edit summary
Line 3: Line 3:


==Nearline is a filesystem virtualized onto tape== <!--T:1-->
==Nearline is a filesystem virtualized onto tape== <!--T:1-->
Nearline storage is  a disk-tape hybrid systems with a layout like [[Project layout|Project]], except that the system may "virtualize" files by moving them to tape based on criteria like age and size, and then back again upon read or recall operations. This is a way to manage less-used files. On tape they do not consume your disk quota, but they can still be accessed, albeit more slowly.
Nearline storage is  a disk-tape hybrid filesystem with a layout like [[Project layout|Project]], except that it may virtualize files by moving them to tape-based storage on criteria like age and size, and then back again upon read or recall operations. This is a way to manage less used files. On tape, the files do not consume your disk quota, but they can still be accessed, albeit more slowly.


<!--T:2-->
<!--T:2-->
This is useful because the capacity of our tape libraries is both large and expandable.  When a file has been moved to tape (that is, "virtualized"), it will still appear in the directory listing.  If the virtual file is read, the reading process will block for some time, probably a few minutes, while the file contents are read from tape to disk.   
This is useful because the capacity of our tape libraries is both large and expandable.  When a file has been moved to tape (or ''virtualized''), it still appears in the directory listing.  If the virtual file is read, the reading process will block for some time, probably a few minutes, while the file contents are read from tape to disk.   


== Expected use and status== <!--T:3-->
== Expected use == <!--T:3-->
Because of the delay in reading from tape, Nearline is not intended to be used by jobs, where the delay would waste allocated time.  It is only accessible as a directory on certain nodes of the clusters, and in particular, not on the compute nodes  
Because of the delay in reading from tape, Nearline is not intended to be used by jobs where allocated time would be wasted.  It is only accessible as a directory on certain nodes of the clusters, and in particular, not on the compute nodes.


<!--T:9-->
<!--T:9-->
Nearline is intended for use with relatively large files - do not use it for large numbers of small files.  In fact, files smaller than a certain threshold size may not be moved to tape at all.  Files smaller than ~200MB should be combined into archive files ("tarballs") using [[Archiving and compressing files|tar]] or a similar tool.
Nearline is intended for use with relatively large files and should not be used for large numbers of small files.  In fact, files smaller than a certain threshold size may not be moved to tape at all.  Files smaller than ~200MB should be combined into archive files (''tarballs'') using [[Archiving and compressing files|tar]] or a similar tool.


<!--T:5-->
<!--T:5-->
Line 28: Line 28:


<!--T:8-->
<!--T:8-->
If you remove a file in <tt>~/nearline</tt>, the tape copy will be retained for up to 60 days. To restore such a file, contact [[technical support]] with the full path for the file(s) and desired version (by date), just as you would for [[Storage and file management#Filesystem quotas and policies|backup]] restoration. Note that since you will need the full path for the file, it is important for you to retain a copy of the complete directory structure of your Nearline space. For example, you can run the command <tt>ls -R > ~/nearline_contents.txt</tt> from the <tt>~/nearline/PROJECT</tt> directory so that you have a copy of the location of all the files in your Nearline space.
If you remove a file from <tt>~/nearline</tt>, the tape copy will be retained for up to 60 days. To restore such a file, contact [[technical support]] with the full path for the file(s) and desired version (by date), just as you would for [[Storage and file management#Filesystem quotas and policies|backup]] restoration. Note that since you will need the full path for the file, it is important for you to retain a copy of the complete directory structure of your Nearline space. For example, you can run the command <tt>ls -R > ~/nearline_contents.txt</tt> from the <tt>~/nearline/PROJECT</tt> directory so that you have a copy of the location of all the files in your Nearline space.
</tab>
</tab>


<!--T:16-->
<!--T:16-->
<tab name="Cedar">
<tab name="Cedar">
Nearline service is not yet available at Cedar.  Coming soon.
Nearline service will be available soon.
</tab>
</tab>


<!--T:17-->
<!--T:17-->
<tab name="Niagara">
<tab name="Niagara">
There are three ways to access Nearline on Niagara:
There are three methods to access Nearline on Niagara:


<!--T:12-->
<!--T:12-->
1. By submitting hpss-specific commands htar or hsi as an 'archive' job to SLURM; see [https://docs.scinet.utoronto.ca/index.php/HPSS the HPSS documentation] for detailed examples. Using job scripts offer the benefit of automating Nearline transfers, and is the best method if you use HPSS regularly.
1. By submitting HPSS-specific commands <tt>htar</tt> or <tt>hsi</tt> as an 'archive' job to the Slurm scheduler; see [https://docs.scinet.utoronto.ca/index.php/HPSS the HPSS documentation] for detailed examples. Using job scripts offer the benefit of automating Nearline transfers and is the best method if you use HPSS regularly.


<!--T:13-->
<!--T:13-->
2. For small data management of files in HPSS, you can use the VFS ("Virtual File System") node, which is accessed with the command: <tt>salloc --time=1:00:00 -pvfsshort</tt>
2. For small data management of files in HPSS, you can use the VFS (''Virtual File System'') node, which is accessed with the command: <tt>salloc --time=1:00:00 -pvfsshort</tt>.


<!--T:14-->
<!--T:14-->
3. You can also use [[Globus]] for transfers to and from HPSS using the endpoint <b>computecanada#hpss</b>.  This is useful for occasional usage and for transfers from other sites.
3. By using [[Globus]] for transfers to and from HPSS using the endpoint <b>computecanada#hpss</b>.  This is useful for occasional usage and for transfers from other sites.


<!--T:15-->
<!--T:15-->

Revision as of 20:45, 8 February 2019


Nearline is a filesystem virtualized onto tape

Nearline storage is a disk-tape hybrid filesystem with a layout like Project, except that it may virtualize files by moving them to tape-based storage according to criteria such as age and size, and then back again upon read or recall operations. This is a way to manage less-used files. On tape, the files do not consume your disk quota, but they can still be accessed, albeit more slowly.

This is useful because the capacity of our tape libraries is both large and expandable. When a file has been moved to tape (or virtualized), it still appears in the directory listing. If the virtual file is read, the reading process will block for some time, probably a few minutes, while the file contents are read from tape to disk.

Expected use

Because of the delay in reading from tape, Nearline is not intended to be used by jobs, where the wait would waste allocated time. It is only accessible as a directory on certain nodes of the clusters and, in particular, not on the compute nodes.

Nearline is intended for use with relatively large files and should not be used for large numbers of small files. In fact, files smaller than a certain threshold size may not be moved to tape at all. Files smaller than ~200 MB should be combined into archive files (tarballs) using tar or a similar tool.
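As a minimal sketch of combining small files into one tarball, the commands below build a throwaway directory of small files, pack it with tar, and list the result. The directory and file names (results, run_1.txt and so on) are hypothetical and exist only for illustration.

```shell
# Hypothetical layout: a temporary directory holding a few small result files.
workdir=$(mktemp -d)
mkdir -p "$workdir/results"
for i in 1 2 3; do
    echo "sample $i" > "$workdir/results/run_$i.txt"
done

# Bundle the whole directory into a single compressed archive.
# -C changes into $workdir first, so paths inside the archive start at "results/".
tar -czf "$workdir/results.tar.gz" -C "$workdir" results

# List the archive contents to confirm that every file was captured.
tar -tzf "$workdir/results.tar.gz"
```

The resulting single archive, rather than the individual small files, is what you would then place in your Nearline space.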

HPSS is the name of the Nearline service at Niagara. At Graham the service is called "nearline". Work is underway at Cedar and Béluga, which will have implementations that resemble Graham's.

How to use

Nearline is only accessible as a directory on the login nodes and DTNs ("Data Transfer Nodes").

To use Nearline, just put files into your ~/nearline/PROJECT directory. After a period of time (currently 24 hours), they will be copied onto tape. If a file remains unchanged for another period (also 24 hours), the copy on disk will be removed, making the file virtualized on tape.

If you remove a file from ~/nearline, the tape copy will be retained for up to 60 days. To restore such a file, contact technical support with the full path for the file(s) and desired version (by date), just as you would for backup restoration. Note that since you will need the full path for the file, it is important for you to retain a copy of the complete directory structure of your Nearline space. For example, you can run the command ls -R > ~/nearline_contents.txt from the ~/nearline/PROJECT directory so that you have a copy of the location of all the files in your Nearline space.
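Since a restore request needs both the full path and a version date, one possible variation on the listing command is to keep a dated listing each time. This is only a sketch: it uses a temporary directory as a stand-in for ~/nearline/PROJECT, and the file names and the dated output name are assumptions, not a site convention.

```shell
# Stand-in for ~/nearline/PROJECT so the sketch can run anywhere (hypothetical contents).
nearline_dir=$(mktemp -d)
mkdir -p "$nearline_dir/run_a" "$nearline_dir/run_b"
touch "$nearline_dir/run_a/output.tar" "$nearline_dir/run_b/output.tar"

# Keep the listing outside the Nearline space; here a second temporary directory.
listing_dir=$(mktemp -d)
listing="$listing_dir/nearline_contents_$(date +%Y-%m-%d).txt"

# Record the relative path of every file, sorted, in a listing named by today's date.
(cd "$nearline_dir" && find . -type f | sort) > "$listing"

cat "$listing"
```

In real use you would replace the stand-in with your own ~/nearline/PROJECT directory and write the listing somewhere in your home directory.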

Cedar: Nearline service will be available soon.

There are three methods to access Nearline on Niagara:

1. By submitting HPSS-specific commands htar or hsi as an 'archive' job to the Slurm scheduler; see the HPSS documentation for detailed examples. Using job scripts offers the benefit of automating Nearline transfers and is the best method if you use HPSS regularly.

2. For small-scale management of files in HPSS, you can use the VFS (Virtual File System) node, which is accessed with the command: salloc --time=1:00:00 -pvfsshort.

3. By using Globus for transfers to and from HPSS using the endpoint computecanada#hpss. This is useful for occasional usage and for transfers from other sites.

In usage modes 1 and 2, your HPSS files can be found in the $ARCHIVE directory, which is like '$PROJECT' but with '/project' replaced by '/archive'.
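The relationship between the two paths can be illustrated with a short sketch. The project path below is hypothetical; on Niagara the real $PROJECT and $ARCHIVE values are set by the system, so this only shows the substitution described above and is not a way to configure anything.

```shell
# Hypothetical project path, used instead of the real $PROJECT for illustration.
example_project=/project/g/group/user

# Replace the leading '/project' with '/archive', as described above.
example_archive=$(printf '%s\n' "$example_project" | sed 's|^/project|/archive|')

echo "$example_archive"   # prints /archive/g/group/user
```

On the system itself, comparing the output of echo "$PROJECT" and echo "$ARCHIVE" should show the same pattern.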