Using nearline storage: Difference between revisions

m
heading levels
mNo edit summary
m (heading levels)
 
Line 5: Line 5:
Nearline is a tape-based filesystem intended for '''inactive data'''.  Datasets which you do not expect to access for months are good candidates to be stored in /nearline.  


= Restrictions and best practices= <!--T:33-->
== Restrictions and best practices == <!--T:33-->


Note that there is no need to compress the data that you will be copying to nearline; the tape archive system automatically performs the compression using specialized circuitry.
   
   
==== Size of files ==== <!--T:34-->
=== Size of files === <!--T:34-->


<!--T:35-->
Line 19: Line 19:
*<b>DO NOT SEND SMALL FILES TO NEARLINE, except for indexes (see <i>Creating an index</i> below).</b>


==== Using tar or dar ==== <!--T:36-->
=== Using tar or dar === <!--T:36-->


<!--T:37-->
Line 31: Line 31:
If you are using <code>dar</code>, you can similarly use the <code>-s (--slice)</code> option.
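As a rough sketch of the <code>-s</code> option mentioned above, the following creates a sliced <code>dar</code> archive from throwaway test data. The temporary directories, the <code>mycollection</code> basename and the tiny 1M slice size are assumptions chosen only so the example runs quickly; for real nearline use you would pick a much larger slice size that matches the file-size guidance for your cluster.

```shell
# Illustrative only: scratch directories and names are hypothetical.
srcdir="$(mktemp -d)"     # stands in for the data you want to archive
outdir="$(mktemp -d)"     # stands in for the destination directory

# Create about 3 MB of sample data so that several 1M slices are produced.
head -c 3145728 /dev/urandom > "$srcdir/data.bin"

if command -v dar >/dev/null 2>&1; then
    # -c: archive basename, -s: maximum size of each slice, -R: root of the data to save
    dar -c "$outdir/mycollection" -s 1M -R "$srcdir"
    # Slices are written as mycollection.1.dar, mycollection.2.dar, ...
    ls -lh "$outdir"
else
    echo "dar is not installed on this machine" >&2
fi
```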


===== Creating an index ===== <!--T:48-->
==== Creating an index ==== <!--T:48-->
When you bundle files, it becomes inconvenient to find individual files. To avoid having to restore an entire large collection from tape when you only need one or a few of the files from this collection, you should make an index of all archive files you create. Create the index as soon as you create the collection. For instance, you can save the output of tar with the <tt>verbose</tt> option when you create the archive, like this:
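A minimal sketch of the indexing approach described above, assuming GNU <code>tar</code>. The scratch directory and the names <code>something</code>, <code>mycollection.tar</code> and <code>mycollection.index</code> are hypothetical placeholders; on a real cluster the archive and index would go to your own /nearline directory.

```shell
# Illustrative only: all paths and file names here are hypothetical.
workdir="$(mktemp -d)"
mkdir -p "$workdir/something"
echo "sample A" > "$workdir/something/a.txt"
echo "sample B" > "$workdir/something/b.txt"

# Create the archive; with -f pointing at a file, the verbose listing
# goes to standard output, so it can be redirected into an index file.
tar -cvvf "$workdir/mycollection.tar" -C "$workdir" something \
    > "$workdir/mycollection.index"

# Later: search the small index instead of recalling the archive from tape.
grep "a.txt" "$workdir/mycollection.index"

# Once the right archive is known, extract only the file that is needed.
mkdir "$workdir/restore"
tar -xf "$workdir/mycollection.tar" -C "$workdir/restore" something/a.txt
```

Keeping the index next to the archive means the lookup touches only a small file, and the large archive is recalled only when you actually need something from it.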


Line 46: Line 46:
Index files are an exception to the rule about small files on nearline: it's okay to store them in /nearline.


==== No access from compute nodes ==== <!--T:40-->
=== No access from compute nodes === <!--T:40-->


<!--T:41-->
Because data retrieval from /nearline may take an uncertain amount of time (see ''How it works'' below), we do not permit reading from /nearline in a job context.  /nearline is not mounted on compute nodes.


==== Use a data-transfer node if available ==== <!--T:42-->
=== Use a data-transfer node if available === <!--T:42-->


<!--T:32-->
Creating a tar or dar file for a large volume of data can be resource-intensive. Please do this on a data-transfer node (DTN) instead of on a login node whenever possible.


= Why /nearline? = <!--T:43-->
== Why /nearline? == <!--T:43-->


<!--T:44-->
Line 67: Line 67:
Consequently, we can offer much greater volumes of storage on /nearline than we can on /project.  Also, keeping inactive data ''off'' /project reduces its load and improves its performance.


= How it works = <!--T:46-->
== How it works == <!--T:46-->


<!--T:22-->
Line 78: Line 78:
When a file has been moved entirely to tape (that is, when it is ''virtualized'') it will still appear in the directory listing.  If the virtual file is read, it will take some time for the tape to be retrieved from the library and copied back to disk. The process which is trying to read the file will block while this is happening.  This may take from less than a minute to over an hour, depending on the size of the file and the demand on the tape system.


== Transferring data from Nearline == <!--T:53-->
=== Transferring data from Nearline === <!--T:53-->


<!--T:54-->
Line 122: Line 122:
You can explicitly force a file to be recalled from tape without actually reading it with the command <code>lfs hsm_restore <FILE></code>.
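The recall command above can be wrapped in a small helper, sketched here under the assumption that the Lustre <code>lfs</code> client is available on the node where you run it. The function name <code>recall_from_tape</code> and the example path are hypothetical; <code>lfs hsm_state</code> is used only to display the file's current state before the recall is requested.

```shell
# Hypothetical helper: show a file's HSM state, then request its recall from tape.
recall_from_tape() {
    file="$1"
    if ! command -v lfs >/dev/null 2>&1; then
        echo "lfs not found: run this on a node where /nearline is mounted" >&2
        return 1
    fi
    # Report the current state of the file as recorded by the filesystem.
    lfs hsm_state "$file"
    # Ask for the file to be copied back from tape without reading it.
    lfs hsm_restore "$file"
}

# Example call (placeholder path):
# recall_from_tape /nearline/def-sponsor/user/mycollection.tar
```

Requesting the recall ahead of time lets the tape retrieval proceed before you start the copy that needs the data, rather than having that copy block on the first read.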


== Cluster-specific information == <!--T:6-->
=== Cluster-specific information === <!--T:6-->


<!--T:10-->