Using nearline storage: Difference between revisions

no edit summary
m (fix bold (from markdown))
No edit summary
Line 10: Line 10:


<!--T:35-->
<!--T:35-->
Retrieving small files from tape is inefficient, while extremely large files pose other problems. Please observe these guidelines about the size of files to store in /nearline:
Retrieving small files from tape is inefficient, while extremely large files pose other problems. Please observe these guidelines when storing files in /nearline:


<!--T:9-->
<!--T:9-->
*Files smaller than ~10GB should be combined into archive files (''tarballs'') using [[A tutorial on 'tar'|tar]] or a [[Archiving and compressing files|similar tool]].
*Files smaller than ~10GB should be combined into archive files (''tarballs'') using [[A tutorial on 'tar'|tar]] or a [[Archiving and compressing files|similar tool]].
*Files larger than 4TB should be split into chunks of 1TB using the [[A_tutorial_on_'tar'#split|split]] command or a similar tool.
*Files larger than 4TB should be split into chunks of 1TB using the [[A_tutorial_on_'tar'#split|split]] command or a similar tool.
*'''DO NOT SEND SMALL FILES TO NEARLINE'''
*'''DO NOT SEND SMALL FILES TO NEARLINE, except for indexes (see ''Creating an index'' below).'''
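
The size guidelines above can be illustrated with a minimal sketch. Everything here is a placeholder chosen for illustration: the temporary directory stands in for a project or scratch space, <tt>nearline_demo</tt> stands in for a directory under /nearline, and the chunk size is tiny so the example runs quickly. For real data you would use a chunk size of about 1TB (for instance <tt>-b 1T</tt>, assuming GNU <tt>split</tt>).

```shell
#!/bin/sh
# Sketch only: all paths and sizes below are hypothetical placeholders.
set -eu

WORK=$(mktemp -d)              # stand-in for your project or scratch directory
DEST="$WORK/nearline_demo"     # stand-in for a directory under /nearline
mkdir -p "$WORK/results" "$DEST"

# Create a few small files to act as the data set.
for i in 1 2 3; do
    echo "data $i" > "$WORK/results/file$i.txt"
done

# Combine the small files into one archive, written directly to the destination.
tar -cf "$DEST/results.tar" -C "$WORK" results

# Create a stand-in for an oversized file (4 KiB here, multiple TB in practice).
dd if=/dev/zero of="$WORK/big.dat" bs=1024 count=4 2>/dev/null

# Split it into fixed-size chunks; the suffixes aa, ab, ... preserve the order.
split -b 1k "$WORK/big.dat" "$DEST/big.dat.part_"

# Reassemble the chunks later with cat and confirm the result is identical.
cat "$DEST"/big.dat.part_* > "$WORK/big.restored"
cmp "$WORK/big.dat" "$WORK/big.restored"
```

Because the shell expands <tt>big.dat.part_*</tt> in sorted order, concatenating the chunks with <tt>cat</tt> restores the original file.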


==== Using tar or dar ==== <!--T:36-->
==== Using tar or dar ==== <!--T:36-->
Line 21: Line 21:
<!--T:37-->
<!--T:37-->
Use [[A tutorial on 'tar'|tar]] or [[dar]] to create an archive file.
Use [[A tutorial on 'tar'|tar]] or [[dar]] to create an archive file.
* Keep the source files on their original filesystem. Do NOT copy the source files to nearline before creating the archive!
Keep the source files in their original filesystem. Do NOT copy the source files to /nearline before creating the archive.
* Write the archive file directly to nearline.  While writing it to a different filesystem and then moving it to nearline is not as bad as moving the source files to nearline and then writing the archive, it is unnecessary.


<!--T:38-->
<!--T:38-->
Line 31: Line 30:


<!--T:48-->
<!--T:48-->
When you bundle files, such as with tar, it becomes inconvenient to find individual files. To avoid having to restore an entire large collection from tape when you only need one or a few of the files in it, you should save an index of all archive files you create. Construct an index as soon as you create the collection. For instance, you can save the output of tar with the "verbose" option when you create the archive, like this:
===== Creating an index =====
When you bundle files, it becomes inconvenient to find individual files. To avoid having to restore an entire large collection from tape when you only need one or a few of the files from this collection, you should make an index of all archive files you create. Create the index as soon as you create the collection. For instance, you can save the output of tar with the <tt>verbose</tt> option when you create the archive, like this:


  <!--T:49-->
  <!--T:49-->
Line 43: Line 43:


<!--T:52-->
<!--T:52-->
Index files are an exception to the rule about small files on nearline: It's okay to store your nearline index files on nearline.
Index files are an exception to the rule about small files on nearline: it's okay to store them in /nearline.
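
As a rough illustration of why an index is useful, the following sketch looks up one file in an index and extracts only that member from the archive. The directory layout and file names are hypothetical, and the index is produced here with <tt>tar -tf</tt> after the archive is created (an assumption made to keep the sketch portable across tar implementations) rather than by capturing the verbose output at creation time.

```shell
#!/bin/sh
# Sketch only: paths and file names are hypothetical placeholders.
set -eu

WORK=$(mktemp -d)                  # stand-in for your working directory
ARCHIVE="$WORK/results.tar"        # would live under /nearline in practice
INDEX="$WORK/results.tar.index"    # small index file kept next to the archive
mkdir -p "$WORK/results" "$WORK/restore"

echo "alpha" > "$WORK/results/a.txt"
echo "beta"  > "$WORK/results/b.txt"

# Create the archive, then record its table of contents as the index.
tar -cf "$ARCHIVE" -C "$WORK" results
tar -tf "$ARCHIVE" > "$INDEX"

# Later: search the index (not the archive) for the file you need.
MEMBER=$(grep 'b\.txt$' "$INDEX")

# Extract only that member instead of unpacking the whole collection.
tar -xf "$ARCHIVE" -C "$WORK/restore" "$MEMBER"
cat "$WORK/restore/$MEMBER"
```

Searching the index tells you which archive holds the file and its exact member name before anything has to be read back from tape.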


==== No access from compute nodes ==== <!--T:40-->
==== No access from compute nodes ==== <!--T:40-->


<!--T:41-->
<!--T:41-->
Because data retrieval from /nearline may take an uncertain amount of time (see "How it works" below), we do not permit reading from /nearline in a job context.  /nearline is not mounted on compute nodes.
Because data retrieval from /nearline may take an uncertain amount of time (see ''How it works'' below), we do not permit reading from /nearline in a job context.  /nearline is not mounted on compute nodes.


==== Use a data-transfer node if available ==== <!--T:42-->
==== Use a data-transfer node if available ==== <!--T:42-->


<!--T:32-->
<!--T:32-->
Creating a tar or dar file for a large volume of data can be resource-intensive. Please do this on a data-transfer node (DTN) instead of a login node if login to a DTN is supported at the cluster you are using.
Creating a tar or dar file for a large volume of data can be resource-intensive. Please do this on a data-transfer node (DTN) instead of on a login node whenever possible.


= Why /nearline? = <!--T:43-->
= Why /nearline? = <!--T:43-->