Using nearline storage: Difference between revisions

Revision as of 15:51, 30 September 2020

180 bytes added, 4 years ago

Marked this version for translation

Diane27

@@ Line 5: / Line 5: @@
 Nearline is a tape-based file system intended for ''inactive data''.  Data sets that you do not expect to access for months are good candidates to be stored in nearline.
-= Best practices, and restrictions =
+= Best practices, and restrictions = <!--T:33-->
-==== Size of files ====
+==== Size of files ==== <!--T:34-->
+<!--T:35-->
 Retrieving small files from tape is inefficient, while extremely large files pose other problems.  Please observe these guidelines about the size of files to store in nearline:
@@ Line 15: / Line 16: @@
 *Files larger than 300 GB should be split into chunks of 100 GB using the [[A_tutorial_on_'tar'#split|split]] command or a similar tool.
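The chunking guideline above could be applied along these lines. This is a minimal sketch, assuming GNU coreutils <code>split</code>; the function names <code>chunk_file</code> and <code>join_chunks</code> and the <code>.part</code> suffix are hypothetical, chosen only for illustration.

```bash
# Hypothetical helpers, assuming GNU coreutils split.

# chunk_file FILE [CHUNK_SIZE]
# Split FILE into pieces named FILE.part000, FILE.part001, ...
chunk_file() {
    local file="$1"
    local size="${2:-100GB}"   # GNU split reads "GB" as units of 1000^3 bytes
    # -b: bytes per chunk; -d: numeric suffixes; -a 3: three-digit suffixes
    split -b "$size" -d -a 3 "$file" "$file.part"
}

# join_chunks PREFIX OUTPUT
# Concatenate PREFIX.part000, PREFIX.part001, ... back into OUTPUT.
join_chunks() {
    local prefix="$1"
    local out="$2"
    # The glob expands in sorted order, which matches the numeric suffixes.
    cat "$prefix".part[0-9][0-9][0-9] > "$out"
}
```

For example, <code>chunk_file results.tar</code> would write 100 GB pieces next to <code>results.tar</code>, and <code>join_chunks results.tar restored.tar</code> would reassemble them after retrieval.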
-==== Using tar or dar ====
+==== Using tar or dar ==== <!--T:36-->
+<!--T:37-->
 Use [[A tutorial on 'tar'|tar]] or [[dar]] to create an archive file directly on nearline.  There is no advantage to creating the archive on a different filesystem and then copying it to nearline once complete.
+<!--T:38-->
 If you have hundreds of gigabytes of data, the <code>tar</code> options <code>-M (--multi-volume)</code> and <code>-L (--tape-length)</code> can be used to produce archive files of suitable size.
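A multi-volume archive could be written roughly as follows. This is only a sketch, assuming GNU <code>tar</code>; the function names and the <code>PREFIX.N.tar</code> naming scheme are hypothetical. Without a suffix, <code>-L</code> counts in units of 1024 bytes; recent GNU <code>tar</code> versions also accept a size suffix such as <code>100G</code>. Multi-volume archives cannot be compressed by <code>tar</code> itself.

```bash
# Hypothetical helpers, assuming GNU tar.

# archive_multivolume SRC_DIR PREFIX LENGTH COUNT
# Write SRC_DIR into up to COUNT volumes named PREFIX.1.tar, PREFIX.2.tar, ...
# each at most LENGTH long (as understood by tar -L).
archive_multivolume() {
    local src="$1" prefix="$2" length="$3" count="$4"
    local -a volumes=()
    local i
    for ((i = 1; i <= count; i++)); do
        volumes+=(-f "$prefix.$i.tar")   # tar uses repeated -f options in order
    done
    # stdin is /dev/null so tar fails instead of waiting at a prompt
    # if COUNT volumes turn out to be too few.
    tar -c -M -L "$length" "${volumes[@]}" \
        -C "$(dirname "$src")" "$(basename "$src")" </dev/null
}

# extract_multivolume PREFIX DEST_DIR
# Read back every existing PREFIX.N.tar volume, in order, into DEST_DIR.
extract_multivolume() {
    local prefix="$1" dest="$2"
    local -a volumes=()
    local i=1
    while [ -f "$prefix.$i.tar" ]; do
        volumes+=(-f "$prefix.$i.tar")
        i=$((i + 1))
    done
    tar -x -M "${volumes[@]}" -C "$dest" </dev/null
}
```

For instance, <code>archive_multivolume ~/scratch/run42 ~/nearline/run42 100G 20</code> would allow up to twenty 100 GiB volumes; both paths are placeholders.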
+<!--T:39-->
 If you are using <code>dar</code>, you can similarly use the <code>-s (--slice)</code> option.
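The equivalent with <code>dar</code> might look like the sketch below. It assumes <code>dar</code> is installed and on the <code>PATH</code>; the function names are hypothetical. <code>dar</code> names the slices <code>BASENAME.1.dar</code>, <code>BASENAME.2.dar</code>, and so on.

```bash
# Hypothetical helpers, assuming dar is available.

# archive_sliced SRC_DIR BASENAME [SLICE_SIZE]
# Back up the contents of SRC_DIR into slices BASENAME.1.dar, BASENAME.2.dar, ...
archive_sliced() {
    local src="$1" base="$2" slice="${3:-100G}"
    # -c: create an archive with this basename
    # -R: root directory whose contents are archived
    # -s: maximum size of each slice
    dar -c "$base" -R "$src" -s "$slice"
}

# restore_sliced BASENAME DEST_DIR
# Restore all slices of BASENAME into the existing directory DEST_DIR.
restore_sliced() {
    local base="$1" dest="$2"
    dar -x "$base" -R "$dest"
}
```

All slices need to be present in the same directory when restoring; otherwise <code>dar</code> will ask for the missing slice.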
-==== No access from compute nodes ====
+==== No access from compute nodes ==== <!--T:40-->
+<!--T:41-->
 Because data retrieval from nearline may take an uncertain amount of time (see "How it works" below), we do not permit reading from nearline in a job context.  Nearline is not mounted on compute nodes.
-==== Use a data-transfer node if available ====
+==== Use a data-transfer node if available ==== <!--T:42-->
 <!--T:32-->
 Creating a tar or dar file for a large volume of data can be resource-intensive.  Please do this on a data-transfer node (DTN) instead of a login node if login to a DTN is supported at the cluster you are using.
-= Why nearline? =
+= Why nearline? = <!--T:43-->
+<!--T:44-->
 Tape as a storage medium has these advantages over disk and solid-state ("SSD") media:
 # Cost per unit of data stored is lower.
@@ Line 39: / Line 45: @@
 # Energy consumption per unit of data stored is effectively zero.
+<!--T:45-->
 Consequently, we can offer much greater volumes of storage on nearline than we can on project.  Also, keeping inactive data ''off'' of project reduces the load on project and improves its performance.
-= How it works =
+= How it works = <!--T:46-->
 <!--T:22-->
@@ Line 55: / Line 62: @@
 You can determine whether a given file has been moved to tape or is still on disk using the <code>lfs hsm_state</code> command.  The "hsm" stands for "hierarchical storage manager".
+<!--T:47-->
 <source lang="bash">
 #  Here, <FILE> has not been copied to tape.