(major revision following Sep 28 S2S seminar) |
(Marked this version for translation) |
||
Line 5: | Line 5: | ||
Nearline is a tape-based file system intended for *inactive data*. Data sets which you do not expect to access for months are good candidates to be stored in nearline. | Nearline is a tape-based file system intended for *inactive data*. Data sets which you do not expect to access for months are good candidates to be stored in nearline. | ||
= Best practices, and restrictions = | = Best practices, and restrictions = <!--T:33--> | ||
==== Size of files ==== | ==== Size of files ==== <!--T:34--> | ||
<!--T:35--> | |||
Retrieving small files from tape is inefficient, while extremely large files pose other problems. Please observe these guidelines about the size of files to store in nearline: | Retrieving small files from tape is inefficient, while extremely large files pose other problems. Please observe these guidelines about the size of files to store in nearline: | ||
Line 15: | Line 16: | ||
*Files larger than 300GB should be split into chunks of 100GB using the [[A_tutorial_on_'tar'#split|split]] command or a similar tool. | *Files larger than 300GB should be split into chunks of 100GB using the [[A_tutorial_on_'tar'#split|split]] command or a similar tool. | ||
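The splitting guideline can be tried out at small scale before it is applied to real data. The following is a minimal sketch, not a site-specific recipe: the file names and the temporary directory are hypothetical, and the chunk size is scaled down to 1 MiB so the demonstration runs quickly. For real nearline data the chunk size would be about 100 GB (GNU <code>split</code> accepts a size such as <code>-b 100G</code>).

```shell
#!/bin/sh
# Demonstration only: all paths and names here are hypothetical.
set -eu

workdir=$(mktemp -d)
big_file="$workdir/dataset.bin"
chunk_bytes=1048576          # 1 MiB for the demo; about 100 GB for real data

# Create a 3.5 MiB stand-in for a very large file.
head -c 3670016 /dev/urandom > "$big_file"

# Split into chunks named dataset.bin.part.aa, dataset.bin.part.ab, ...
split -b "$chunk_bytes" "$big_file" "$big_file.part."

# Reassemble the chunks; the alphabetical suffixes keep them in order.
cat "$big_file".part.* > "$workdir/restored.bin"

# Verify that the reassembled file matches the original byte for byte.
cmp "$big_file" "$workdir/restored.bin" && echo "chunks reassemble to the original"
```

After a real split, it is the chunks that would be stored on nearline. The original file can be rebuilt later with the same <code>cat</code> step.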
==== Using tar or dar ==== | ==== Using tar or dar ==== <!--T:36--> | ||
<!--T:37--> | |||
Use [[A tutorial on 'tar'|tar]] or [[dar]] to create an archive file directly on nearline. There is no advantage to creating the archive on a different filesystem and then copying it to nearline once complete. | Use [[A tutorial on 'tar'|tar]] or [[dar]] to create an archive file directly on nearline. There is no advantage to creating the archive on a different filesystem and then copying it to nearline once complete. | ||
<!--T:38--> | |||
If you have hundreds of gigabytes of data, the <code>tar</code> options <code>-M (--multi-volume)</code> and <code>-L (--tape-length)</code> can be used to produce archive files of suitable size. | If you have hundreds of gigabytes of data, the <code>tar</code> options <code>-M (--multi-volume)</code> and <code>-L (--tape-length)</code> can be used to produce archive files of suitable size. | ||
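As an illustration of these options, here is a minimal sketch that assumes GNU <code>tar</code>. The directory and volume names are hypothetical, and the volume size is scaled down to 1 MiB so the example runs quickly. By default <code>-L</code> counts units of 1024 bytes, so a volume of about 100 GiB would be <code>-L 104857600</code>. Listing several <code>-f</code> options lets <code>tar</code> move to the next volume without prompting; compression cannot be combined with multi-volume archives.

```shell
#!/bin/sh
# Demonstration only: assumes GNU tar; all paths and names are hypothetical.
set -eu

workdir=$(mktemp -d)
mkdir "$workdir/data" "$workdir/restored"

# Create 2.5 MiB of sample data to archive.
head -c 2621440 /dev/urandom > "$workdir/data/sample.bin"

# Write a multi-volume archive with volumes of at most 1024 x 1024 bytes.
# More volume names are listed than are needed; unused ones are not created.
tar -c -M -L 1024 \
    -f "$workdir/vol1.tar" -f "$workdir/vol2.tar" \
    -f "$workdir/vol3.tar" -f "$workdir/vol4.tar" \
    -C "$workdir" data

# Build the list of volumes that were actually written, in order.
set --
for volume in "$workdir"/vol*.tar; do
    set -- "$@" -f "$volume"
done

# Extract by supplying the volumes in the same order.
tar -x -M "$@" -C "$workdir/restored"

# Verify that the extracted file matches the original.
cmp "$workdir/data/sample.bin" "$workdir/restored/data/sample.bin" \
    && echo "multi-volume archive restored correctly"
```

All volumes are needed, in the original order, to restore the data, so keep them together in one nearline directory.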
<!--T:39--> | |||
If you are using <code>dar</code>, you can similarly use the <code>-s (--slice)</code> option. | If you are using <code>dar</code>, you can similarly use the <code>-s (--slice)</code> option. | ||
==== No access from compute nodes ==== | ==== No access from compute nodes ==== <!--T:40--> | ||
<!--T:41--> | |||
Because data retrieval from nearline may take an uncertain amount of time (see "How it works" below), we do not permit reading from nearline in a job context. Nearline is not mounted on compute nodes. | Because data retrieval from nearline may take an uncertain amount of time (see "How it works" below), we do not permit reading from nearline in a job context. Nearline is not mounted on compute nodes. | ||
==== Use a data-transfer node if available ==== | ==== Use a data-transfer node if available ==== <!--T:42--> | ||
<!--T:32--> | <!--T:32--> | ||
Creating a tar or dar file for a large volume of data can be resource-intensive. Please do this on a data-transfer node (DTN) instead of a login node if login to a DTN is supported at the cluster you are using. | Creating a tar or dar file for a large volume of data can be resource-intensive. Please do this on a data-transfer node (DTN) instead of a login node if login to a DTN is supported at the cluster you are using. | ||
= Why nearline? = | = Why nearline? = <!--T:43--> | ||
<!--T:44--> | |||
Tape as a storage medium has these advantages over disk and solid-state ("SSD") media: | Tape as a storage medium has these advantages over disk and solid-state ("SSD") media: | ||
# Cost per unit of data stored is lower. | # Cost per unit of data stored is lower. | ||
Line 39: | Line 45: | ||
# Energy consumption per unit of data stored is effectively zero. | # Energy consumption per unit of data stored is effectively zero. | ||
<!--T:45--> | |||
Consequently, we can offer much greater volumes of storage on nearline than we can on project. Also, keeping inactive data ''off'' of project reduces its load and improves its performance. | Consequently, we can offer much greater volumes of storage on nearline than we can on project. Also, keeping inactive data ''off'' of project reduces its load and improves its performance. | ||
= How it works = | = How it works = <!--T:46--> | ||
<!--T:22--> | <!--T:22--> | ||
Line 55: | Line 62: | ||
You can determine whether a given file has been moved to tape or is still on disk using the <code>lfs hsm_state</code> command. The "hsm" stands for "hierarchical storage manager". | You can determine whether a given file has been moved to tape or is still on disk using the <code>lfs hsm_state</code> command. The "hsm" stands for "hierarchical storage manager". | ||
<!--T:47--> | |||
<source lang="bash"> | <source lang="bash"> | ||
# Here, <FILE> has not been copied to tape. | # Here, <FILE> has not been copied to tape. |