Using nearline storage

From Alliance Doc


This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.




Nearline is a filesystem virtualized onto tape

Nearline storage works like the Project filesystem, except that files can be "virtualized" by moving them to tape. This is a way to manage less frequently used files: once on tape, they no longer consume your disk quota, but they can still be accessed, albeit more slowly.

This is useful because the capacity of our tape libraries is both large and expandable. When a file has been moved to tape (that is, "virtualized"), it still appears in the directory listing. If a virtualized file is read, the reading process blocks for some period, probably a few minutes, while the file contents are copied from tape back to disk. After that, I/O to the file behaves as it does for any other disk-based file.

Expected use

Because of the delay in reading from tape, Nearline is not intended to be used by jobs, where waiting for tape would waste allocated time. It is accessible only from login nodes and data transfer nodes (DTNs).

How to use

To use Nearline, put files into your ~/nearline/PROJECT directory. After a period of time (currently 24 hours), they will be copied onto tape. If a file then remains unchanged for another period (also 24 hours), the copy on disk will be removed, leaving the file virtualized on tape.
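
The timing described above can be sketched as simple date arithmetic. This is only an illustration: the function name estimate_nearline_timeline is hypothetical, and it assumes that both 24-hour periods are counted from the file's last modification. The actual schedule depends on when the tape system processes the file, so treat the results as earliest possible times, not guarantees.

```python
from datetime import datetime, timedelta

# Periods taken from the text above; both are described as "currently" 24 hours
# and may change.
COPY_DELAY = timedelta(hours=24)     # wait before the file is copied to tape
RELEASE_DELAY = timedelta(hours=24)  # further wait before the disk copy is removed


def estimate_nearline_timeline(last_modified):
    """Return (earliest tape copy, earliest virtualization) for a file.

    Assumption: both periods start from the last modification time.
    """
    copied = last_modified + COPY_DELAY
    virtualized = copied + RELEASE_DELAY
    return copied, virtualized


if __name__ == "__main__":
    copied, virtualized = estimate_nearline_timeline(datetime.now())
    print("Earliest copy to tape:  ", copied)
    print("Earliest virtualization:", virtualized)
```

For an existing file, the last modification time could be obtained with datetime.fromtimestamp(os.stat(path).st_mtime).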

As with most HPC storage, it is bad practice to store large numbers of small files. In fact, files smaller than a certain threshold size may not be moved to tape at all. If you have large collections of small files, it is wise to bundle them into a single archive using a tool like tar.
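
As a minimal sketch of the bundling advice above, the following uses Python's standard tarfile module rather than the tar command itself. The function names find_small_files and bundle_directory are hypothetical, and the size threshold is left as a parameter because the actual threshold is not stated here.

```python
import tarfile
from pathlib import Path


def find_small_files(root, threshold_bytes):
    """Return regular files under root that are smaller than threshold_bytes."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and not p.is_symlink()
        and p.stat().st_size < threshold_bytes
    )


def bundle_directory(src_dir, archive_path):
    """Pack src_dir into one uncompressed tar archive and return its path.

    The archive must be written outside src_dir, otherwise it would be
    added to itself.
    """
    src = Path(src_dir)
    archive = Path(archive_path)
    with tarfile.open(archive, "w") as tar:
        # arcname keeps member paths relative to the directory name.
        tar.add(src, arcname=src.name)
    return archive
```

A typical use would be to call bundle_directory on a directory of small files and write the resulting archive into ~/nearline/PROJECT, so that Nearline sees one large file instead of many small ones.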

If you remove a file from ~/nearline, the tape copy will be retained for up to 60 days. To restore such a file, contact technical support with the full path of the file(s) and the desired version (by date), just as you would for a backup restoration.