Handling large collections of files

From Alliance Doc
Revision as of 15:12, 2 July 2019 by Stubbsda (talk | contribs)

In certain domains it is common to have to manage very large collections of files, meaning hundreds of thousands or more, where each file is often (though not always) fairly small, e.g. less than a few hundred kilobytes. Storing such data on Compute Canada clusters is a problem because filesystem quotas limit the number of distinct filesystem objects: 500K for the project space (by default) and, in most instances, 1M for the scratch space. So how can a user or group of users store these necessary data sets on the cluster? This page presents a variety of solutions and workarounds, each with its own pros and cons, so that you can judge which approach is best for you.
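
Before choosing an approach, it can help to know how many filesystem objects a data set actually contains. The following is a minimal Python sketch, not an official tool: the function name `count_filesystem_objects` and the constant `PROJECT_QUOTA` are hypothetical, and the count (files, directories and symbolic links found by walking the tree) only approximates what the quota system reports.

```python
import os
import sys


def count_filesystem_objects(root):
    """Count files, directories and symlinks below root (root itself excluded)."""
    total = 0
    # os.walk does not descend into symlinked directories by default, so each
    # symlink is counted once, as an entry of the directory that contains it.
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    PROJECT_QUOTA = 500_000  # default project-space limit quoted above
    n = count_filesystem_objects(root)
    print(f"{root}: {n} objects ({n / PROJECT_QUOTA:.1%} of a 500K quota)")
```

Run against a directory, the script prints the object count and the share of the default 500K project quota that it would consume.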

DAR

HDF5

SQLite
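
As a minimal sketch of the idea, assuming the goal is simply to replace many small files with a single database file that counts as one filesystem object, the Python standard-library `sqlite3` module can store each file as one row. The table name `files`, its two columns, and the helper names `pack_directory` and `read_file` are all hypothetical choices made for this illustration.

```python
import sqlite3
from pathlib import Path


def pack_directory(src_dir, db_path):
    """Store every regular file under src_dir as one row of a SQLite database.

    db_path should live outside src_dir, otherwise the database file itself
    would be picked up while the tree is being read.
    """
    src = Path(src_dir)
    con = sqlite3.connect(db_path)
    try:
        with con:  # commit all inserts together, roll back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS files ("
                "path TEXT PRIMARY KEY, data BLOB NOT NULL)"
            )
            rows = (
                (p.relative_to(src).as_posix(), p.read_bytes())
                for p in sorted(src.rglob("*"))
                if p.is_file()
            )
            con.executemany("INSERT OR REPLACE INTO files VALUES (?, ?)", rows)
    finally:
        con.close()


def read_file(db_path, rel_path):
    """Return the stored bytes for rel_path, or raise KeyError if absent."""
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT data FROM files WHERE path = ?", (rel_path,)
        ).fetchone()
    finally:
        con.close()
    if row is None:
        raise KeyError(rel_path)
    return row[0]
```

For example, `pack_directory("dataset", "dataset.db")` followed by `read_file("dataset.db", "sub/sample.txt")` would return the contents of that file as bytes, where both paths are placeholders. Each file is read fully into memory, so this sketch suits the small files described above rather than very large ones.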

SquashFS