Handling large collections of files



In certain domains, notably [[AI and Machine Learning]], it is common to have to manage very large collections of files, meaning hundreds of thousands of files or more. The individual files may be fairly small, e.g. less than a few hundred kilobytes each. In these cases, a problem arises because of the [[Storage_and_file_management#Filesystem_quotas_and_policies|filesystem quotas]] on Compute Canada clusters, which limit the number of filesystem objects. How, then, can a user or group of users store such data sets on the cluster? This page presents a variety of solutions, each with its own pros and cons, so that you can judge for yourself which one is appropriate for your case.
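Because the quota limits the number of filesystem objects rather than only the number of bytes, it can help to know how many objects a data set contains before copying it to the cluster. The following Python sketch is one way to estimate this. The function name `count_filesystem_objects` is hypothetical, and the sketch assumes that both files and directories count toward the quota; the cluster's own quota reporting remains the authoritative figure.

```python
import os
import sys


def count_filesystem_objects(root):
    """Return (files, directories) found under root, root itself included.

    Assumption: both files and directories count toward the object quota.
    Symbolic links are counted as entries but are not followed.
    """
    files = 0
    dirs = 1  # count the root directory itself
    # os.walk does not follow symlinks to directories by default
    for _dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return files, dirs


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    n_files, n_dirs = count_filesystem_objects(target)
    print(f"{n_files} files, {n_dirs} directories, {n_files + n_dirs} objects in total")
```

Run it as, for example, `python count_objects.py /path/to/dataset` (the script name is arbitrary) to get a rough idea of whether a data set is likely to approach the object limit.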