Translations:Handling large collections of files/1/en: Difference between revisions

Revision as of 16:12, 22 July 2019

11 bytes removed, 5 years ago

Importing a new version from external source

FuzzyBot

Revision as of 17:51, 18 July 2019 FuzzyBot (Importing a new version from external source)		Revision as of 16:12, 22 July 2019 FuzzyBot (Importing a new version from external source)
Line 1:		Line 1:
	In certain domains, notably [[AI and Machine Learning]], it is common to have to manage very large collections of files, meaning hundreds of thousands or more. The individual files may be fairly small, e.g. less than a few hundred kilobytes. In these cases, a problem arises due to [[Storage_and_file_management#Filesystem_quotas_and_policies|filesystem quotas]] on Compute Canada clusters that limit the number of filesystem objects. So how can a user or group of users store these necessary ~~data sets~~ on the cluster? In this page we will present a variety of different solutions, each with its own pros and cons, so you may judge for yourself which is an appropriate ~~one~~ for you.		In certain domains, notably [[AI and Machine Learning]], it is common to have to manage very large collections of files, meaning hundreds of thousands or more. The individual files may be fairly small, e.g. less than a few hundred kilobytes. In these cases, a problem arises due to [[Storage_and_file_management#Filesystem_quotas_and_policies|filesystem quotas]] on Compute Canada clusters that limit the number of filesystem objects. So how can a user or group of users store these necessary datasets on the cluster? In this page we will present a variety of different solutions, each with its own pros and cons, so you may judge for yourself which is appropriate for you.