Handling large collections of files



<!--T:1-->
In certain domains, notably [[AI and Machine Learning]], it is common to manage very large collections of files, meaning hundreds of thousands of files or more. The individual files may be fairly small, e.g. less than a few hundred kilobytes each. In these cases, a problem arises because [[Storage_and_file_management#Filesystem_quotas_and_policies|filesystem quotas]] on our clusters limit the number of filesystem objects. Very large numbers of files, particularly small ones, also create significant problems for the performance of these shared filesystems and for the automated backup of the home and project spaces.
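To see how close a directory tree is to such a limit, it can help to count the objects it contains. The following is a minimal sketch using only the Python standard library; the function name <code>count_filesystem_objects</code> and the script name in the usage comment are hypothetical, and the script counts only files and directories, which may not match exactly how a particular quota is computed.

```python
import os
import sys
from typing import Tuple


def count_filesystem_objects(root: str) -> Tuple[int, int]:
    """Return (number of files, number of directories) under root.

    The root directory itself is included in the directory count.
    Symbolic links are not followed, and directories that cannot be
    listed are skipped silently (os.walk's default behaviour).
    """
    n_files = 0
    n_dirs = 1  # count the root directory itself
    for _dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        n_files += len(filenames)
    return n_files, n_dirs


if __name__ == "__main__":
    # Hypothetical usage: python count_objects.py [directory]  (defaults to ".")
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    files, dirs = count_filesystem_objects(target)
    print(f"{target}: {files} files, {dirs} directories, {files + dirs} objects in total")
```

Walking a tree with hundreds of thousands of entries itself puts load on a shared filesystem, so it is best run occasionally rather than in a loop.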
<p>
So how can a user or group of users store these necessary datasets on the cluster? This page presents several solutions, each with its own pros and cons, so that you can judge for yourself which one is appropriate for your situation.


<!--T:23-->
The SQLite executable is called <code>sqlite3</code>.  It is available via the <code>nixpkgs</code> [[Utiliser_des_modules/en|module]], which is loaded by default on our systems.
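As an illustration of storing a dataset in SQLite, the following minimal sketch packs every file under a directory into a single SQLite database, so that the filesystem sees one object instead of many. It assumes a Python installation that includes the standard-library <code>sqlite3</code> module; the table name <code>files</code>, its schema, and the helper names <code>pack_directory</code> and <code>read_file</code> are hypothetical choices for this example, not a prescribed layout.

```python
import os
import sqlite3


def pack_directory(src_dir: str, db_path: str) -> int:
    """Store every file under src_dir as a BLOB row in one SQLite database.

    The schema (one hypothetical table named "files") is illustrative only.
    db_path should be outside src_dir so the database does not read itself.
    Returns the number of files stored.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, data BLOB NOT NULL)"
        )
        count = 0
        with conn:  # one transaction for all inserts, committed on success
            for dirpath, _dirnames, filenames in os.walk(src_dir):
                for name in filenames:
                    full = os.path.join(dirpath, name)
                    rel = os.path.relpath(full, src_dir)  # key rows by relative path
                    with open(full, "rb") as fh:
                        conn.execute(
                            "INSERT OR REPLACE INTO files (path, data) VALUES (?, ?)",
                            (rel, fh.read()),
                        )
                    count += 1
        return count
    finally:
        conn.close()


def read_file(db_path: str, rel_path: str) -> bytes:
    """Return the stored contents of one file, looked up by its relative path."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT data FROM files WHERE path = ?", (rel_path,)
        ).fetchone()
        if row is None:
            raise KeyError(rel_path)
        return row[0]
    finally:
        conn.close()
```

All inserts run inside a single transaction, which is much faster than committing once per file. Each file is read fully into memory before insertion, which suits the small files described above. The resulting database can also be inspected with the <code>sqlite3</code> executable, e.g. <code>sqlite3 dataset.sqlite "SELECT count(*) FROM files;"</code>.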


===Parallel compression=== <!--T:17-->