* [[Béluga/en | Béluga]] offers roughly 370GB of local disk for the CPU nodes, while the GPU nodes have a 1.6TB NVMe disk (to help with AI image datasets and their millions of small files).
* [[Niagara]] does not have local storage on the compute nodes.
* For other clusters, you can assume the available local disk to be at least 190GB.
You can access this local disk inside a job using the environment variable <tt>$SLURM_TMPDIR</tt>. One approach is therefore to keep your dataset archived as a single <tt>tar</tt> file in the project space, copy it to the local disk at the beginning of your job, extract it, and use the dataset during the job. If any changes were made, at the end of the job you can archive the contents into a <tt>tar</tt> file again and copy it back to the project space. Here is an example of a submission script that allocates an entire node:
{{File
|name=job_script.sh
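|lang="bash"
|contents=
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32   # illustrative request; adjust to the core count of the node you target
#SBATCH --mem=0                # requests all of the memory available on the node
#SBATCH --time=1-00:00         # illustrative walltime

# NOTE: this script body is a generic sketch of the approach described above, not a
# verbatim example; the project path, archive name and compute step are placeholders.

# Move to the node-local disk and unpack the archived dataset there
cd $SLURM_TMPDIR
mkdir work
cd work
tar -xf ~/projects/def-someuser/my_dataset.tar

# ... run your computation here, reading and writing under $SLURM_TMPDIR/work ...

# If the dataset was modified, re-archive it and copy it back to the project space
cd $SLURM_TMPDIR
tar -cf ~/projects/def-someuser/my_dataset_updated.tar work/
}}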