Handling large collections of files

==Local disk== <!--T:6-->
Note that one option is to use the local disk attached to each compute node, which offers roughly 190 GB of space (the actual amount may be larger, and it varies from one cluster to another and even within a given cluster). The local disk is shared by all jobs running on that node without being allocated by the scheduler. In general, its performance is considerably better than that of the project or scratch filesystems. You can access this local disk inside a job using the environment variable <tt>$SLURM_TMPDIR</tt>. One approach, therefore, is to keep your dataset archived as a single <tt>tar</tt> file in the project space, copy it to the local disk at the beginning of your job, extract it, and use the dataset during the job. If any changes were made, you could archive the contents into a <tt>tar</tt> file again at the end of the job and copy it back to the project space.
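In terms of shell commands, this copy, extract and repack pattern looks roughly like the sketch below; the archive name and project path are placeholders that you would replace with your own. A complete job script based on this idea follows.

<source lang="bash">
# Copy the archived dataset to the node-local disk (placeholder path and name)
cp ~/projects/def-someuser/dataset.tar $SLURM_TMPDIR/
cd $SLURM_TMPDIR
tar -xf dataset.tar

# ... run your program on the extracted files here ...

# Repack the (possibly modified) files and copy the archive back
tar -cf dataset.tar dataset/
cp dataset.tar ~/projects/def-someuser/
</source>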
{{File
|name=job_script.sh