<!--T:13-->
If you need to run four single-GPU programs or two 2-GPU programs for longer than 24 hours, [[GNU Parallel]] is recommended. A simple example is:
<pre>
cat params.input | parallel -j4 'CUDA_VISIBLE_DEVICES=$(({%} - 1)) python {} &> {#}.out'
</pre>
In this example, the GPU ID is calculated by subtracting 1 from the slot ID {%}, and {#} is the sequence number of the job, starting from 1.
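For two 2-GPU programs, one possible variant (a sketch; adjust to your own setup) is to assign a pair of GPU IDs to each slot:
<pre>
cat params.input | parallel -j2 'CUDA_VISIBLE_DEVICES=$((2*({%}-1))),$((2*{%}-1)) python {} &> {#}.out'
</pre>
Here slot 1 gets GPUs 0 and 1, and slot 2 gets GPUs 2 and 3.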


<!--T:14-->
Each line of <code>params.input</code> is substituted for <code>{}</code>, so the file should list, one per line, the Python scripts (and any arguments) to run:
<pre>
...
</pre>
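As an illustration, a hypothetical <code>params.input</code> for four single-GPU runs might look like the following; the script name and arguments are placeholders:
<pre>
train.py --seed 1
train.py --seed 2
train.py --seed 3
train.py --seed 4
</pre>
With this file, the command above runs each line on its own GPU and writes the output to <code>1.out</code> through <code>4.out</code>.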
With this method, you can run multiple tasks in one submission. The <code>-j4</code> parameter means that GNU Parallel can run a maximum of four concurrent tasks, launching another as soon as one ends. <code>CUDA_VISIBLE_DEVICES</code> is used to ensure that two tasks do not try to use the same GPU at the same time.
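To run everything in a single Slurm job, a submission script along the following lines can wrap the command; the account name, resource requests, time limit, and module name are placeholders to adapt to your cluster:
<pre>
#!/bin/bash
#SBATCH --account=def-someuser   # placeholder account name
#SBATCH --gres=gpu:4             # four GPUs, one per concurrent task
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=2-00:00           # e.g. two days, since the tasks run longer than 24 hours

module load gnu-parallel         # the module name may differ on your system
cat params.input | parallel -j4 'CUDA_VISIBLE_DEVICES=$(({%} - 1)) python {} &> {#}.out'
</pre>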


== Profiling GPU tasks == <!--T:65-->