m (wrap some column headers) |
(Add a section named CUDA Compute Capability) |
Line 190: | Line 190: | ||
With this method, users can run multiple tasks in one submission. The <code>-j4</code> parameter means GNU Parallel can run a maximum of four concurrent tasks, launching another as soon as each one ends. <code>CUDA_VISIBLE_DEVICES</code> is used to ensure that two tasks do not try to use the same GPU at the same time.
== CUDA Compute Capability ==
When compiling CUDA code on the clusters, it is important to know the Compute Capability of the GPU you are targeting. If you get the following error at compile time:
<pre>
nvcc fatal : Unsupported gpu architecture 'compute_XX'
</pre>
or this error when running your CUDA code on a GPU compute node:
<pre>
no kernel image is available for execution on the device (209)
</pre>
you can fix it by adding the correct flag to the <code>nvcc</code> call:
<pre>
-gencode arch=compute_XX,code=[sm_XX,compute_XX]
</pre>
or, if you are using CMake to build your project, by providing the following option:
<pre>
cmake .. -DCMAKE_CUDA_ARCHITECTURES=XX
</pre>
where <code>XX</code> is the Compute Capability of the NVIDIA GPU that you are going to use. To find the correct value for <code>XX</code>, look in the Compute Capability column of the table above.
For example, if you are running your code on a Narval A100 node, its Compute Capability is 80, so the correct flag to pass to the compiler is
<pre>
-gencode arch=compute_80,code=[sm_80,compute_80]
</pre>
or, to configure CMake, the following command:
<pre>
cmake .. -DCMAKE_CUDA_ARCHITECTURES=80
</pre>
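If you build for several GPU types, it may be convenient to generate these flags instead of typing them by hand. The following is a minimal sketch of that idea; the function names <code>cc_normalize</code>, <code>nvcc_gencode_flag</code> and <code>cmake_arch_flag</code> are hypothetical helpers written for this illustration and are not part of CUDA or CMake. It assumes the Compute Capability is given either as <code>80</code> or as <code>8.0</code>.

```shell
#!/bin/bash
# Hypothetical helpers (not part of CUDA or CMake): build architecture
# flags from a Compute Capability value written as "80" or "8.0".

# Normalize "8.0" to "80"; a value already written as "80" is unchanged.
cc_normalize() {
    printf '%s\n' "$1" | tr -d '.'
}

# Print the nvcc -gencode flag for a single Compute Capability.
nvcc_gencode_flag() {
    local cc
    cc="$(cc_normalize "$1")"
    printf -- '-gencode arch=compute_%s,code=[sm_%s,compute_%s]\n' "$cc" "$cc" "$cc"
}

# Print the matching CMake option for a single Compute Capability.
cmake_arch_flag() {
    printf -- '-DCMAKE_CUDA_ARCHITECTURES=%s\n' "$(cc_normalize "$1")"
}

# Example for an A100 (Compute Capability 80):
nvcc_gencode_flag 80
cmake_arch_flag 8.0
```

For the A100 example, the two calls print the same <code>-gencode</code> flag and CMake option shown above. When a single binary must run on more than one GPU type, <code>nvcc</code> accepts the <code>-gencode</code> flag several times, once per architecture, so the helper can be called once for each value taken from the table.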
[[Category:SLURM]]
</translate>