With this method, users can run multiple tasks in one submission. The <code>-j4</code> parameter lets GNU Parallel run at most four tasks concurrently, launching a new one as soon as a running task finishes. Setting <code>CUDA_VISIBLE_DEVICES</code> ensures that no two tasks try to use the same GPU at the same time.

== CUDA Compute Capability == <!--T:41-->
When compiling CUDA code on the clusters, it is important to know the Compute Capability of the GPU you are targeting. If you get the following error at compile time:

<!--T:42-->
<pre>
nvcc fatal : Unsupported gpu architecture 'compute_XX'
</pre>

<!--T:43-->
or the following error when running your CUDA code on a GPU compute node:

<!--T:44-->
<pre>
no kernel image is available for execution on the device (209)
</pre>

<!--T:45-->
you can fix it by adding the correct flag to the <code>nvcc</code> call:

<!--T:46-->
<pre>
-gencode arch=compute_XX,code=[sm_XX,compute_XX]
</pre>

<!--T:47-->
or, if you are using CMake to build your project, by passing the following option:

<!--T:48-->
<pre>
cmake .. -DCMAKE_CUDA_ARCHITECTURES=XX
</pre>

<!--T:49-->
where <code>XX</code> is the Compute Capability of the NVIDIA GPU you are going to use, written without the decimal point (for example, 8.0 becomes 80). You can find the correct value in the Compute Capability column of the table above.
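The value from the table can also be turned into the flag with a small helper. The following bash sketch uses hypothetical function names (<code>cc_to_xx</code>, <code>gencode_flag</code>, <code>detect_cc</code>): it converts a Compute Capability written with a decimal point, as NVIDIA tools often report it (e.g. <code>8.0</code>), into the <code>XX</code> form and prints the matching <code>nvcc</code> flag. The <code>compute_cap</code> query field of <code>nvidia-smi</code> is assumed to be available, which is only the case with recent drivers, so verify <code>detect_cc</code> on your cluster before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: turn a Compute Capability such as "8.0" into the
# two-digit form ("80") used by -gencode and CMAKE_CUDA_ARCHITECTURES.
# A value that is already "80" is returned unchanged.
cc_to_xx() {
    printf '%s\n' "${1//./}"
}

# Hypothetical helper: print the nvcc -gencode flag for a Compute Capability.
gencode_flag() {
    local xx
    xx="$(cc_to_xx "$1")"
    printf '%s\n' "-gencode arch=compute_${xx},code=[sm_${xx},compute_${xx}]"
}

# Assumption: the installed driver supports the compute_cap query field.
# Run this on a GPU compute node; login nodes usually have no GPU.
# Only the first GPU is reported, since all GPUs on a node are normally identical.
detect_cc() {
    nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1
}
```

Calling <code>gencode_flag 8.0</code> prints <code>-gencode arch=compute_80,code=[sm_80,compute_80]</code>; on a GPU compute node, <code>gencode_flag "$(detect_cc)"</code> uses the value reported by the driver instead of the table.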

<!--T:50-->
For example, if you are running your code on a Narval A100 node, the table shows that its Compute Capability is 80, so the correct flag to pass to the compiler is

<!--T:51-->
<pre>
-gencode arch=compute_80,code=[sm_80,compute_80]
</pre>

<!--T:52-->
or the following command to configure CMake:

<!--T:53-->
<pre>
cmake .. -DCMAKE_CUDA_ARCHITECTURES=80
</pre>
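If the same executable has to run on nodes with different GPU models, <code>nvcc</code> accepts several <code>-gencode</code> options, and the resulting binary contains code for each listed architecture; CMake accepts a semicolon-separated list in <code>CMAKE_CUDA_ARCHITECTURES</code>. The following bash sketch builds both forms from a single list. The values 70 (V100) and 80 (A100), the variable names and the source file <code>my_app.cu</code> are assumptions for illustration; replace them with the values from the table for the GPUs you actually use.

```shell
#!/bin/bash
# Example Compute Capabilities to target (assumed values: 70 for a V100,
# 80 for an A100). Replace them with values from the table.
ARCHS=(70 80)

# nvcc: one -gencode option per architecture. Building the flags in an
# array and quoting each value keeps the shell from treating
# "[sm_XX,compute_XX]" as a filename pattern.
NVCC_ARCH_FLAGS=()
for xx in "${ARCHS[@]}"; do
    NVCC_ARCH_FLAGS+=(-gencode "arch=compute_${xx},code=[sm_${xx},compute_${xx}]")
done

# CMake: CMAKE_CUDA_ARCHITECTURES takes a semicolon-separated list.
# The join runs in a subshell, so IFS is not changed in this shell.
CMAKE_ARCHS="$(IFS=';'; printf '%s' "${ARCHS[*]}")"

# Show the resulting arguments.
printf '%s\n' "${NVCC_ARCH_FLAGS[@]}"
printf 'CMAKE_CUDA_ARCHITECTURES=%s\n' "${CMAKE_ARCHS}"

# On a node where the CUDA toolkit is loaded (my_app.cu is hypothetical):
#   nvcc "${NVCC_ARCH_FLAGS[@]}" -o my_app my_app.cu
# Quote the CMake value: an unquoted ';' would end the shell command.
#   cmake .. -DCMAKE_CUDA_ARCHITECTURES="${CMAKE_ARCHS}"
```

In CMake, a plain entry such as <code>80</code> generates both real (<code>sm_80</code>) and virtual (<code>compute_80</code>) code, which matches the <code>-gencode</code> flag used with <code>nvcc</code>.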

<!--T:54-->
[[Category:SLURM]]
</translate>