CUDA

"CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs)."[1]

It is reasonable to think of CUDA as a set of libraries and associated C, C++, and Fortran compilers that enable you to write code for GPUs. See OpenACC Tutorial for another set of GPU programming tools.

Quick start guide

Compiling

Here we show a simple example of how to use the CUDA C/C++ language compiler, nvcc, and run code created with it. For a longer tutorial in CUDA programming, see CUDA tutorial.

First, load a CUDA module.

$ module purge
$ module load cuda
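
To confirm that the compiler is available, you can check its version:

$ nvcc --version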

The following program will add two numbers together on a GPU. Save the file as add.cu. The .cu file extension is important!


File: add.cu

#include <iostream>

// kernel: runs on the GPU; a, b, and c point to device memory
__global__ void add (int *a, int *b, int *c){
  *c = *a + *b;
}

int main(void){
  int a, b, c;
  int *dev_a, *dev_b, *dev_c;
  int size = sizeof(int);
  
  //  allocate device copies of a,b, c
  cudaMalloc ( (void**) &dev_a, size);
  cudaMalloc ( (void**) &dev_b, size);
  cudaMalloc ( (void**) &dev_c, size);
  
  a=2; b=7;
  //  copy inputs to device
  cudaMemcpy (dev_a, &a, size, cudaMemcpyHostToDevice);
  cudaMemcpy (dev_b, &b, size, cudaMemcpyHostToDevice);
  
  // launch add() kernel on GPU, passing parameters
  add <<< 1, 1 >>> (dev_a, dev_b, dev_c);
  
  // copy device result back to host
  cudaMemcpy (&c, dev_c, size, cudaMemcpyDeviceToHost);
  std::cout<<a<<"+"<<b<<"="<<c<<std::endl;
  
  cudaFree ( dev_a ); cudaFree ( dev_b ); cudaFree ( dev_c );
}


Compile the program with nvcc to create an executable named add.

$ nvcc add.cu -o add

Submitting jobs

To run the program, create a Slurm job script as shown below. Be sure to replace def-someuser with your specific account (see Accounts and projects). For options relating to scheduling jobs with GPUs see Using GPUs with Slurm.

File: gpu_job.sh

#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --gres=gpu:1              # Number of GPUs (per node)
#SBATCH --mem=400M                # memory (per node)
#SBATCH --time=0-00:10            # time (DD-HH:MM)
./add #name of your program


Submit your GPU job to the scheduler with:

$ sbatch gpu_job.sh
Submitted batch job 3127733

For more information about the sbatch command and running and monitoring jobs, see Running jobs.
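
While you wait, you can check the state of your job with squeue:

$ squeue -u $USER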

Once your job has finished, you should see an output file similar to this:

$ cat slurm-3127733.out
2+7=9

If you run this without a GPU present, you might see output like 2+7=0.
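
In the example above, failed CUDA calls are never reported, which is why a wrong result can appear silently. One way to make such failures visible is to test the status code that each CUDA runtime call returns; here is a minimal sketch (the check helper is illustrative, not part of the original example):

#include <iostream>

// illustrative helper: print a readable message if a CUDA call failed
void check (cudaError_t err, const char *what){
  if (err != cudaSuccess)
    std::cout << what << " failed: " << cudaGetErrorString(err) << std::endl;
}

int main(void){
  int *dev_a;
  // without a working GPU, cudaMalloc returns an error instead of crashing
  check(cudaMalloc((void**) &dev_a, sizeof(int)), "cudaMalloc");
  cudaFree(dev_a);
}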

Linking libraries

If you have a program that needs to link against libraries included with CUDA, for example cuBLAS, compile with the following flags:

nvcc -lcublas -Xlinker=-rpath,$CUDA_PATH/lib64
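
As a quick sanity check that the linking works, you might compile a minimal program that only initializes and releases a cuBLAS handle (a sketch; the file name check_cublas.cu is just for illustration):

#include <cublas_v2.h>
#include <iostream>

int main(void){
  cublasHandle_t handle;

  // cublasCreate initializes the library and allocates GPU-side resources
  if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS){
    std::cout << "cuBLAS initialization failed" << std::endl;
    return 1;
  }
  std::cout << "cuBLAS initialized" << std::endl;
  cublasDestroy(handle);
}

Compile it with the same flags:

$ nvcc check_cublas.cu -lcublas -Xlinker=-rpath,$CUDA_PATH/lib64 -o check_cublas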

To learn more about how the above program works and how to make use of GPU parallelism, see CUDA tutorial.

Troubleshooting

Compute capability

NVIDIA has created this technical term, which it describes as follows:

The compute capability of a device is represented by a version number, also sometimes called its "SM version". This version number identifies the features supported by the GPU hardware and is used by applications at runtime to determine which hardware features and/or instructions are available on the present GPU. (CUDA Toolkit Documentation, section 2.6)
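
A program can query the compute capability of the GPU it is running on through the CUDA runtime API; a minimal sketch:

#include <cstdio>
#include <cuda_runtime.h>

int main(void){
  cudaDeviceProp prop;

  // fill prop with the properties of device 0
  cudaGetDeviceProperties(&prop, 0);

  // prop.major and prop.minor together form the compute capability, e.g. 8.0
  printf("Compute capability: %d.%d\n", prop.major, prop.minor);
}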

The following errors are connected with compute capability:

nvcc fatal : Unsupported gpu architecture 'compute_XX'
no kernel image is available for execution on the device (209)

If you encounter either of these errors, you may be able to fix it by adding the correct flag to the nvcc call:

-gencode arch=compute_XX,code=[sm_XX,compute_XX]

If you are using cmake, provide the following flag:

cmake .. -DCMAKE_CUDA_ARCHITECTURES=XX

where “XX” is the compute capability of the NVIDIA GPU that you expect to run the application on. To find the value to replace “XX”, see the Available GPUs table on the Using GPUs with Slurm page.

For example, if you will run your code on a Narval A100 node, its compute capability is 80. The correct flag to use when compiling with nvcc is

-gencode arch=compute_80,code=[sm_80,compute_80]
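
Combining this with the earlier example, the full compile line would be

$ nvcc add.cu -o add -gencode arch=compute_80,code=[sm_80,compute_80]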

The flag to supply to cmake is:

cmake .. -DCMAKE_CUDA_ARCHITECTURES=80
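
For completeness, here is a minimal CMakeLists.txt that this flag would pair with (a sketch, assuming CMake 3.18 or newer; the project name is arbitrary):

cmake_minimum_required(VERSION 3.18)
project(add_example LANGUAGES CUDA)

# -DCMAKE_CUDA_ARCHITECTURES=80 on the command line selects the target
# compute capability for all CUDA sources in the project
add_executable(add add.cu)
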
References

  1. NVIDIA CUDA Home Page, https://developer.nvidia.com/cuda-toolkit. CUDA is a registered trademark of NVIDIA.