<languages />
[[Category:Software]]
=Quick start guide=
To begin working with CUDA, load a CUDA module.
<source lang="console">
$ module purge
$ module load cuda
</source>
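You can confirm that the CUDA compiler is now available by asking it for its version (the version reported will depend on the module you loaded):
<source lang="console">
$ nvcc --version
</source>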
As a first step we will add two numbers together on a GPU. Save the file below as <code>add.cu</code>. '''The <code>cu</code> file extension is important!'''
{{File
|name=add.cu
|lang="c++"
|contents=
#include <iostream>

__global__ void add(int *a, int *b, int *c){
    *c = *a + *b;
}

int main(void){
    int a, b, c;
    int *dev_a, *dev_b, *dev_c;
    int size = sizeof(int);

    // allocate device copies of a, b, c
    cudaMalloc((void**)&dev_a, size);
    cudaMalloc((void**)&dev_b, size);
    cudaMalloc((void**)&dev_c, size);

    a = 2; b = 7;

    // copy inputs to device
    cudaMemcpy(dev_a, &a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, &b, size, cudaMemcpyHostToDevice);

    // launch add() kernel on GPU, passing parameters
    add<<<1, 1>>>(dev_a, dev_b, dev_c);

    // copy device result back to host
    cudaMemcpy(&c, dev_c, size, cudaMemcpyDeviceToHost);
    std::cout << a << "+" << b << "=" << c << std::endl;

    // free device memory
    cudaFree(dev_a); cudaFree(dev_b); cudaFree(dev_c);
}
}}
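Note that the program above ignores the status codes returned by the CUDA runtime, which is one reason a missing GPU fails silently. A minimal sketch of checking those codes is shown below; the macro name <code>CUDA_CHECK</code> is our own helper for illustration, not part of the CUDA toolkit.
<source lang="c++">
#include <cstdio>
#include <cstdlib>

// Wrap each CUDA runtime call; print a message and abort if it fails.
// CUDA_CHECK is a local helper macro, not part of the CUDA API.
#define CUDA_CHECK(call)                                            \
  do {                                                              \
    cudaError_t err = (call);                                       \
    if (err != cudaSuccess) {                                       \
      fprintf(stderr, "CUDA error at %s:%d: %s\n",                  \
              __FILE__, __LINE__, cudaGetErrorString(err));         \
      exit(EXIT_FAILURE);                                           \
    }                                                               \
  } while (0)

// Usage, replacing the bare calls in add.cu:
// CUDA_CHECK(cudaMalloc((void**)&dev_a, size));
// CUDA_CHECK(cudaMemcpy(dev_a, &a, size, cudaMemcpyHostToDevice));
</source>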
To build the program, use the command below, which will create an executable named <code>add</code>.
<source lang="console">
$ nvcc add.cu -o add
</source>
To run the program, first create a Slurm job script named <code>gpu_job.sh</code>. Be sure to replace <code>def-someuser</code> with your specific account (see [[Running_jobs#Accounts_and_projects|accounts and projects]]). For various ways to schedule jobs with GPUs see [[Using GPUs with Slurm|using GPUs with Slurm]].
{{File
|name=gpu_job.sh
|lang="sh"
|contents=
#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --gres=gpu:1    # Number of GPUs (per node)
#SBATCH --mem=400M      # memory (per node)
#SBATCH --time=0-00:10  # time (DD-HH:MM)
./add                   # name of your program
}}
Submit your GPU job to the scheduler with this command.
<source lang="console">
$ sbatch gpu_job.sh
Submitted batch job 3127733
</source>
For information about the <code>sbatch</code> command and running and monitoring jobs see the [[Running jobs|running jobs]] page.
Once your job has finished, you should see an output file similar to this.
<source lang="console">
$ cat slurm-3127733.out
2+7=9
</source>
If you run this without a GPU present, you might see output like <code>2+7=0</code>, because the kernel never executes and the result is never copied back to the host. To learn more about how the program above works, and how to make use of a GPU's parallelism, keep reading.
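As a preview of that parallelism, the same addition can be carried out over a whole array, with one GPU thread per element. The kernel below is a minimal sketch; the kernel name, block size of 256, and launch configuration are arbitrary choices for illustration.
<source lang="c++">
__global__ void add_vectors(const int *a, const int *b, int *c, int n){
    // Each thread computes its own global index and handles one element.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// Launched with enough 256-thread blocks to cover all n elements:
// add_vectors<<<(n + 255) / 256, 256>>>(dev_a, dev_b, dev_c, n);
</source>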
=Introduction=
This tutorial introduces the graphics processing unit (GPU) as a massively parallel computing device; the CUDA parallel programming language; and some of the CUDA numerical libraries for high performance computing.