<translate>
[[Category:Software]]
=Quick start guide= <!--T:32-->

To begin working with CUDA, load a CUDA module.

<source lang="console">
$ module purge
$ module load cuda
</source>
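
To confirm that the module has made the CUDA compiler available, you can check the version of <code>nvcc</code> (the exact version reported depends on the module loaded):

<source lang="console">
$ nvcc --version
</source>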

As a first step we will add two numbers together on a GPU. Save the file below as <code>add.cu</code>. '''The <code>cu</code> file extension is important!'''

{{File
|name=add.cu
|lang="c++"
|contents=
#include <iostream>

<!--T:33-->
__global__ void add (int *a, int *b, int *c){
    *c = *a + *b;
}

<!--T:34-->
int main(void){
    int a, b, c;
    int *dev_a, *dev_b, *dev_c;
    int size = sizeof(int);

    // allocate device copies of a, b, c
    cudaMalloc ( (void**) &dev_a, size);
    cudaMalloc ( (void**) &dev_b, size);
    cudaMalloc ( (void**) &dev_c, size);

    a=2; b=7;
    // copy inputs to device
    cudaMemcpy (dev_a, &a, size, cudaMemcpyHostToDevice);
    cudaMemcpy (dev_b, &b, size, cudaMemcpyHostToDevice);

    // launch add() kernel on GPU, passing parameters
    add <<< 1, 1 >>> (dev_a, dev_b, dev_c);

    // copy device result back to host
    cudaMemcpy (&c, dev_c, size, cudaMemcpyDeviceToHost);
    std::cout<<a<<"+"<<b<<"="<<c<<std::endl;

    cudaFree ( dev_a ); cudaFree ( dev_b ); cudaFree ( dev_c );
}
}}
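
In the launch <code>add <<< 1, 1 >>></code>, the first number sets how many thread blocks run the kernel and the second sets how many threads each block contains, so the example above uses a single GPU thread. As a brief preview of the parallelism covered later (a sketch, not part of the example above; it assumes the pointers refer to arrays rather than single integers), a kernel launched with many threads typically computes a per-thread index:

<source lang="c++">
// Sketch: with a launch such as add<<<blocks, threads>>>(a, b, c),
// each thread derives the array element it owns from its block and
// thread IDs. Assumes a, b and c point to arrays, not single ints.
__global__ void add (int *a, int *b, int *c){
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    c[i] = a[i] + b[i];
}
</source>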

To build the program, use the command below, which creates an executable named <code>add</code>.

<source lang="console">
$ nvcc add.cu -o add
</source>
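
If the default target architecture does not match the GPU you will run on, <code>nvcc</code> accepts an <code>-arch</code> flag (the value <code>sm_70</code> below is only an example, corresponding to V100 GPUs; check your cluster's documentation for the right value):

<source lang="console">
$ nvcc -arch=sm_70 add.cu -o add
</source>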

To run the program, first create a Slurm job script called <code>gpu_job.sh</code>. Be sure to replace <code>def-someuser</code> with your specific account (see [[Running_jobs#Accounts_and_projects|accounts and projects]]). For various ways to schedule jobs with GPUs, see [[Using GPUs with Slurm|using GPUs with Slurm]].

{{File
|name=gpu_job.sh
|lang="sh"
|contents=
#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --gres=gpu:1   # Number of GPUs (per node)
#SBATCH --mem=400M     # Memory (per node)
#SBATCH --time=0-00:10 # Time (DD-HH:MM)
./add                  # Name of your program
}}
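
Alternatively (a sketch of one of the scheduling options linked above, reusing the same placeholder account), short test runs can be done interactively by requesting a GPU allocation with <code>salloc</code> and launching the program once the allocation starts:

<source lang="console">
$ salloc --account=def-someuser --gres=gpu:1 --mem=400M --time=0:10:0
$ ./add
</source>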

<!--T:35-->
Submit your GPU job to the scheduler with this command.

<source lang="console">
$ sbatch gpu_job.sh
Submitted batch job 3127733
</source>

For information about the <code>sbatch</code> command and about running and monitoring jobs, see the [[Running jobs|running jobs]] page.
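
For instance (a generic Slurm command, not specific to this example), you can check whether the job is still pending or running with <code>squeue</code>:

<source lang="console">
$ squeue -u $USER
</source>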

<!--T:36-->
Once your job has finished, you should see an output file similar to this.

<source lang="console">
$ cat slurm-3127733.out
2+7=9
</source>

If you run this without a GPU present, you might see output like <code>2+7=0</code>. To learn more about how the above program works and how to make use of a GPU's parallelism, keep reading.
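
The wrong answer appears silently because the program never checks the return codes of its CUDA calls: without a device they fail, <code>c</code> is never written, and whatever happened to be in that host variable is printed. A minimal sketch of such a check (reusing the variables from <code>add.cu</code>; not part of the original example) looks like this:

<source lang="c++">
// Sketch: every CUDA runtime call returns a cudaError_t worth checking.
cudaError_t err = cudaMemcpy (&c, dev_c, size, cudaMemcpyDeviceToHost);
if (err != cudaSuccess){
    std::cerr << "CUDA error: " << cudaGetErrorString(err) << std::endl;
    return 1;
}
</source>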

=Introduction= <!--T:1-->
This tutorial introduces the graphics processing unit (GPU) as a massively parallel computing device; the [[CUDA]] parallel programming language; and some of the CUDA numerical libraries for high performance computing.

{{Prerequisites
|title=Prerequisites