* Transfer data back to the host memory


=CUDA execution model= <!--T:2-->
Simple CUDA code executed on the GPU is called a ''kernel''. There are several questions we may ask at this point:
* How do you run a kernel on a bunch of streaming multiprocessors (SMs)?
* How do you make such a kernel run in a massively parallel fashion?
Here is the execution recipe that will answer the above questions:
* each GPU core (streaming processor) executes a sequential '''thread''', where a '''thread''' is the smallest set of instructions handled by the scheduler
* all GPU cores execute the kernel in a SIMT fashion (Single Instruction, Multiple Threads); see the sketch after this list
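
As an illustration, here is a minimal kernel sketch for vector addition (the kernel name <code>add_vectors</code> and its arguments are our assumptions, chosen for illustration): every thread executes the same instructions, but each computes its own global index and handles one array element.

<syntaxhighlight lang="cpp">
__global__ void add_vectors(const float *a, const float *b, float *c, int n)
{
    // Each thread derives a unique global index from its block and thread IDs
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)          // guard threads whose index falls past the end of the array
        c[i] = a[i] + b[i];
}
</syntaxhighlight>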
Usually the following procedure is recommended when executing code on the GPU (a minimal host-side sketch follows the list):
# Copy input data from CPU memory to GPU memory
# Load the GPU program (kernel) and execute it
# Copy results from GPU memory back to CPU memory
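
Here is a minimal host-side sketch of these three steps, assuming the <code>add_vectors</code> kernel from above and an array size <code>N</code> chosen for illustration:

<syntaxhighlight lang="cpp">
#include <stdlib.h>
#include <cuda_runtime.h>

// Kernel defined earlier (see the execution model sketch above)
__global__ void add_vectors(const float *a, const float *b, float *c, int n);

int main(void)
{
    const int N = 1 << 20;              // number of elements (illustrative)
    size_t size = N * sizeof(float);

    // Host allocations (initialization of h_a and h_b omitted for brevity)
    float *h_a = (float*)malloc(size);
    float *h_b = (float*)malloc(size);
    float *h_c = (float*)malloc(size);

    // Device allocations
    float *d_a, *d_b, *d_c;
    cudaMalloc((void**)&d_a, size);
    cudaMalloc((void**)&d_b, size);
    cudaMalloc((void**)&d_c, size);

    // 1. Copy input data from CPU memory to GPU memory
    cudaMemcpy(d_a, h_a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, size, cudaMemcpyHostToDevice);

    // 2. Launch the kernel: enough 256-thread blocks to cover all N elements
    int threads = 256;
    int blocks  = (N + threads - 1) / threads;
    add_vectors<<<blocks, threads>>>(d_a, d_b, d_c, N);

    // 3. Copy results from GPU memory back to CPU memory
    cudaMemcpy(h_c, d_c, size, cudaMemcpyDeviceToHost);

    // Clean up
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
</syntaxhighlight>

Note that the final <code>cudaMemcpy</code> runs on the same default stream as the kernel launch, so it also acts as the synchronization point: the copy begins only after the kernel has finished.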

