* Transfer data back to the host memory

=CUDA execution model= <!--T:2-->
Simple CUDA code executed on the GPU is called a ''kernel''. There are several questions we may ask at this point:
* How do you run a kernel on a bunch of streaming multiprocessors (SMs)?
* How do you make such a kernel run in a massively parallel fashion?

Here is the execution recipe that will answer the above questions:
* each GPU core (streaming processor) executes a sequential '''thread''', where a '''thread''' is the smallest set of instructions that can be managed independently by a scheduler.
* all GPU cores execute the kernel in a SIMT fashion (Single Instruction, Multiple Threads), as the sketch below illustrates
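
To make the SIMT idea concrete, here is a minimal kernel sketch (the kernel name <code>add_one</code> and its arguments are illustrative assumptions, not code from this page). Every thread runs the same instructions, but combines its block and thread indices into a unique global index, so each thread works on a different array element:

<syntaxhighlight lang="cpp">
// Illustrative sketch: each thread increments one array element.
__global__ void add_one(float *a, int n)
{
    // Unique global index of this thread across the whole grid
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)   // guard: the grid may contain more threads than elements
        a[i] = a[i] + 1.0f;
}
</syntaxhighlight>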

Usually, the following procedure is recommended for executing code on the GPU (see the sketch after these steps):
1. Copy input data from CPU memory to GPU memory
2. Load GPU program (kernel) and execute it
3. Copy results from GPU memory back to CPU memory
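
Below is a minimal, self-contained sketch of these three steps, reusing the illustrative <code>add_one</code> kernel from above; the array size and launch configuration are likewise assumptions, not values from this page:

<syntaxhighlight lang="cpp">
#include <cuda_runtime.h>
#include <stdio.h>

// Illustrative kernel (same as the sketch above)
__global__ void add_one(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] = a[i] + 1.0f;
}

int main(void)
{
    const int n = 1024;
    const size_t size = n * sizeof(float);

    float host_a[1024];
    for (int i = 0; i < n; i++)
        host_a[i] = (float)i;

    float *dev_a;
    cudaMalloc(&dev_a, size);                                // allocate GPU memory

    // 1. Copy input data from CPU memory to GPU memory
    cudaMemcpy(dev_a, host_a, size, cudaMemcpyHostToDevice);

    // 2. Load GPU program (kernel) and execute it: 4 blocks of 256 threads
    add_one<<<(n + 255) / 256, 256>>>(dev_a, n);

    // 3. Copy results from GPU memory back to CPU memory
    cudaMemcpy(host_a, dev_a, size, cudaMemcpyDeviceToHost);

    cudaFree(dev_a);
    printf("host_a[0] = %f\n", host_a[0]);                   // expect 1.0
    return 0;
}
</syntaxhighlight>

Note that <code>cudaMemcpy</code> on the default stream waits for the preceding kernel launch to finish, so this sketch needs no explicit synchronization call.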