** Accessible by both CPU and GPU
*Streaming multiprocessors (SMs)
** Each SM consists of many streaming processors (SPs)
**They perform the actual computations
**Each SM has its own control unit, registers, execution pipelines, etc.
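For reference, the number of SMs on a device can be queried through the CUDA runtime API. A minimal sketch, assuming device 0 is the GPU of interest:
<syntaxhighlight lang="cpp">
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // device 0 is an assumption
    // multiProcessorCount reports the number of SMs on the device
    printf("%s: %d SMs\n", prop.name, prop.multiProcessorCount);
    return 0;
}
</syntaxhighlight>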
* Transfer data back to the Host memory
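A minimal host-side sketch of these steps, assuming an illustrative array of floats (the kernel launch between the two transfers is elided):
<syntaxhighlight lang="cpp">
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const int N = 1024;                    // illustrative problem size
    size_t size = N * sizeof(float);
    float *h_a = (float*) malloc(size);    // Host memory
    float *d_a;
    cudaMalloc(&d_a, size);                // Device (global) memory
    cudaMemcpy(d_a, h_a, size, cudaMemcpyHostToDevice);  // Host -> Device
    // ... launch a kernel here to operate on d_a ...
    cudaMemcpy(h_a, d_a, size, cudaMemcpyDeviceToHost);  // Device -> Host
    cudaFree(d_a);
    free(h_a);
    return 0;
}
</syntaxhighlight>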
=CUDA Execution Model= | =CUDA Execution Model= | ||
Simple CUDA code executed on the GPU is called a '''kernel'''. There are several questions we may ask at this point:
* How do you run a kernel on a set of streaming multiprocessors (SMs)?
* How do you make such a run massively parallel?
Here is the execution recipe that answers these questions:
* each GPU core (streaming processor) executes a sequential '''thread''', where a '''thread''' is the smallest set of instructions handled by the operating system's scheduler.
* all GPU cores execute the kernel in a SIMT fashion (Single Instruction, Multiple Threads), as the sketch below illustrates.
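A minimal kernel sketch of this model; the kernel name <code>scale</code>, the problem size, and the launch configuration of 256 threads per block are illustrative assumptions:
<syntaxhighlight lang="cpp">
#include <cuda_runtime.h>

// Every thread executes the same instruction stream on different data (SIMT):
// thread i scales element i of the array
__global__ void scale(float *a, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)       // guard threads that fall past the end of the array
        a[i] *= s;
}

int main() {
    const int n = 1024;                          // illustrative problem size
    float *d_a;
    cudaMalloc(&d_a, n * sizeof(float));
    scale<<<(n + 255) / 256, 256>>>(d_a, 2.0f, n);  // one thread per element
    cudaDeviceSynchronize();                     // wait for the kernel
    cudaFree(d_a);
    return 0;
}
</syntaxhighlight>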