However, in recent years, this capability has been harnessed more broadly to accelerate computational workloads in cutting-edge areas of scientific research.
=What is CUDA?= <!--T:23-->
'''CUDA''' = '''C'''ompute '''U'''nified '''D'''evice '''A'''rchitecture
It provides access to the instructions and memory of the massively parallel elements in a GPU.
Another definition: CUDA is a scalable parallel programming model and software environment for parallel computing.
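As a small illustration of that model, the hypothetical kernel below (the function and variable names are invented for this example) is executed by one GPU thread per array element:

<syntaxhighlight lang="cpp">
// Hypothetical example: each GPU thread scales a single array element.
// __global__ marks a kernel, i.e. a function launched on the GPU and run by many threads in parallel.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global index of this thread
    if (i < n)                                      // guard threads past the end of the array
        data[i] *= factor;
}
</syntaxhighlight>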
=CUDA GPU Architecture = <!--T:24-->
There are two main components of the GPU:
* Global memory
*Device – The GPU and its memory (device memory)
<!--T:25-->
The CUDA programming model is a heterogeneous model in which both the CPU and GPU are used.
CUDA code manages the memory of both the CPU and the GPU, and it executes GPU functions called kernels. Such kernels are executed by many GPU threads in parallel. Here is a five-step recipe for a typical CUDA code:
</syntaxhighlight>
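The sketch below is a minimal, hypothetical vector-addition program that walks through all five steps in one place; the kernel name, array size, and launch configuration are illustrative rather than taken from this page:

<syntaxhighlight lang="cpp">
// Hypothetical vector-addition sketch of the five-step recipe (names and sizes are illustrative).
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host (CPU) arrays
    float *a = (float *)malloc(bytes);
    float *b = (float *)malloc(bytes);
    float *c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    // 1. Allocate device (GPU) memory
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);

    // 2. Copy input data from host to device
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    // 3. Launch the kernel: many GPU threads execute add() in parallel
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // 4. Copy the result back from device to host
    cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", c[0]);

    // 5. Free device (and host) memory
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b); free(c);
    return 0;
}
</syntaxhighlight>

Steps 2 and 4 are the host-device transfers discussed in the next section; keeping data on the device between kernel launches avoids paying that cost repeatedly.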
= Basic performance considerations = <!--T:26-->
== Memory transfers ==
* The PCI-e bus is extremely slow (4-6 GB/s) compared to both host and device memory (see the timing sketch below for one way to measure this)
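To make that cost concrete, here is a hypothetical sketch that times a single host-to-device copy with CUDA events and reports the effective bandwidth; the 256 MB buffer size is arbitrary:

<syntaxhighlight lang="cpp">
// Hypothetical sketch: estimate effective PCI-e bandwidth by timing one
// host-to-device copy with CUDA events. Buffer size is illustrative.
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const size_t bytes = 256UL << 20;            // 256 MB test buffer
    float *host = (float *)malloc(bytes);        // pageable host memory
    memset(host, 0, bytes);

    float *dev;
    cudaMalloc((void **)&dev, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);      // elapsed time in milliseconds
    printf("Host-to-device: %.2f GB/s\n", (bytes / 1.0e9) / (ms / 1000.0));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dev);
    free(host);
    return 0;
}
</syntaxhighlight>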