CUDA tutorial

= Advantage of Shared Memory =
So far, all memory accesses in the kernel have gone through the regular GPU (global) memory, which is relatively slow. Often, threads need to exchange data so frequently that these global-memory accesses reduce performance significantly. To address this issue there exists another type of memory, called shared memory, which can be used to speed up memory operations between threads. The catch is that only threads within the same block can communicate through it. To demonstrate the use of shared memory we consider the dot product example, where two vectors are multiplied element by element and the products are summed. The kernel below shows the first step, in which each thread computes one pairwise product:
<syntaxhighlight lang="cpp" line>
__global__  void dot(int *a, int *b, int *c){
        // each thread computes one pairwise product of the two input vectors
        int temp = a[threadIdx.x]*b[threadIdx.x];
}
</syntaxhighlight>
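A complete shared-memory version of this kernel could look like the sketch below. It assumes the vectors hold <code>N</code> elements and that the kernel is launched with a single block of <code>N</code> threads, e.g. <code>dot<<<1,N>>>(dev_a, dev_b, dev_c)</code> (here <code>N</code>, <code>dev_a</code>, <code>dev_b</code> and <code>dev_c</code> are placeholder names, not part of the original example). The <code>__shared__</code> array keeps the partial products in fast on-chip memory visible to the whole block, and <code>__syncthreads()</code> guarantees that every product has been written before thread 0 sums them.
<syntaxhighlight lang="cpp" line>
#define N 512

__global__ void dot(int *a, int *b, int *c){
    // partial products live in shared memory,
    // which is visible to every thread in this block
    __shared__ int temp[N];
    temp[threadIdx.x] = a[threadIdx.x] * b[threadIdx.x];

    // wait until every thread in the block has written its product
    __syncthreads();

    // thread 0 sums the partial products and writes the final result
    if (threadIdx.x == 0) {
        int sum = 0;
        for (int i = 0; i < N; i++)
            sum += temp[i];
        *c = sum;
    }
}
</syntaxhighlight>
Letting a single thread perform the final summation is the simplest approach; for large blocks a parallel (tree) reduction in shared memory would scale better.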