CUDA tutorial

= Advantage of Shared Memory =
So far, all memory accesses in the kernel have gone through the regular GPU (global) memory, which is relatively slow. Often the threads exchange so much data that performance drops significantly. To address this issue, there is another type of memory, called shared memory, which can be used to speed up memory operations between threads. The catch is that only threads within the same block can communicate through it. To demonstrate the use of shared memory, we consider the dot product example, in which two vectors are multiplied element by element and the products are summed. Below is the kernel; as a first step, each thread computes one pairwise product:
<syntaxhighlight lang="cpp" line highlight="2">
__global__  void dot(int *a, int *b, int *c){
        // Each thread computes one pairwise product; temp is private to the thread,
        // and the products are not yet combined or written to c.
        int temp = a[threadIdx.x]*b[threadIdx.x];
}
</syntaxhighlight>
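To make the role of shared memory concrete, here is a minimal sketch of how the kernel could be completed: the per-thread products are stored in a <code>__shared__</code> array, the threads of the block are synchronized, and one thread sums the products and writes the result back to global memory. The vector length <code>N</code> and the single-block launch configuration are assumptions made for this sketch, not part of the kernel shown above.
<syntaxhighlight lang="cpp" line>
#define N 512   // illustrative vector length; assumes one block of N threads

__global__ void dot(int *a, int *b, int *c){
        // Shared memory: visible to all threads in the block and much faster
        // than global memory.
        __shared__ int temp[N];

        // Each thread stores its product in shared memory instead of a private variable.
        temp[threadIdx.x] = a[threadIdx.x]*b[threadIdx.x];

        // Wait until every thread in the block has written its product.
        __syncthreads();

        // One thread sums the partial products and writes the result to global memory.
        if (threadIdx.x == 0){
                int sum = 0;
                for (int i = 0; i < N; i++)
                        sum += temp[i];
                *c = sum;
        }
}
</syntaxhighlight>
Under these assumptions the kernel would be launched with a single block of N threads, e.g. <code>dot<<<1, N>>>(dev_a, dev_b, dev_c);</code>.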