CUDA tutorial

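To run the vector addition on many blocks at once, the host code launches the kernel with N blocks instead of one. The listing below is only a minimal sketch of that launch, assuming the two-array add example with N = 512; the exact code in the tutorial may differ.
<syntaxhighlight lang="cpp">
#define N 512

// Kernel defined further below; it adds a[] into b[] element by element.
__global__ void add(int *a, int *b);

int main(void) {
    int a[N], b[N];        // host copies
    int *dev_a, *dev_b;    // device copies
    int size = N * sizeof(int);

    cudaMalloc((void **)&dev_a, size);
    cudaMalloc((void **)&dev_b, size);

    // ... fill a[] and b[] on the host ...

    cudaMemcpy(dev_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, size, cudaMemcpyHostToDevice);

    add<<<N,1>>>(dev_a, dev_b);   // N blocks, one thread per block

    cudaMemcpy(b, dev_b, size, cudaMemcpyDeviceToHost);
    cudaFree(dev_a);
    cudaFree(dev_b);
    return 0;
}
</syntaxhighlight>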
<translate>
<!--T:28-->
Here we replaced 1 by N, so that N different CUDA blocks will be executed at the same time. However, to achieve parallelism we also need to make some changes to the kernel:
</translate>
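A sketch of such a kernel, indexed by block (assuming the two-array add used above; the tutorial's exact listing may differ):
<syntaxhighlight lang="cpp">
__global__ void add(int *a, int *b) {
    // Each block handles one array element, selected by its block index.
    b[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}
</syntaxhighlight>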
<translate>
<!--T:29-->
where blockIdx.x is the unique number identifying a CUDA block. This way each CUDA block adds one element of a[ ] to the corresponding element of b[ ].
[[File:Cuda-blocks-parallel.png|thumbnail|CUDA blocks-based parallelism.]]
</translate>
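The same work can instead be spread over the threads of a single block. A sketch of the kernel and launch for that case (again assuming the two-array add example):
<syntaxhighlight lang="cpp">
__global__ void add(int *a, int *b) {
    // Each thread handles one array element, selected by its thread index.
    b[threadIdx.x] = a[threadIdx.x] + b[threadIdx.x];
}

// In main(): launch one block of N threads, instead of N blocks of one thread.
add<<<1,N>>>(dev_a, dev_b);
</syntaxhighlight>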
<translate>
<!--T:30-->
Now instead of blocks, the job is distributed across parallel threads. What is the advantage of having parallel threads? Unlike blocks, threads can communicate with each other: in other words, we parallelize across multiple threads within a block when heavy communication is involved. Chunks of code that can run independently, i.e. with little or no communication, are distributed across parallel blocks.
</translate>

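As an illustration, consider a dot product in which each thread multiplies one pair of elements and stores the result in a private variable. The kernel name and the dot-product setup below are assumptions for the sketch:
<syntaxhighlight lang="cpp">
__global__ void dot(int *a, int *b, int *c) {
    // Each thread computes one partial product in its own private variable;
    // nothing combines these partial results yet.
    int temp = a[threadIdx.x] * b[threadIdx.x];
}
</syntaxhighlight>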
<translate>
<!--T:31-->
After each thread computes its portion, we need to add everything together: each thread has to share its data. However, the problem is that each thread's copy of the temp variable is private. This can be resolved by using shared memory. Below is the kernel with the modifications to use shared memory:
</translate>
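A sketch of what that kernel could look like, using a block-wide shared array and a synchronization barrier before a single thread sums the partial products (names and sizes are assumptions):
<syntaxhighlight lang="cpp">
#define N 512

__global__ void dot(int *a, int *b, int *c) {
    // Shared memory is visible to every thread in the block.
    __shared__ int temp[N];
    temp[threadIdx.x] = a[threadIdx.x] * b[threadIdx.x];

    // Wait until all threads have written their partial products.
    __syncthreads();

    // Thread 0 adds the partial products and writes the result.
    if (threadIdx.x == 0) {
        int sum = 0;
        for (int i = 0; i < N; i++)
            sum += temp[i];
        *c = sum;
    }
}
</syntaxhighlight>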