Here we replaced 1 by N, so that N different CUDA blocks will be executed at the same time. However, in order to achieve parallelism we need to make some changes to the kernel as well:
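
A minimal sketch of what the modified kernel might look like, assuming integer arrays a[ ], b[ ] and c[ ] and a kernel named add (the array names and the kernel signature are assumptions, not shown on this page):
<syntaxhighlight lang="cpp">
__global__ void add(int *a, int *b, int *c)
{
    // blockIdx.x is the index of the block executing this copy of the kernel,
    // so each of the N blocks handles exactly one element of the arrays.
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}
</syntaxhighlight>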
where blockIdx.x is the unique number identifying a CUDA block. This way, each CUDA block adds one element of a[ ] to the corresponding element of b[ ].
[[File:Cuda-blocks-parallel.png|thumbnail|CUDA block-based parallelism.]]
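
For context, a sketch of how the host side might launch this kernel over N blocks; the device pointer names (dev_a, dev_b, dev_c), the value of N and the surrounding allocation code are assumptions used only for illustration:
<syntaxhighlight lang="cpp">
#define N 512

int main(void)
{
    int a[N], b[N], c[N];
    int *dev_a, *dev_b, *dev_c;

    // Allocate device copies of a, b and c.
    cudaMalloc((void **)&dev_a, N * sizeof(int));
    cudaMalloc((void **)&dev_b, N * sizeof(int));
    cudaMalloc((void **)&dev_c, N * sizeof(int));

    // ... initialize a[ ] and b[ ] on the host ...

    // Copy the input arrays to the device.
    cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice);

    // Launch N blocks of one thread each; this is the line where 1 was replaced by N.
    add<<<N, 1>>>(dev_a, dev_b, dev_c);

    // Copy the result back and release device memory.
    cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return 0;
}
</syntaxhighlight>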
Now, instead of blocks, the job is distributed across parallel threads, as sketched below. What is the advantage of having parallel threads? Unlike blocks, threads can communicate with each other: in other words, we parallelize across multiple threads within a block when heavy communication is involved. Chunks of code that can run independently, i.e. with little or no communication, are distributed across parallel blocks.
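
A sketch of the same addition distributed across threads instead of blocks; under the same naming assumptions as above, the only changes are the index used inside the kernel and the launch configuration:
<syntaxhighlight lang="cpp">
__global__ void add(int *a, int *b, int *c)
{
    // threadIdx.x identifies the thread within its block,
    // so each of the N threads in a single block handles one element.
    c[threadIdx.x] = a[threadIdx.x] + b[threadIdx.x];
}
</syntaxhighlight>
The corresponding launch would then use one block of N threads, e.g. <code>add<<<1, N>>>(dev_a, dev_b, dev_c);</code>.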
After each thread computes its portion, we need to add everything together: each thread has to share its data. The problem, however, is that each thread's copy of the temp variable is private. This can be resolved by using shared memory. Below is the kernel with the modifications needed to use shared memory:
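
A sketch of such a kernel, assuming (as in the classic dot-product example) that each thread multiplies one pair of elements and that thread 0 performs the final sum; the names temp, a, b, c and the length N are assumptions. __shared__ makes temp visible to every thread in the block, and __syncthreads() guarantees that all partial results are written before they are read:
<syntaxhighlight lang="cpp">
#define N 512

__global__ void dot(int *a, int *b, int *c)
{
    // One slot per thread, shared by all threads in the block.
    __shared__ int temp[N];
    temp[threadIdx.x] = a[threadIdx.x] * b[threadIdx.x];

    // Make sure every thread has written its partial product.
    __syncthreads();

    // A single thread accumulates the shared partial products.
    if (threadIdx.x == 0) {
        int sum = 0;
        for (int i = 0; i < N; i++)
            sum += temp[i];
        *c = sum;
    }
}
</syntaxhighlight>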