CUDA tutorial: Difference between revisions

add <<< N, 1 >>> (dev_a, dev_b, dev_c);
</syntaxhighlight>
</syntaxhighlight>
Here we replaced 1 by N, so that N CUDA blocks are launched and can run in parallel. However, to actually achieve parallelism we need to change the kernel as well:
<syntaxhighlight lang="cpp" line highlight="1,5">
__global__ void add(int *a, int *b, int *c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];  // each block handles one element
}
</syntaxhighlight>
where blockIdx.x is the index of the CUDA block within the grid, which is unique to each block. This way each block adds one element of a[ ] to the corresponding element of b[ ] and stores the result in c[ ].
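To see the launch and the kernel working together, the following is a minimal, self-contained sketch rather than the tutorial's own listing. It assumes a vector length of N = 10 and introduces a hypothetical host helper, add_on_gpu, that allocates dev_a, dev_b and dev_c, copies the inputs to the device, launches one block per element, and copies the result back. Error handling is reduced to a single status check on the final copy.

<syntaxhighlight lang="cpp">
#include <cstdio>
#include <cuda_runtime.h>

#define N 10  // assumed vector length, chosen only for illustration

// Each block computes exactly one element, selected by its block index.
__global__ void add(int *a, int *b, int *c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}

// Hypothetical helper (not part of the original tutorial):
// adds two host arrays of length N on the GPU. Returns 0 on success.
int add_on_gpu(const int *a, const int *b, int *c) {
    int *dev_a = nullptr, *dev_b = nullptr, *dev_c = nullptr;
    const size_t bytes = N * sizeof(int);

    // Allocate device memory for the two inputs and the output.
    cudaMalloc((void **)&dev_a, bytes);
    cudaMalloc((void **)&dev_b, bytes);
    cudaMalloc((void **)&dev_c, bytes);

    // Copy the inputs from host to device.
    cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);

    // Launch N blocks with 1 thread each.
    add<<<N, 1>>>(dev_a, dev_b, dev_c);

    // This copy blocks until the kernel has finished, and its return
    // value also reports errors from the preceding calls.
    cudaError_t err = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    int a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = i * i;
    }

    if (add_on_gpu(a, b, c) != 0) {
        fprintf(stderr, "GPU addition failed\n");
        return 1;
    }

    for (int i = 0; i < N; ++i) {
        printf("%d + %d = %d\n", a[i], b[i], c[i]);
    }
    return 0;
}
</syntaxhighlight>

Assuming the file is saved as add.cu, it can be built with nvcc add.cu -o add. Note that one thread per block leaves most of each multiprocessor idle; the layout is used here only because it matches the blockIdx.x indexing described above.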