add <<< N, 1 >>> (dev_a, dev_b, dev_c);
</syntaxhighlight>
Here we replaced 1 by N, so that N CUDA blocks are launched and can run in parallel. However, to actually achieve parallelism we also need to change the kernel:
<syntaxhighlight lang="cpp" line highlight="1,5">
__global__ void add (int *a, int *b, int *c){
    // each block handles the element selected by its own block index
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}
</syntaxhighlight>
where blockIdx.x is the unique index identifying a CUDA block within the grid. This way each CUDA block adds one element of a[ ] to the corresponding element of b[ ] and stores the result in c[ ].
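
To see how the launch and the kernel fit together, the following is a minimal sketch of a complete program. The problem size N = 8, the host arrays a, b, c and their initial values are assumptions made for illustration only; the dev_a, dev_b, dev_c pointers and the add kernel follow the snippets above. Error checking is omitted for brevity, so this should not be read as production code.

<syntaxhighlight lang="cpp">
#include <cstdio>
#include <cuda_runtime.h>

#define N 8   // assumed problem size, chosen only for this example

// One block per element: block number blockIdx.x handles index blockIdx.x.
__global__ void add (int *a, int *b, int *c){
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}

int main (void){
    int a[N], b[N], c[N];            // host copies (illustrative data)
    int *dev_a, *dev_b, *dev_c;      // device copies
    size_t size = N * sizeof(int);

    // fill the inputs with arbitrary values
    for (int i = 0; i < N; i++){
        a[i] = i;
        b[i] = 2 * i;
    }

    // allocate device memory and copy the inputs to the GPU
    cudaMalloc((void**)&dev_a, size);
    cudaMalloc((void**)&dev_b, size);
    cudaMalloc((void**)&dev_c, size);
    cudaMemcpy(dev_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, size, cudaMemcpyHostToDevice);

    // launch N blocks with 1 thread each
    add <<< N, 1 >>> (dev_a, dev_b, dev_c);

    // copy the result back; this call waits for the kernel to finish
    cudaMemcpy(c, dev_c, size, cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++){
        printf("%d + %d = %d\n", a[i], b[i], c[i]);
    }

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return 0;
}
</syntaxhighlight>

Because the grid contains exactly N blocks, every blockIdx.x is a valid index into the arrays and no bounds check is needed in the kernel. If the number of blocks could exceed the array length, the kernel would need a guard such as comparing blockIdx.x against the length before writing.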