CUDA tutorial

3. Copy results from GPU memory back to CPU memory
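
This copy is typically done with <code>cudaMemcpy</code> using the <code>cudaMemcpyDeviceToHost</code> direction. A minimal sketch, assuming a kernel (stood in for here by <code>cudaMemset</code>) has already filled the illustrative device buffer <code>d_result</code>:

<syntaxhighlight lang="cpp">
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    const int N = 256;
    float h_result[N];                          // buffer in CPU (host) memory
    float *d_result;                            // buffer in GPU (device) memory
    cudaMalloc(&d_result, N * sizeof(float));

    // A kernel filling d_result would be launched here;
    // cudaMemset is used as a stand-in so the sketch runs as-is.
    cudaMemset(d_result, 0, N * sizeof(float));

    // Step 3: copy the results from GPU memory back to CPU memory
    cudaMemcpy(h_result, d_result, N * sizeof(float), cudaMemcpyDeviceToHost);

    printf("first element: %f\n", h_result[0]);
    cudaFree(d_result);
    return 0;
}
</syntaxhighlight>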


= CUDA block-threading model = <!--T:3-->


<!--T:4-->
[[File:Cuda-threads-blocks.png|thumbnail|CUDA block-threading model where threads are organized into blocks, and blocks are further organized into a grid.]]
A CUDA kernel launches a very large number of threads (to achieve massive parallelism one should use as many threads as possible), so they need to be organized somehow. In CUDA, threads are grouped into blocks, and blocks are further organized into a grid, as shown in the accompanying figure. When distributing the threads we must make sure that the following conditions are satisfied:
* threads within a block cooperate via shared memory
* threads in different blocks cannot cooperate
In this model the threads within a block work on the same set of instructions (but perhaps with different data sets) and exchange data between each other via shared memory. Threads in other blocks do the same thing (see the figure).
[[File:Cuda_threads.png|thumbnail|Threads within a block intercommunicate via shared memory.]]
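
For instance, threads in a block can stage data in shared memory, synchronize, and then read elements written by other threads of the same block. A minimal sketch (the kernel name and block size are illustrative):

<syntaxhighlight lang="cpp">
#include <cuda_runtime.h>

#define BLOCK_SIZE 256

// Illustrative kernel: threads in a block exchange data through shared
// memory to reverse the order of one BLOCK_SIZE-element segment.
__global__ void reverseBlock(float *data)
{
    __shared__ float tile[BLOCK_SIZE];     // visible to all threads in this block

    int t = threadIdx.x;
    int base = blockIdx.x * BLOCK_SIZE;    // start of this block's segment

    tile[t] = data[base + t];              // each thread stages one element
    __syncthreads();                       // wait until the whole block has written

    data[base + t] = tile[BLOCK_SIZE - 1 - t];  // read an element another thread wrote
}

// Launch with one thread per element, e.g.:
//   reverseBlock<<<numBlocks, BLOCK_SIZE>>>(d_data);
</syntaxhighlight>

The <code>__syncthreads()</code> barrier is what makes the exchange safe: no thread reads from <code>tile</code> until every thread in the block has finished writing. No such barrier exists between threads of different blocks, which is why inter-block cooperation is not possible.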


<!--T:5-->
* Block IDs: 1D or 2D (blockIdx.x, blockIdx.y)
* Thread IDs: 1D, 2D, or 3D (threadIdx.x, threadIdx.y, threadIdx.z)
Such a model simplifies memory addressing when processing multi-dimensional data.
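
For example, when processing a 2D array, each thread can compute its own global coordinates directly from the block and thread IDs. A minimal sketch (the kernel name and array layout are illustrative; the array is stored in row-major order):

<syntaxhighlight lang="cpp">
#include <cuda_runtime.h>

// Illustrative kernel: each thread derives its global (x, y) position
// from the block and thread IDs and updates one element of a 2D array.
__global__ void scale2d(float *data, int width, int height, float factor)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // global column index
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // global row index

    if (x < width && y < height)                     // guard against extra threads
        data[y * width + x] *= factor;
}

// Launch with a 2D grid of 2D blocks, e.g. for a 1024 x 768 array:
//   dim3 block(16, 16);
//   dim3 grid((1024 + block.x - 1) / block.x, (768 + block.y - 1) / block.y);
//   scale2d<<<grid, block>>>(d_data, 1024, 768, 2.0f);
</syntaxhighlight>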


= Thread scheduling = <!--T:6-->