CUDA tutorial: Difference between revisions

Revision as of 18:32, 28 September 2017

9 bytes added, 7 years ago

no edit summary

Stubbsda

@@ Line 76: / Line 76: @@
 Such a model simplifies memory addressing when processing multidimensional data.
-= Threads Scheduling = <!--T:6-->
+= Thread scheduling = <!--T:6-->
-Usually streaming microprocessor (SM) executes one threading block at a time. The code is executed in groups of 32 threads (called Warps). A hardware scheduller is free to assign blocks to any SM at any time. Furthermore, when SM gets the block assigned to it, it does not mean that this particular block will be executed non-stop. In fact, the scheduler can postpone/suspend execution os such block under certain conditions when e.x. data becomes unavailable (indeed, it takes quite some time to read data from the global GPU memory). When it happens, the scheduler takes another threading block which is ready for execution. This is a so called zero-overhead scheduling which makes the execution more stream-lined where SMs are not idling.
+A streaming multiprocessor (SM) executes the threads of a block in groups of 32 threads called warps. The hardware scheduler is free to assign blocks to any SM at any time, and an SM can hold several resident blocks at once when its resources allow. Being assigned to an SM does not mean that a block runs without interruption: the scheduler can suspend execution when, for example, the data a warp needs is not yet available (reading from global GPU memory takes hundreds of clock cycles). When this happens, the scheduler switches to another warp or thread block that is ready to execute. This is so-called zero-overhead scheduling, which keeps the SMs busy rather than idle.
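
The grouping of threads into warps can be observed with a small program. The following is a minimal sketch, not part of the original tutorial: the names <code>warps_per_block</code>, <code>count_warps</code> and <code>count_warps_on_device</code> are hypothetical, and the host-side helper assumes a warp size of 32, which matches the built-in <code>warpSize</code> on current NVIDIA GPUs. In the kernel, the first thread (lane 0) of every warp adds 1 to a global counter, so the final value is the number of warps the scheduler had to run.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Number of warps needed to hold `threads` threads, rounded up.
// The default of 32 is an assumption that matches current NVIDIA hardware.
__host__ __device__ inline int warps_per_block(int threads, int warp_size = 32) {
    return (threads + warp_size - 1) / warp_size;
}

// Lane 0 of each warp increments the counter once.
__global__ void count_warps(int *counter) {
    int lane = threadIdx.x % warpSize;  // position of this thread inside its warp
    if (lane == 0) {
        atomicAdd(counter, 1);
    }
}

// Launches the kernel and returns the number of warps counted,
// or -1 if any CUDA call fails.
int count_warps_on_device(int blocks, int threads_per_block) {
    int *d_counter = nullptr;
    int h_counter = 0;
    if (cudaMalloc(&d_counter, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemset(d_counter, 0, sizeof(int)) != cudaSuccess) {
        cudaFree(d_counter);
        return -1;
    }
    count_warps<<<blocks, threads_per_block>>>(d_counter);
    cudaError_t launch_err = cudaGetLastError();
    // cudaMemcpy waits for the kernel on the default stream to finish.
    cudaError_t copy_err = cudaMemcpy(&h_counter, d_counter, sizeof(int),
                                      cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    if (launch_err != cudaSuccess || copy_err != cudaSuccess) return -1;
    return h_counter;
}

int main() {
    const int blocks = 4;
    const int threads_per_block = 100;  // deliberately not a multiple of 32
    int expected = blocks * warps_per_block(threads_per_block);
    int counted = count_warps_on_device(blocks, threads_per_block);
    printf("expected warps: %d, counted on device: %d\n", expected, counted);
    return counted == expected ? 0 : 1;
}
```

With 100 threads per block, each block occupies four warps, the last of which has only 4 active threads. Choosing a block size that is a multiple of 32 avoids such partially filled warps.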
 = GPU Memories in CUDA = <!--T:7-->