CUDA tutorial: Difference between revisions

Line 190:
** highly efficient access for read-only, broadcast
* Carefully divide data according to access patterns:
** R Only:   constant memory (very fast if in cache)
** R/W within Block: shared memory (very fast)
** R/W within Thread: registers (very fast)
** R/W input/results: global memory (very slow)
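The division above can be sketched in a single small kernel. This is only an illustration, not code from the tutorial: the names `scale_reverse`, `d_coeffs`, `run_demo`, and the sizes `N` and `BLOCK` are all assumptions made for the example. Each block stages its slice of the input in shared memory, reverses it, and applies two coefficients that every thread reads from constant memory; per-thread indices and temporaries are ordinary locals, which the compiler normally keeps in registers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N     1024   // total elements (assumed; must be a multiple of BLOCK)
#define BLOCK 256    // threads per block (assumed)

// Read-only, same values for every thread: constant memory.
__constant__ float d_coeffs[2];

// Each block reverses its own tile of the input and applies y = a*x + b.
__global__ void scale_reverse(const float *in, float *out)
{
    // Read/write within a block: shared memory.
    __shared__ float tile[BLOCK];

    // Read/write within a thread: locals normally live in registers.
    int local  = threadIdx.x;
    int global = blockIdx.x * blockDim.x + local;

    tile[local] = in[global];          // one global-memory read per thread
    __syncthreads();                   // the whole tile must be loaded first

    float x = tile[BLOCK - 1 - local]; // neighbour's value, via shared memory
    out[global] = d_coeffs[0] * x + d_coeffs[1]; // one global-memory write
}

// Host driver: returns the number of mismatches, or -1 on a CUDA error.
int run_demo()
{
    static float h_in[N], h_out[N];
    const float h_coeffs[2] = {2.0f, 1.0f};
    for (int i = 0; i < N; ++i) h_in[i] = (float)i;

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, N * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, N * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }

    // Input goes to global memory, coefficients to constant memory.
    cudaMemcpy(d_in, h_in, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_coeffs, h_coeffs, sizeof(h_coeffs));

    scale_reverse<<<N / BLOCK, BLOCK>>>(d_in, d_out);

    // This blocking copy also reports any error from the kernel launch.
    cudaError_t err = cudaMemcpy(h_out, d_out, N * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    // Check against the same computation done on the host.
    int bad = 0;
    for (int i = 0; i < N; ++i) {
        int src = (i / BLOCK) * BLOCK + (BLOCK - 1 - i % BLOCK);
        if (h_out[i] != 2.0f * h_in[src] + 1.0f) ++bad;
    }
    return bad;
}

int main()
{
    int bad = run_demo();
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Assuming the file is saved as `memdemo.cu`, it can be built with `nvcc memdemo.cu -o memdemo`. Each thread touches global memory exactly twice (one read, one write); the exchange between threads goes through shared memory instead.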