==Overview==
Hyper-Q (or MPS) is a feature of NVIDIA GPUs.
It is available in GPUs with CUDA compute capability 3.5 and higher,<ref>For a table relating NVIDIA GPU model names, architecture names, and CUDA compute capabilities, see [https://en.wikipedia.org/wiki/Nvidia_Tesla https://en.wikipedia.org/wiki/Nvidia_Tesla]</ref>
which covers all GPUs currently deployed on Alliance general-purpose clusters (Cedar, Graham, Béluga, and Narval).
According to NVIDIA,
<blockquote>
MPS (Multi-Process Service; formerly known as Hyper-Q) enables multiple CPU cores to launch work on a single GPU simultaneously, thereby dramatically increasing GPU utilization and significantly reducing CPU idle times. MPS increases the total number of connections (work queues) between the host and the GPU by allowing multiple simultaneous, hardware-managed connections (compared to the single connection available with Fermi generation GPUs). Hyper-Q is a flexible solution that allows separate connections from multiple CUDA streams, from multiple Message Passing Interface (MPI) processes, or even from multiple threads within a process. Applications that previously encountered false serialization across tasks, thereby limiting achieved GPU utilization, can see up to dramatic performance increase without changing any existing code.
</blockquote>
In our tests, MPS increases the total GPU flop rate even when the GPU is being shared by unrelated CPU processes ("GPU farming"). MPS is therefore well suited to CUDA codes with relatively small problem sizes, which on their own cannot saturate modern GPUs with thousands of cores.
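
The compute capability requirement mentioned above can be checked programmatically before relying on MPS. The following is only a minimal sketch: the function names <code>parse_compute_capability</code> and <code>supports_mps</code> are hypothetical, and the capability string is assumed to be supplied by the user in "major.minor" form (for example, as reported by the GPU vendor's tools or documentation).

```python
# Minimal sketch (hypothetical helper names): decide whether a GPU's CUDA
# compute capability meets the 3.5 minimum stated in the overview.

MPS_MIN_CAPABILITY = (3, 5)  # threshold taken from the text above


def parse_compute_capability(text):
    """Turn a string such as '8.0' into a (major, minor) tuple of ints."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def supports_mps(capability):
    """Return True if the capability string meets the 3.5 minimum."""
    return parse_compute_capability(capability) >= MPS_MIN_CAPABILITY


if __name__ == "__main__":
    # Example capability strings; 2.0 corresponds to the Fermi generation.
    for cap in ("2.0", "3.5", "6.0", "8.0"):
        print(cap, supports_mps(cap))
```

The comparison is done on integer tuples rather than floating-point numbers, so that a version such as "3.10" would be treated as major 3, minor 10 and not confused with 3.1.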