Hyper-Q / MPS: Difference between revisions

==Overview==


Hyper-Q (or MPS) is a feature of NVIDIA GPUs.
It is available in GPUs with CUDA compute capability 3.5 and higher,<ref>For a table relating NVIDIA GPU model names, architecture names, and CUDA compute capabilities, see [https://en.wikipedia.org/wiki/Nvidia_Tesla https://en.wikipedia.org/wiki/Nvidia_Tesla]</ref>
which covers all GPUs currently deployed on Alliance general-purpose clusters (Cedar, Graham, Béluga, and Narval).


According to NVIDIA,
<blockquote>
MPS (Multi-Process Service; formerly known as Hyper-Q) enables multiple CPU cores to launch work on a single GPU simultaneously, thereby dramatically increasing GPU utilization and significantly reducing CPU idle times. MPS increases the total number of connections (work queues) between the host and the GPU by allowing multiple simultaneous, hardware-managed connections (compared to the single connection available with Fermi generation GPUs). Hyper-Q is a flexible solution that allows separate connections from multiple CUDA streams, from multiple Message Passing Interface (MPI) processes, or even from multiple threads within a process. Applications that previously encountered false serialization across tasks, thereby limiting achieved GPU utilization, can see a dramatic performance increase without changing any existing code.
</blockquote>


In our tests, MPS increases the total GPU flop rate even when the GPU is being shared by unrelated CPU processes ("GPU farming"). MPS is therefore well suited to CUDA codes with relatively small problem sizes, which on their own cannot saturate modern GPUs with thousands of cores.
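
To make the "small problem size" argument concrete, the following back-of-the-envelope sketch estimates how much of a GPU a single small job can occupy, and how many such jobs would have to share the GPU to fill it. It is only a rough upper bound: it counts resident threads and ignores register, shared-memory, and block-size limits, so real occupancy will usually be lower. The function names and the 8192-thread job size are hypothetical; the hardware figures (56 streaming multiprocessors, 2048 resident threads each) are those of a P100 and should be replaced with the values for the GPU in use.

```python
import math


def resident_thread_capacity(num_sms: int, max_threads_per_sm: int) -> int:
    """Upper bound on the number of threads the GPU can keep resident at once."""
    return num_sms * max_threads_per_sm


def fill_fraction(problem_threads: int, capacity: int) -> float:
    """Fraction of the resident-thread capacity one job can occupy (capped at 1)."""
    return min(1.0, problem_threads / capacity)


def jobs_to_fill(problem_threads: int, capacity: int) -> int:
    """Number of identical concurrent jobs needed to reach full capacity."""
    return max(1, math.ceil(capacity / problem_threads))


# Illustrative figures for a P100: 56 SMs, 2048 resident threads per SM.
capacity = resident_thread_capacity(num_sms=56, max_threads_per_sm=2048)

# Hypothetical small problem: one thread per element, 8192 elements.
small_job_threads = 8192

print(f"resident-thread capacity: {capacity}")
print(f"one job fills at most:    {fill_fraction(small_job_threads, capacity):.1%}")
print(f"jobs needed to fill GPU:  {jobs_to_fill(small_job_threads, capacity)}")
```

Under these assumptions a single job occupies only a small fraction of the GPU, which is the situation where sharing the GPU among several processes through MPS can raise the total flop rate.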