Hyper-Q / MPS
Revision as of 19:38, 20 November 2023
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
Overview
Hyper-Q (or MPS) is a hardware/software feature of NVIDIA GPUs available on GPUs with CUDA compute capability 3.5 and higher. On the Alliance clusters Cedar, Graham, Béluga, and Narval, it is available on P100 and newer GPUs.
According to NVIDIA,
MPS (Multi-Process Service; formerly known as Hyper-Q) enables multiple CPU cores to launch work on a single GPU simultaneously, thereby dramatically increasing GPU utilization and significantly reducing CPU idle times. MPS increases the total number of connections (work queues) between the host and the GPU by allowing multiple simultaneous, hardware-managed connections (compared to the single connection available with Fermi generation GPUs). Hyper-Q is a flexible solution that allows separate connections from multiple CUDA streams, from multiple Message Passing Interface (MPI) processes, or even from multiple threads within a process. Applications that previously encountered false serialization across tasks, thereby limiting achieved GPU utilization, can see a dramatic performance increase without changing any existing code.
In our tests, MPS increases the total GPU flop rate even when the GPU is being shared by unrelated CPU processes ("GPU farming"). That means that MPS is great for CUDA codes with relatively small problem sizes, which on their own cannot efficiently saturate modern GPUs with thousands of cores.
MPS is not enabled by default, but enabling it is straightforward. If you use the GPU interactively, execute the following commands before running your CUDA code(s):
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
nvidia-cuda-mps-control -d
If you are using a scheduler, you should submit a script which contains the above lines, and then executes your code.
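For example, a Slurm job script might look like the following. This is a minimal sketch: the resource requests and the binary name `my_cuda_app` are placeholders for your own requirements and code, not values taken from this article.

```shell
#!/bin/bash
#SBATCH --gpus-per-node=1    # placeholder resource requests:
#SBATCH --cpus-per-task=4    # adjust for your own job
#SBATCH --mem=8G
#SBATCH --time=0-01:00

# Enable MPS before launching any CUDA code
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
nvidia-cuda-mps-control -d

# Run your CUDA code (hypothetical placeholder binary)
./my_cuda_app
```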
You can then take advantage of MPS whenever more than one CPU thread or process accesses the GPU. This happens if you run an MPI/CUDA code, an OpenMP/CUDA code, or multiple instances of a serial CUDA code (GPU farming).
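A GPU-farming run can be as simple as launching several independent instances of a serial code in the background and waiting for them all to finish; with MPS enabled, their kernels share the one GPU concurrently. In this sketch, `my_cuda_app` is a hypothetical stand-in for your own serial CUDA binary, implemented here as a shell function so the example runs anywhere:

```shell
#!/bin/bash
# GPU farming sketch: several independent instances of a serial CUDA code
# share one GPU; with MPS enabled their work can execute concurrently.
my_cuda_app() { echo "instance $1 done"; }  # stand-in for your real binary

for i in 1 2 3 4; do
    my_cuda_app "$i" &   # launch each instance in the background
done
wait                     # block until every instance has finished
```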
Many additional details on MPS can be found in this document: Multi Process Service (MPS) - NVIDIA Documentation.