Hyper-Q / MPS

Revision as of 21:25, 20 November 2023


This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.



Overview

Hyper-Q (or MPS) is a feature of NVIDIA GPUs. It is available in GPUs with CUDA compute capability 3.5 and higher,[1] which covers all GPUs currently deployed on Alliance general-purpose clusters (Béluga, Cedar, Graham, and Narval).

According to NVIDIA,

The MPS runtime architecture is designed to transparently enable co-operative multi-process CUDA applications, typically MPI jobs, to utilize Hyper-Q capabilities on the latest NVIDIA (Kepler and later) GPUs. Hyper-Q allows CUDA kernels to be processed concurrently on the same GPU; this can benefit performance when the GPU compute capacity is underutilized by a single application process.

In our tests, MPS may increase the total GPU flop rate even when the GPU is shared by unrelated CPU processes. This makes MPS well suited to CUDA applications with relatively small problem sizes, which on their own cannot saturate a modern GPU with thousands of cores.

MPS is not enabled by default, but enabling it is straightforward. Execute the following commands before running your CUDA application:

export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
nvidia-cuda-mps-control -d

MPS then takes effect whenever more than one CPU thread accesses the GPU. This happens when you run a hybrid MPI/CUDA application, a hybrid OpenMP/CUDA application, or multiple instances of a serial CUDA application ("GPU farming").
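The commands above start the MPS control daemon but do not stop it. The following is a minimal sketch, not a tested site recipe, that wraps the same setup in two helper functions and shuts the daemon down afterwards. The function names start_mps and stop_mps are hypothetical, as is the commented-out application name; the shutdown command (sending "quit" to nvidia-cuda-mps-control) is the one described in NVIDIA's MPS documentation.

```shell
#!/bin/bash
# Hypothetical helpers; not part of the original instructions.

start_mps() {
    # Same variables as in the commands above.
    export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
    export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
    # Only try to start the daemon where the NVIDIA tools are installed.
    if ! command -v nvidia-cuda-mps-control >/dev/null 2>&1; then
        echo "nvidia-cuda-mps-control not found; MPS not started" >&2
        return 1
    fi
    nvidia-cuda-mps-control -d
}

stop_mps() {
    # Ask the control daemon to shut down, if the tool is available.
    if command -v nvidia-cuda-mps-control >/dev/null 2>&1; then
        echo quit | nvidia-cuda-mps-control
    fi
}

if start_mps; then
    # ./my_cuda_application   # placeholder for the real workload
    stop_mps
fi
```

On a machine without the NVIDIA tools the sketch only prints a message, so it can be tried safely before being used on a GPU node.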

Additional details on MPS can be found in NVIDIA's documentation: CUDA Multi Process Service (MPS), https://docs.nvidia.com/deploy/mps/index.html

GPU farming

MPS is particularly useful when you need to run many instances of a CUDA application that is too small to saturate a modern GPU on its own. In that case you can run several instances at once, all sharing a single GPU. (This works as long as there is enough GPU memory for all of the instances.) In many cases this significantly increases the collective throughput of your GPU processes.
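The memory constraint can be estimated with simple integer arithmetic. The sketch below is only illustrative: the helper name max_instances and the numbers used (a 16384 MiB GPU and 3000 MiB per instance) are assumptions, and the estimate ignores any memory overhead from MPS or the CUDA runtime itself.

```shell
#!/bin/bash
# Hypothetical helper: how many instances fit in GPU memory?
# Arguments: total GPU memory and per-instance memory, both in MiB.
max_instances() {
    local total_mib=$1
    local per_instance_mib=$2
    # Integer division rounds down, which is what we want here.
    echo $(( total_mib / per_instance_mib ))
}

# Example with assumed numbers: a 16384 MiB GPU and 3000 MiB per instance.
max_instances 16384 3000   # prints 5
```

On a GPU node, the total memory in MiB can be read with nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits; the per-instance figure has to be measured for your own application.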

Here is an example of a job script to set up GPU farming:

#!/bin/bash
#SBATCH --gpus-per-node=v100:1
#SBATCH -t 0-10:00
#SBATCH --mem=64G
#SBATCH -c 8

mkdir -p $HOME/tmp
export CUDA_MPS_LOG_DIRECTORY=$HOME/tmp
nvidia-cuda-mps-control -d

for ((i=0; i<8; i++))
do
    echo $i
    ./my_code $i &
done

wait

In the above example, a single V100 GPU is shared between 8 instances of "my_code" (which takes a single argument, the loop index $i). We request 8 CPU cores (#SBATCH -c 8) for the farm, so there is one CPU core per instance. The two important elements are the "&" on the execution line, which sends each process to the background, and the "wait" command at the end of the script, which keeps the job running until all background processes have finished.
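A plain "wait" does not report whether any instance failed. As a possible refinement, and only a sketch, the loop can record each process ID and wait on them one by one to count failures. Here my_code is replaced by a stand-in shell function (an assumption made so the example runs without a GPU) in which instance 3 is made to fail on purpose; in a real job script the function would be dropped and ./my_code called instead.

```shell
#!/bin/bash
# Stand-in for ./my_code so the sketch runs without a GPU (hypothetical).
my_code() {
    sleep 0.1
    # Pretend instance 3 fails, to show how the failure is detected.
    [ "$1" -ne 3 ]
}

pids=()
for ((i=0; i<8; i++)); do
    my_code "$i" &      # "&" sends the instance to the background
    pids+=("$!")        # remember its process ID
done

failed=0
for pid in "${pids[@]}"; do
    # "wait PID" returns the exit status of that instance.
    if ! wait "$pid"; then
        failed=$((failed + 1))
    fi
done

echo "instances failed: $failed"
```

With the stand-in function this prints "instances failed: 1". The script could then exit with a non-zero status when the count is not zero, so that the scheduler records the job as failed.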

  1. For a table relating NVIDIA GPU model names, architecture names, and CUDA compute capabilities, see https://en.wikipedia.org/wiki/Nvidia_Tesla