Multi-Instance GPU
Introduction
Many programs are unable to fully use modern GPUs such as the NVIDIA A100 and H100. Multi-Instance GPU (MIG) is a technology that partitions a single GPU into multiple GPU instances, each of which behaves as a completely independent GPU. Each GPU instance has its own slice of the GPU's computational resources and memory, isolated from the other instances by on-chip protections.
GPU instances can be less wasteful, and their usage is billed accordingly. Jobs submitted on one of those instances will use less of your allocated priority than jobs on a full GPU. You will be able to execute more jobs and have shorter wait times.
Which jobs should use GPU instances instead of full GPUs?
Jobs that use less than half of the computing power of a GPU and less than half of the available GPU memory should be evaluated and tested on a GPU instance. In most cases, these jobs will run just as fast on a GPU instance and consume less than half of the computing resource.
Limitations
GPU instances do not support CUDA Inter-Process Communication (IPC), which optimises data transfers between GPUs over NVLink and NVSwitch. This limitation also affects communications between GPU instances on a single GPU. Consequently, launching an executable on more than one GPU instance at a time does not improve performance and should be avoided.
GPU jobs requiring many CPU cores may also require a full GPU instead of a GPU instance. The maximum number of CPU cores per GPU instance depends on the number of cores per full GPU and on the configured MIG profiles. Both factors may vary between clusters and also between GPU nodes in a cluster.
Available configurations
As of July 30, 2024, MIGs are available on the Narval cluster with its A100 GPUs. While there are many possible configurations for MIGs, the following configurations are currently activated on Narval:
- MIG 3g.20gb
- MIG 4g.20gb
The name describes the size of the MIG: 3g.20gb has 20GB of GPU RAM and offers 3/8 of the computing performance of a full A100 GPU. Using less powerful MIGs will have a lower impact on your allocation and priority.
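The naming scheme can be unpacked mechanically. The sketch below is only an illustration: `describe_mig` is a hypothetical helper, not a cluster tool, and it assumes the convention stated above, where the number before `g` is a count of eighths of a full A100 and the number before `gb` is the GPU memory in GB.

```shell
#!/bin/bash
# Hypothetical helper: split a MIG profile name such as "3g.20gb"
# into its compute slice count and its GPU memory size in GB.
describe_mig() {
    local profile="$1"
    local slices="${profile%%g.*}"   # text before "g."   e.g. 3
    local mem="${profile#*g.}"       # text after "g."    e.g. 20gb
    mem="${mem%gb}"                  # drop the "gb" unit e.g. 20
    # Assumption from the text above: slices are counted out of 8.
    echo "${slices}/8 of a full A100, ${mem} GB of GPU memory"
}

describe_mig 3g.20gb
describe_mig 4g.20gb
```

Both profiles offer the same amount of GPU memory, so the choice between them comes down to how much computing power the job needs.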
The recommended maximum number of cores per MIG on Narval is:
- 3g.20gb: maximum 6 cores
- 4g.20gb: maximum 6 cores
To request a MIG of a certain flavor, include one of these options in your job submission script:
--gres=gpu:a100_3g.20gb:1
--gres=gpu:a100_4g.20gb:1
Examples
Request 1 MIG of power 3/8 and size 20GB for a 1-hour interactive job:
salloc --time=1:0:0 --nodes=1 --ntasks-per-node=1 --cpus-per-task=2 --gres=gpu:a100_3g.20gb:1 --mem=20gb --account=def-someuser
Request 1 MIG of power 4/8 and size 20GB for a 24-hour batch script using the maximum recommended number of cores and system memory:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gres=gpu:a100_4g.20gb:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=6 # There are 6 CPU cores per 3g.20gb and 4g.20gb on Narval.
#SBATCH --mem=40gb # Request twice as much system memory as MIG memory
#SBATCH --time=24:00:00
#SBATCH --account=def-someuser
hostname
nvidia-smi
Finding which of your jobs to migrate to using a MIG
You can find information on current and past jobs on the Narval usage portal, under the Job stats tab.
Power consumption is a good indicator of the total computing power requested from the GPU. For instance, the following job requested a full A100 GPU with a maximum TDP of 400W, but only used 100W on average, which is only 50W more than the idle power consumption:
GPU utilization may also provide insight into how the GPU is used in cases where the power consumption alone is not conclusive. For this example job, the GPU utilization graph supports the conclusion drawn from the GPU power consumption graph: the job uses less than 25% of the available power of a full A100 GPU:
The final things to consider are the maximum amount of GPU memory and the average number of CPU cores required to run the job. For this example, the job uses a maximum of 3GB of GPU memory out of the 40GB of a full A100 GPU.
It was also launched using a single CPU core. When taking into account these 3 metrics, we see that the job could easily run on a 3g.20gb or 4g.20gb MIG with power and memory to spare.
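The three checks described above can be combined into a rough screening rule. This is a minimal sketch, not an official tool: `fits_mig` is a hypothetical function, it uses average power as a proxy for computing power as suggested above, it applies the "less than half" guideline from earlier in this page, and it expects whole-number inputs read off the usage portal.

```shell
#!/bin/bash
# Hypothetical screening check: does a job's observed usage fit a 20 GB MIG
# on Narval? Arguments: average power (W), peak GPU memory (GB), CPU cores.
fits_mig() {
    local avg_power_w="$1" peak_mem_gb="$2" cores="$3"
    local tdp_w=400       # maximum TDP of a full A100, from the text
    local mig_mem_gb=20   # GPU memory of the 3g.20gb and 4g.20gb profiles
    local max_cores=6     # recommended maximum cores per MIG on Narval

    # "Less than half of the computing power": compare twice the average
    # power to the TDP so the check stays in integer arithmetic.
    if [ $((2 * avg_power_w)) -lt "$tdp_w" ] &&
       [ "$peak_mem_gb" -lt "$mig_mem_gb" ] &&
       [ "$cores" -le "$max_cores" ]; then
        echo "candidate for a MIG"
    else
        echo "keep a full GPU"
    fi
}

# The example job: 100 W on average, 3 GB of GPU memory, 1 CPU core.
fits_mig 100 3 1
```

A job flagged as a candidate should still be tested on a GPU instance before all similar jobs are moved, since power draw is only an indirect measure of compute usage.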
Another way to monitor the usage of a running job is to attach to the node where the job is currently running and use nvidia-smi to read the GPU metrics in real time. This will not provide maximum and average values for memory and power over the full job, but it may be helpful for troubleshooting.
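If averages and maxima are wanted anyway, one option is to log periodic nvidia-smi samples to a file and summarise them afterwards. The sketch below assumes the log was produced with the query shown in its first comment; `summarise_gpu_log` and the file names are hypothetical, and the sample values are made up to stand in for a real log. On MIG-enabled GPUs some fields may be reported as N/A or for the whole GPU rather than for the instance, in which case the numbers are not meaningful.

```shell
#!/bin/bash
# Summarise samples logged on the compute node with, for example:
#   nvidia-smi --query-gpu=power.draw,memory.used \
#              --format=csv,noheader,nounits -l 5 > gpu_log.csv
# Each line of the log is then "<power in W>, <memory in MiB>".
summarise_gpu_log() {
    awk -F', *' '
        { power += $1; if ($2 > max_mem) max_mem = $2; n++ }
        END {
            if (n == 0) { print "no samples"; exit }
            printf "avg_power_w=%.1f max_mem_mib=%d\n", power / n, max_mem
        }' "$1"
}

# Made-up sample data standing in for a real log file.
sample_log=$(mktemp)
printf '%s\n' '98.50, 2900' '101.50, 3010' '100.00, 2950' > "$sample_log"
summarise_gpu_log "$sample_log"
```

The sampling interval (5 seconds here) is a trade-off: shorter intervals catch brief memory peaks but produce larger logs.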