Multi-Instance GPU
Many programs are unable to fully use modern GPUs such as NVIDIA A100s and H100s. Multi-Instance GPU (MIG) is a technology that allows a single GPU to be partitioned into multiple instances, each of which acts as a completely independent GPU. Each instance gets a slice of the full GPU's computational resources and memory, isolated from the other instances by on-chip protections.
Using a GPU instance is less wasteful, and usage is accounted accordingly. Jobs submitted on an instance use less of your allocated priority than jobs on a full GPU; you will therefore be able to execute more jobs and have shorter wait times.
Choosing between a full GPU and an instance of a GPU
Jobs that use less than half of the computing power of a full GPU and less than half of the available GPU memory should be evaluated and tested on an instance. In most cases, these jobs will run just as fast and consume less than half of the computing resources.
See the section Finding which of your jobs should use an instance below for more details.
Limitations
The MIG technology does not support CUDA Inter-Process Communication (IPC), which optimizes data transfers between GPUs over NVLink and NVSwitch. This limitation also prevents efficient communication between instances. Consequently, launching an executable on more than one instance at a time does not improve performance and should be avoided.
GPU jobs requiring many CPU cores may also require a full GPU instead of an instance. The maximum number of CPU cores per instance depends on the number of cores per full GPU and on the configured MIG profiles. Both factors may vary between clusters and also between GPU nodes in a cluster.
Available configurations
As of August 23, 2024, 20% of the Narval A100 nodes offer GPU instances. While many MIG configurations and profiles exist, only the following two profiles are currently available:
- 3g.20gb
- 4g.20gb
The profile name describes the size of the instance. For example, a 3g.20gb instance has 20 GB of GPU memory and offers 3/8 of the computing performance of a full A100-40gb GPU. Using less powerful profiles has a lower impact on your allocation and priority.
On Narval, the recommended maximum number of cores and amount of system memory per instance are:
- 3g.20gb: maximum 6 cores and 62GB
- 4g.20gb: maximum 6 cores and 62GB
To request an instance of a certain profile, your job submission must include a --gres parameter:
- 3g.20gb: --gres=gpu:a100_3g.20gb:1
- 4g.20gb: --gres=gpu:a100_4g.20gb:1
Note: For the job scheduler on Narval, the prefix a100_ is required before the profile name.
Job examples
- Request an instance of power 3/8 and size 20GB for a 1-hour interactive job:
[name@server ~]$ salloc --account=def-someuser --gres=gpu:a100_3g.20gb:1 --cpus-per-task=2 --mem=40gb --time=1:0:0
- Request an instance of power 4/8 and size 20GB for a 24-hour batch job using the maximum recommended number of cores and system memory:
#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --gres=gpu:a100_4g.20gb:1
#SBATCH --cpus-per-task=6 # There are 6 CPU cores per 3g.20gb and 4g.20gb on Narval.
#SBATCH --mem=62gb # There are 62GB of system memory per 3g.20gb and 4g.20gb instance on Narval.
#SBATCH --time=24:00:00
hostname
nvidia-smi
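Once saved to a file, the script is submitted with sbatch in the usual way; the file name below is only a placeholder:
[name@server ~]$ sbatch mig_job.sh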
Finding which of your jobs should use an instance
You can find information on current and past jobs on the Narval usage portal, under the Job stats tab.
Power consumption is a good indicator of the total computing power requested from the GPU. For instance, one example job requested a full A100 GPU with a maximum TDP of 400W, but used only 100W on average, which is only 50W more than the idle power consumption.
GPU functionality utilization may also provide insight into the usage of the GPU in cases where power consumption alone is not sufficient. For this example job, the GPU utilization graph supports the conclusion drawn from the power consumption graph: the job uses less than 25% of the available computing power of a full A100 GPU.
The final metrics to consider are the maximum amount of GPU memory and the average number of CPU cores required to run the job. For this example, the job uses a maximum of 3GB of GPU memory out of the 40GB of a full A100 GPU.
It was also launched using a single CPU core. Taking these last three metrics into account, we can confirm that the job would easily run on a 3g.20gb or 4g.20gb instance with power and memory to spare.
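As a complement to the portal, the CPU core and system memory usage of a past job can also be checked on the command line with Slurm's sacct; this is a minimal sketch where 123456 is a placeholder job ID, and note that GPU memory usage is not reported by sacct, so it must still be read from the portal or from nvidia-smi:
[name@server ~]$ sacct -j 123456 --format=JobID,Elapsed,AllocCPUS,TotalCPU,MaxRSS # 123456 is a placeholder job ID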
Another way to monitor the usage of a running job is to attach to the node where the job is running and use nvidia-smi to read the GPU metrics in real time. This will not provide maximum and average values for the memory and power usage of the entire job, but it may help identify and troubleshoot underperforming jobs.
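For example, assuming a running job with the placeholder ID 123456, a monitoring step can be attached to it with srun and the GPU metrics refreshed every 30 seconds with watch; depending on the cluster's Slurm configuration, additional options such as --overlap may be required:
[name@server ~]$ srun --jobid=123456 --pty watch -n 30 nvidia-smi # 123456 is a placeholder job ID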