Multi-Instance GPU: Difference between revisions

Reviewed Available configurations
(Reviewed the limitations, added links)
(Reviewed Available configurations)
Line 21:


= Available configurations = <!--T:6-->
As of July 30, 2024, Narval is the only cluster with MIG enabled, on a few of its A100 nodes.
While there are [https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#a100-profiles many possible MIG configurations and profiles], only the following two profiles have been implemented on selected GPUs:
* <code>3g.20gb</code>
* <code>4g.20gb</code>


<!--T:7-->
The profile name describes the size of the GPU instance.
For example, a <code>3g.20gb</code> instance has 20 GB of GPU RAM and offers 3/8 of the computing performance of a full A100-40gb GPU. Using a less powerful MIG profile has a lower impact on your allocation and priority.


<!--T:8-->
On Narval, the recommended maximum number of cores and amount of system memory per GPU instance are:
* <code>3g.20gb</code>: maximum 6 cores and 62 GB
* <code>4g.20gb</code>: maximum 6 cores and 62 GB


<!--T:9-->
To request a GPU instance of a certain profile, your job submission must include a <code>--gres</code> parameter:
* <code>3g.20gb</code>: <code>--gres=gpu:a100_3g.20gb:1</code>
* <code>4g.20gb</code>: <code>--gres=gpu:a100_4g.20gb:1</code>
 
Note: for the job scheduler on Narval, the prefix <code>a100_</code> is required before the profile name.
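
As an illustration of how the profile names and limits above fit together, the following sketch builds the scheduler options for a given profile. The function name <code>mig_sbatch_opts</code> is hypothetical and not part of any cluster tooling, and it assumes that <code>--mem=62G</code> is an acceptable way to express the recommended 62 GB of system memory.

```shell
#!/bin/bash
# Hypothetical helper (not provided by the cluster): print sbatch options
# for one of the two MIG profiles currently implemented on Narval.
mig_sbatch_opts() {
    local profile="$1"
    case "$profile" in
        3g.20gb|4g.20gb)
            # The a100_ prefix is required by the job scheduler on Narval.
            # 6 cores and 62 GB are the recommended maximums per GPU instance;
            # writing 62 GB as --mem=62G is an assumption of this sketch.
            echo "--gres=gpu:a100_${profile}:1 --cpus-per-task=6 --mem=62G"
            ;;
        *)
            echo "unsupported MIG profile: ${profile}" >&2
            return 1
            ;;
    esac
}
```

For instance, <code>sbatch $(mig_sbatch_opts 3g.20gb) job.sh</code> would submit a job script (here a placeholder named <code>job.sh</code>) with the options for a <code>3g.20gb</code> instance; the same options can also be written as <code>#SBATCH</code> lines inside the script.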


= Examples = <!--T:10-->