Multi-Instance GPU

<languages />
<translate>  
<!--T:1-->
Many programs are unable to fully use modern GPUs such as the NVIDIA [https://www.nvidia.com/en-us/data-center/a100/ A100] and [https://www.nvidia.com/en-us/data-center/h100/ H100].
[https://www.nvidia.com/en-us/technologies/multi-instance-gpu/ Multi-Instance GPU (MIG)] is a technology that allows partitioning a single GPU into multiple [https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#terminology instances], making each one a completely independent GPU.
Each instance gets a dedicated slice of the GPU's computational resources and memory, isolated from the other instances by on-chip protections.
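As an illustration, the MIG profiles on an A100-40GB divide the GPU into up to seven compute slices; the following sketch lists the profiles described in NVIDIA's MIG user guide. Treat it as illustrative only, since the available profiles depend on the GPU model and the cluster's configuration.

```python
# Illustrative MIG profiles for an NVIDIA A100-40GB (per NVIDIA's MIG
# user guide; the exact set depends on GPU model and configuration).
# A profile name "<n>g.<m>gb" means n of the 7 compute slices and m GB
# of the GPU's memory.
A100_40GB_PROFILES = {
    "1g.5gb":  {"compute_slices": 1, "memory_gb": 5},
    "2g.10gb": {"compute_slices": 2, "memory_gb": 10},
    "3g.20gb": {"compute_slices": 3, "memory_gb": 20},
    "4g.20gb": {"compute_slices": 4, "memory_gb": 20},
    "7g.40gb": {"compute_slices": 7, "memory_gb": 40},
}

def compute_fraction(profile: str) -> float:
    """Fraction of the full GPU's compute slices that a profile provides."""
    return A100_40GB_PROFILES[profile]["compute_slices"] / 7
```

For example, a <code>3g.20gb</code> instance provides 3/7 of the compute slices and half of the memory, which is why jobs using less than half a GPU are good candidates.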


<!--T:2-->
<!--T:2-->
Using GPU instances is less wasteful, and usage is billed accordingly. Jobs submitted on such instances use less of your allocated priority than jobs on a full GPU; you will then be able to execute more jobs and have shorter wait times.


= Which jobs should use GPU instances instead of full GPUs? = <!--T:3-->
Jobs that use less than half of the computing power of a GPU and less than half of the available GPU memory should be evaluated and tested on a GPU instance. In most cases, these jobs will run just as fast on a GPU instance and consume less than half of the computing resources.
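On a Slurm cluster, such a job could request a GPU instance instead of a full GPU through a <code>--gres</code> specification. The sketch below is hypothetical: the instance profile name (<code>3g.20gb</code>) and the exact <code>--gres</code> syntax vary between clusters, so check your cluster's documentation for the names actually offered.

```shell
#!/bin/bash
# Hypothetical sketch: request one MIG instance instead of a full GPU.
# The gres name "gpu:3g.20gb" is illustrative only; the profiles and
# naming scheme depend on how your cluster exposes MIG instances.
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --mem=20G
#SBATCH --time=1:00:00

python train.py
```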


See the section [[#Finding_which_of_your_jobs_should_use_a_GPU_instance|Finding which of your jobs should use a GPU instance]] for more details.


= Limitations = <!--T:4-->
[https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#app-considerations GPU instances do not support] [https://developer.nvidia.com/docs/drive/drive-os/6.0.8.1/public/drive-os-linux-sdk/common/topics/nvsci_nvsciipc/Inter-ProcessCommunication1.html CUDA Inter-Process Communication (IPC)], which optimises data transfers between GPUs over NVLink and NVSwitch.
This limitation also affects communications between GPU instances in a single GPU.
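A practical consequence is that multi-process code relying on CUDA IPC should first check whether it is running on a MIG instance. A minimal sketch of such a check, assuming the usual <code>nvidia-smi -L</code> output format (each MIG instance appears on its own line starting with "MIG" under its parent GPU), might look like:

```python
import subprocess

def lists_mig_devices(listing: str) -> bool:
    """Return True if an `nvidia-smi -L` listing contains MIG instances.

    Assumes the typical output format, where each MIG instance is printed
    on an indented line beginning with "MIG" after the parent GPU line.
    """
    return any(line.strip().startswith("MIG") for line in listing.splitlines())

def running_on_mig() -> bool:
    """Sketch: query nvidia-smi (assumed to be on PATH) and inspect the list."""
    out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return lists_mig_devices(out.stdout)
```

If the check returns True, code paths that use CUDA IPC for inter-GPU transfers should fall back to ordinary host-mediated copies.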