Multi-Instance GPU: Difference between revisions

Jump to navigation Jump to search
no edit summary
No edit summary
No edit summary
Line 4: Line 4:
Many programs are unable to fully use modern GPUs such as NVidia [https://www.nvidia.com/en-us/data-center/a100/ A100s] and [https://www.nvidia.com/en-us/data-center/h100/ H100s].
Many programs are unable to fully use modern GPUs such as NVidia [https://www.nvidia.com/en-us/data-center/a100/ A100s] and [https://www.nvidia.com/en-us/data-center/h100/ H100s].
[https://www.nvidia.com/en-us/technologies/multi-instance-gpu/ Multi-Instance GPU (MIG)] is a technology that allows partitioning a single GPU into multiple [https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#terminology instances], making each one a completely independent virtual GPU.
[https://www.nvidia.com/en-us/technologies/multi-instance-gpu/ Multi-Instance GPU (MIG)] is a technology that allows partitioning a single GPU into multiple [https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#terminology instances], making each one a completely independent virtual GPU.
Each of the GPU's instances gets a slice of the original GPU's computational resources and memory, all detached from the other instances by on-chip protections.
Each of the GPU instances gets a portion of the original GPU's computational resources and memory, all detached from the other instances by on-chip protections.


<!--T:2-->
<!--T:2-->
Using instances of a GPU is less wasteful and usage is billed accordingly. Jobs submitted on such instances use less of your allocated priority compared to a full GPU; you will then be able to execute more jobs and have shorter wait time.
Using GPU instances is less wasteful, and usage is billed accordingly. Jobs submitted on such instances use less of your allocated priority compared to a full GPU; you will then be able to execute more jobs and have shorter wait time.


= Choosing between a full GPU and an instance of a GPU = <!--T:3-->
= Choosing between a full GPU and a GPU instance = <!--T:3-->
Jobs that use less than half of the computing power of a full GPU and less than half of the available memory should be evaluated and tested on an instance. In most cases, these jobs will run just as fast and consume less than half of the computing resource.
Jobs that use less than half of the computing power of a full GPU and less than half of the available memory should be evaluated and tested on an instance. In most cases, these jobs will run just as fast and consume less than half of the computing resource.


rsnt_translations
56,430

edits

Navigation menu