Using cloud vGPUs: Difference between revisions

From Alliance Doc
<languages />
<translate>
<translate>
<!--T:84-->
This page describes how to
* allocate virtual GPU (vGPU) resources to a virtual machine (VM),
* install the necessary drivers and
* check whether the vGPU can be used.
Access to repositories as well as to the vGPUs is currently only available within [https://arbutus.cloud.computecanada.ca Arbutus Cloud]. Please note that the documentation below only covers the vGPU driver installation. The [https://developer.nvidia.com/cuda-toolkit-archive CUDA toolkit] is not pre-installed but you can install it directly from NVIDIA or load it from [[Accessing_CVMFS|the CVMFS software stack]].
If you choose to install the toolkit directly from NVIDIA, please ensure that the vGPU driver is not overwritten with the one from the CUDA package.

== Supported flavors == <!--T:85-->

<!--T:3-->
To use a vGPU within a VM, the instance needs to be deployed on one of the flavors listed below. The vGPU will be available to the operating system via the PCI bus.

<!--T:4-->
* g1-8gb-c4-22gb
* g1-16gb-c8-40gb
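
As a minimal sketch, a flavor name can be checked against the list above before a VM is launched. The helper name <code>is_vgpu_flavor</code> is hypothetical, and the image, network, key pair and server names in the commented <code>openstack server create</code> call are placeholders that need to be replaced with your own.

```shell
#!/bin/sh
# Flavor names copied from the list above.
VGPU_FLAVORS="g1-8gb-c4-22gb g1-16gb-c8-40gb"

# Succeed (exit status 0) if $1 is one of the vGPU flavors.
is_vgpu_flavor() {
    for f in $VGPU_FLAVORS; do
        [ "$f" = "$1" ] && return 0
    done
    return 1
}

# Hypothetical usage; image, network, key pair and server names are placeholders:
#   is_vgpu_flavor g1-8gb-c4-22gb && openstack server create \
#       --flavor g1-8gb-c4-22gb --image AlmaLinux-9 --network my-network \
#       --key-name my-key my-vgpu-vm
```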


== Preparation of a VM running AlmaLinux 9 == <!--T:86-->

<!--T:87-->
Once the VM is available, update the OS to the latest available software, including the kernel.
Then reboot the VM so that the latest kernel is running.

<!--T:88-->
The [https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support DKMS package] is provided by the [https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm EPEL repository], which therefore needs to be enabled.

<!--T:89-->
By default, AlmaLinux 9 ships a faulty <code>nouveau</code> driver, which crashes the kernel as soon as the <code>nvidia</code> driver is loaded.
A few extra steps are therefore needed to prevent the <code>nouveau</code> driver from loading when the system boots.
 
</translate>
<pre>
[root@almalinux9]# echo -e "blacklist nouveau\noptions nouveau modeset=0" >/etc/modprobe.d/blacklist-nouveau.conf
[root@almalinux9]# dracut -fv --omit-drivers nouveau
[root@almalinux9]# dnf -y update && dnf -y install epel-release && reboot
</pre>
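
After the reboot, it may be worth confirming that the blacklist took effect. The following is a minimal sketch; the helper name <code>nouveau_loaded</code> is hypothetical, and it only inspects the module list printed by <code>lsmod</code>.

```shell
#!/bin/sh
# Read `lsmod` output on stdin; succeed if the nouveau module is listed.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage on the VM, after the reboot:
#   if lsmod | nouveau_loaded; then echo "nouveau is still loaded"; fi
```

If the module is still listed, the blacklist file and the regenerated initramfs are the first things to re-check.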
<translate>


<!--T:90-->
After the reboot of the VM, the Arbutus vGPU Cloud repository needs to be installed.


</translate>
<pre>
[root@almalinux9]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma9/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el9.noarch.rpm
</pre>
<translate>
 
<!--T:91-->
The next step is to install the vGPU packages, which provide the required driver and user-space tools.
</translate>
<pre>
[root@almalinux9]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64
</pre>
<translate>


<!--T:92-->
After a successful installation, <code>nvidia-smi</code> can be used to verify that the vGPU is working.
 
</translate>
<pre>
[root@almalinux9]# nvidia-smi
Tue Apr 23 16:37:31 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>
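
For scripted checks, the same information can be requested in CSV form with the <code>--query-gpu</code> and <code>--format</code> options of <code>nvidia-smi</code>. The sketch below uses a hypothetical helper, <code>is_grid_vgpu</code>, and assumes that the device name starts with <code>GRID</code>, as it does in the output above. Other driver versions may report a different name.

```shell
#!/bin/sh
# Read one CSV line "name, driver_version, memory.total" on stdin and
# succeed if the device name starts with "GRID" (assumption taken from
# the nvidia-smi output shown above).
is_grid_vgpu() {
    IFS=, read -r name _rest
    case "$name" in
        GRID*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage on the VM:
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader | is_grid_vgpu
```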
<translate>


== Preparation of a VM running AlmaLinux 8 == <!--T:93-->

<!--T:94-->
Once the VM is available, update the OS to the latest available software, including the kernel. Then reboot the VM so that the latest kernel is running.
The [https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support DKMS package] is provided by the EPEL repository, which therefore needs to be enabled.
 
</translate>
<pre>
[root@almalinux8]# dnf -y update && dnf -y install epel-release && reboot
</pre>
<translate>


<!--T:95-->
After the reboot of the VM, the Arbutus vGPU Cloud repository needs to be installed.


</translate>
<pre>
[root@almalinux8]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma8/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el8.noarch.rpm
</pre>
<translate>

<!--T:96-->
The next step is to install the vGPU packages, which provide the required driver and user-space tools.
</translate>
<pre>
[root@almalinux8]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64
</pre>
<translate>

<!--T:97-->
After a successful installation, <code>nvidia-smi</code> can be used to verify that the vGPU is working.

</translate>
<pre>
[root@almalinux8]# nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>
<translate>


== Preparation of a VM running Debian 11 == <!--T:98-->
Ensure that the latest packages are installed and that the system has been booted with the latest stable kernel, as <b>DKMS</b> will request the latest one available from the Debian repositories.
 
</translate>
<pre>
root@debian11:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>
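
One way to confirm after the reboot that the newest installed kernel is the one running is to compare <code>uname -r</code> with the highest version found under <code>/lib/modules</code>. This is a minimal sketch: the helper name <code>newest_kernel</code> is hypothetical, it relies on the version sort (<code>-V</code>) of GNU <code>sort</code>, and it assumes that every directory under <code>/lib/modules</code> belongs to an installed kernel.

```shell
#!/bin/sh
# Print the highest version from a list of kernel release strings on stdin.
newest_kernel() {
    sort -V | tail -n 1
}

# Usage on the VM:
#   [ "$(uname -r)" = "$(ls /lib/modules | newest_kernel)" ] && echo "latest kernel is running"
```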
<translate>


<!--T:99-->
After a successful reboot, the system should be running the latest available kernel. The repository can now be set up by installing the <code>arbutus-cloud-repo</code> package.
This package also contains the GPG key that all packages are signed with.


</translate>
<pre>
root@debian11:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb11/pool/main/arbutus-cloud-repo_0.1_all.deb
root@debian11:~# apt-get install -y ./arbutus-cloud-repo_0.1_all.deb
</pre>
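
Once the repository package is installed and <code>apt-get update</code> has been run, <code>apt-cache policy</code> shows whether apt can see the vGPU packages. The sketch below uses a hypothetical helper, <code>candidate_version</code>, which extracts the candidate version from that output. A result of <code>(none)</code> suggests that the repository is not yet visible to apt.

```shell
#!/bin/sh
# Read `apt-cache policy <package>` output on stdin and print the
# candidate version, or "(none)" if apt has no installable version.
candidate_version() {
    awk '$1 == "Candidate:" { print $2 }'
}

# Usage on the VM, after `apt-get update`:
#   apt-cache policy nvidia-vgpu-kmod | candidate_version
```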
<translate>


<!--T:100-->
Update the local apt cache and install the vGPU packages:
 
</translate>
<pre>
root@debian11:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>


<pre>
root@debian11:~# nvidia-smi
Tue Apr 23 18:55:18 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>
<translate>


== Preparation of a VM running Debian 12 == <!--T:101-->
Ensure that the latest packages are installed and that the system has been booted with the latest stable kernel, as <b>DKMS</b> will request the latest one available from the Debian repositories.


</translate>
<pre>
root@debian12:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>
<translate>


<!--T:102-->
After a successful reboot, the system should be running the latest available kernel. The repository can now be set up by installing the <code>arbutus-cloud-repo</code> package.
This package also contains the GPG key that all packages are signed with.


</translate>
<pre>
root@debian12:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb12/pool/main/arbutus-cloud-repo_0.1+deb12_all.deb
root@debian12:~# apt-get install -y ./arbutus-cloud-repo_0.1+deb12_all.deb
</pre>
<translate>
<!--T:103-->
Update the local apt cache and install the vGPU packages:
</translate>
<pre>
root@debian12:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>
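
Because the kernel module is built through DKMS, <code>dkms status</code> can show whether the build finished. The following is a minimal sketch with a hypothetical helper, <code>dkms_installed</code>. The module name <code>nvidia</code> used in the usage line is an assumption and may differ for the Arbutus packages.

```shell
#!/bin/sh
# Read `dkms status` output on stdin; succeed if module $1 is reported
# as installed. Both "name, version, ..." and "name/version, ..." line
# formats are accepted, since the layout differs between DKMS versions.
dkms_installed() {
    grep -Eq "^$1[,/].*: installed"
}

# Usage on the VM (module name "nvidia" is an assumption):
#   dkms status | dkms_installed nvidia && echo "kernel module built and installed"
```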


<pre>
root@debian12:~# nvidia-smi
Tue Apr 23 18:55:18 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>
<translate>


== Preparation of a VM running Ubuntu 22 == <!--T:104-->
Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.


</translate>
<pre>
root@ubuntu22:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>
<translate>


<!--T:105-->
After a successful reboot, the system should be running the latest available kernel.
The repository can now be set up by installing the <code>arbutus-cloud-repo</code> package.
This package also contains the GPG key that all packages are signed with.

</translate>
<pre>
root@ubuntu22:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt22/pool/main/arbutus-cloud-repo_0.1_all.deb
root@ubuntu22:~# apt-get install ./arbutus-cloud-repo_0.1_all.deb
</pre>
<translate>


<!--T:106-->
Update the local apt cache and install the vGPU packages:
</translate>
<pre>
root@ubuntu22:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>
<translate>
<!--T:107-->
If your installation was successful, the vGPU will be accessible and licensed.
</translate>
<pre>
root@ubuntu22:~# nvidia-smi
Wed Apr 24 14:37:52 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>
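
The table above does not show the license state. The full query output of <code>nvidia-smi -q</code> includes a <code>License Status</code> field, which can be checked as sketched below. The helper name <code>vgpu_licensed</code> is hypothetical, and the exact field label is assumed to be the same on your driver version.

```shell
#!/bin/sh
# Read `nvidia-smi -q` output on stdin; succeed if a "License Status"
# line reports "Licensed". A value of "Unlicensed" does not match.
vgpu_licensed() {
    grep -Eq 'License Status[[:space:]]*:[[:space:]]*Licensed'
}

# Usage on the VM:
#   nvidia-smi -q | vgpu_licensed && echo "vGPU is licensed"
```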
<translate>
== Preparation of a VM running Ubuntu 20 == <!--T:108-->
Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.
</translate>
<pre>
root@ubuntu20:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>
<translate>


<!--T:109-->
After a successful reboot, the system should be running the latest available kernel.
The repository can now be set up by installing the <code>arbutus-cloud-repo</code> package.
This package also contains the GPG key that all packages are signed with.


</translate>
<pre>
root@ubuntu20:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt20/pool/main/arbutus-cloud-repo_0.1ubuntu20_all.deb
root@ubuntu20:~# apt-get install ./arbutus-cloud-repo_0.1ubuntu20_all.deb
</pre>
<translate>


<!--T:110-->
Update the local apt cache and install the vGPU packages:
</translate>
<pre>
root@ubuntu20:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>
<translate>
<!--T:111-->
If your installation was successful, the vGPU will be accessible and licensed.


</translate>
</translate>
<pre>
root@ubuntu20:~# nvidia-smi
Wed Apr 24 14:37:52 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>
[[Category:Cloud]]

Latest revision as of 18:42, 29 October 2024

Other languages:

This page describes how to

  • allocate virtual GPU (vGPU) resources to a virtual machine (VM),
  • install the necessary drivers and
  • check whether the vGPU can be used.

Access to repositories as well as to the vGPUs is currently only available within Arbutus Cloud. Please note that the documentation below only covers the vGPU driver installation. The CUDA toolkit is not pre-installed but you can install it directly from NVIDIA or load it from the CVMFS software stack. If you choose to install the toolkit directly from NVIDIA, please ensure that the vGPU driver is not overwritten with the one from the CUDA package.

Supported flavors

To use a vGPU within a VM, the instance needs to be deployed on one of the flavors listed below. The vGPU will be available to the operating system via the PCI bus.

  • g1-8gb-c4-22gb
  • g1-16gb-c8-40gb

Preparation of a VM running AlmaLinux 9

Once the VM is available, make sure to update the OS to the latest available software, including the kernel. Then, reboot the VM to have the latest kernel running.

To have access to the DKMS package, the EPEL repository is required.

AlmaLinux 9 has by default a faulty nouveau driver which crashes the kernel as soon as the nvidia driver is mounted. The VM needs a few extra steps to prevent the loading of the nouveau driver when the system boots.

[root@almalinux9]# echo -e "blacklist nouveau\noptions nouveau modeset=0" >/etc/modprobe.d/blacklist-nouveau.conf
[root@almalinux9]# dracut -fv --omit-drivers nouveau
[root@almalinux9]# dnf -y update && dnf -y install epel-release && reboot

After the reboot of the VM, the Arbutus vGPU Cloud repository needs to be installed.

[root@almalinux9]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma9/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el9.noarch.rpm

The next step is to install the vGPU packages, which will install the required driver and user-space tools.

[root@almalinux9]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64

After a successful installation, nvidia-smi can be used to verify the proper functionality.

[root@almalinux9]# nvidia-smi 
Tue Apr 23 16:37:31 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
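
For scripted checks, the table above is awkward to parse. nvidia-smi also provides a CSV query mode through the --query-gpu and --format options. The sketch below checks one such line against a minimum memory size; the helper name and the threshold used in the example are illustrative assumptions.

```shell
# Hypothetical helper: read one "name, memory.total" CSV line on stdin and
# succeed if the reported memory (in MiB) is at least the value given as $1.
vgpu_memory_at_least() {
    awk -F', *' -v min="$1" '
        NR == 1 { found = ($2 + 0 >= min + 0) }
        END     { exit !found }
    '
}

# Typical use on the VM:
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits \
#       | vgpu_memory_at_least 8192 && echo "vGPU memory OK"
```

An empty input, for example when the driver is not loaded, counts as a failure.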

Preparation of a VM running AlmaLinux 8

Once the VM is available, make sure to update the OS to the latest available software, including the kernel. Then, reboot the VM to have the latest kernel running. To have access to the DKMS package, the EPEL repository is required.

[root@almalinux8]# dnf -y update && dnf -y install epel-release && reboot

After the reboot of the VM, the Arbutus vGPU Cloud repository needs to be installed.

[root@almalinux8]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma8/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el8.noarch.rpm

The next step is to install the vGPU packages, which will install the required driver and user-space tools.

[root@almalinux8]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64

After a successful installation, run nvidia-smi to verify that the vGPU is detected and the driver is working.

[root@almalinux8]# nvidia-smi 
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Preparation of a VM running Debian 11

Ensure that the latest packages are installed and the system has been booted with the latest stable kernel, as DKMS will request the latest one available from the Debian repositories.

root@debian11:~# apt-get update && apt-get -y dist-upgrade && reboot
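
After the reboot, one way to check that the running kernel is the newest one installed is to compare uname -r against the kernel versions present under /lib/modules. This is a rough sketch: the helper name is hypothetical, and it relies on sort -V, which orders typical kernel release strings correctly but is only a heuristic.

```shell
# Hypothetical helper: succeed if the first argument (the running kernel
# release) sorts highest among all arguments under version ordering.
running_is_newest() (
    running=$1
    newest=$(printf '%s\n' "$@" | sort -V | tail -n 1)
    [ "$running" = "$newest" ]
)

# Typical use on the VM:
#   running_is_newest "$(uname -r)" $(ls /lib/modules) || echo "reboot into the newest kernel first"
```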

After a successful reboot, the system should be running the latest available kernel. The repository can now be set up by installing the arbutus-cloud-repo package, which also contains the GPG key that all packages are signed with.

root@debian11:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb11/pool/main/arbutus-cloud-repo_0.1_all.deb
root@debian11:~# apt-get install -y ./arbutus-cloud-repo_0.1_all.deb

Update the local apt cache and install the vGPU packages:

root@debian11:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
root@debian11:~# nvidia-smi
Tue Apr 23 18:55:18 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Preparation of a VM running Debian 12

Ensure that the latest packages are installed and the system has been booted with the latest stable kernel, as DKMS will request the latest one available from the Debian repositories.

root@debian12:~# apt-get update && apt-get -y dist-upgrade && reboot

After a successful reboot, the system should be running the latest available kernel. The repository can now be set up by installing the arbutus-cloud-repo package, which also contains the GPG key that all packages are signed with.

root@debian12:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb12/pool/main/arbutus-cloud-repo_0.1+deb12_all.deb
root@debian12:~# apt-get install -y ./arbutus-cloud-repo_0.1+deb12_all.deb

Update the local apt cache and install the vGPU packages:

root@debian12:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
root@debian12:~# nvidia-smi
Tue Apr 23 18:55:18 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Preparation of a VM running Ubuntu 22

Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.

root@ubuntu22:~# apt-get update && apt-get -y dist-upgrade && reboot

After a successful reboot, the system should be running the latest available kernel. The repository can now be set up by installing the arbutus-cloud-repo package, which also contains the GPG key that all packages are signed with.

root@ubuntu22:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt22/pool/main/arbutus-cloud-repo_0.1_all.deb
root@ubuntu22:~# apt-get install ./arbutus-cloud-repo_0.1_all.deb

Update the local apt cache and install the vGPU packages:

root@ubuntu22:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd

If your installation was successful, the vGPU will be accessible and licensed.

root@ubuntu22:~# nvidia-smi 
Wed Apr 24 14:37:52 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
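
The summary table does not show the licensing state. The detailed report from nvidia-smi -q normally includes a "License Status" line for vGPU guests; the sketch below extracts its value. The exact layout of that report is an assumption here and may differ between driver versions, and the helper name is hypothetical.

```shell
# Hypothetical helper: print the value of the first "License Status" line
# found in "nvidia-smi -q" style text on stdin (prints nothing if absent).
license_status() {
    sed -n 's/^[[:space:]]*License Status[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Typical use on the VM:
#   case "$(nvidia-smi -q | license_status)" in
#       Licensed*) echo "vGPU is licensed" ;;
#       *)         echo "vGPU is not licensed (yet)" ;;
#   esac
```

Acquiring the license can take a short while after boot, so an unlicensed result right after installation is worth rechecking a few minutes later.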

Preparation of a VM running Ubuntu 20

Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.

root@ubuntu20:~# apt-get update && apt-get -y dist-upgrade && reboot

After a successful reboot, the system should be running the latest available kernel. The repository can now be set up by installing the arbutus-cloud-repo package, which also contains the GPG key that all packages are signed with.

root@ubuntu20:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt20/pool/main/arbutus-cloud-repo_0.1ubuntu20_all.deb
root@ubuntu20:~# apt-get install ./arbutus-cloud-repo_0.1ubuntu20_all.deb

Update the local apt cache and install the vGPU packages:

root@ubuntu20:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd

If your installation was successful, the vGPU will be accessible and licensed.

root@ubuntu20:~# nvidia-smi 
Wed Apr 24 14:37:52 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+