Using cloud GPUs: Difference between revisions

From Alliance Doc
(Marked this version for translation)
Tag: Reverted
(Undo revision 163552 by Rdickson (talk))
Tag: Undo

Revision as of 18:40, 29 October 2024

This page describes how to

  • allocate virtual GPU (vGPU) resources to a virtual machine (VM),
  • install the necessary drivers and
  • check whether the vGPU can be used.

Access to repositories as well as to the vGPUs is currently only available within Arbutus Cloud. Please note that the documentation below only covers the vGPU driver installation. The CUDA toolkit is not pre-installed but you can install it directly from NVIDIA or load it from the CVMFS software stack. If you choose to install the toolkit directly from NVIDIA, please ensure that the vGPU driver is not overwritten with the one from the CUDA package.
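
One way to notice an overwritten driver is to compare the driver version reported before and after installing the CUDA toolkit. The following is a minimal sketch, assuming the vGPU driver is already installed as described in the sections below; the helper names `current_driver_version` and `driver_unchanged` are illustrative only and are not part of any Arbutus or NVIDIA package.

```shell
# Print the driver version reported by nvidia-smi,
# or nothing if nvidia-smi is not available on this machine.
current_driver_version() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
    fi
}

# Succeed only if a version was recorded ($1) and it equals the current one ($2).
driver_unchanged() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

before="$(current_driver_version)"
# ... install the CUDA toolkit here ...
after="$(current_driver_version)"

if driver_unchanged "$before" "$after"; then
    echo "vGPU driver unchanged: $after"
else
    echo "driver version changed or unavailable: before='$before' after='$after'"
fi
```

If the two versions differ, the CUDA installer has most likely replaced the vGPU driver, and the vGPU packages would need to be reinstalled.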

Supported flavors

To use a vGPU within a VM, the instance needs to be deployed on one of the flavors listed below. The vGPU will be available to the operating system via the PCI bus.

  • g1-8gb-c4-22gb
  • g1-16gb-c8-40gb
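
Once an instance is running on one of these flavors, one way to confirm that the vGPU is visible on the PCI bus is to look for an NVIDIA device in the `lspci` output. This is a minimal sketch; it assumes the `pciutils` package (which provides `lspci`) is installed, and the function name `has_nvidia_device` is only illustrative.

```shell
# Return 0 if the lspci-style text on stdin mentions an NVIDIA device.
has_nvidia_device() {
    grep -qi 'nvidia'
}

# Only query the bus when lspci is available (assumption: pciutils is installed).
if command -v lspci >/dev/null 2>&1; then
    if lspci | has_nvidia_device; then
        echo "NVIDIA device found on the PCI bus"
    else
        echo "no NVIDIA device found on the PCI bus"
    fi
fi
```

This check works before any driver is installed, so it can help separate a flavor problem from a driver problem.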

Preparation of a VM running AlmaLinux 9

Once the VM is available, make sure to update the OS to the latest available software, including the kernel. Then, reboot the VM to have the latest kernel running.

To have access to the DKMS package, the EPEL repository is required.

AlmaLinux 9 ships by default with a faulty nouveau driver which crashes the kernel as soon as the nvidia driver is loaded. A few extra steps are needed to prevent the nouveau driver from being loaded when the system boots.

[root@almalinux9]# echo -e "blacklist nouveau\noptions nouveau modeset=0" >/etc/modprobe.d/blacklist-nouveau.conf
[root@almalinux9]# dracut -fv --omit-drivers nouveau
[root@almalinux9]# dnf -y update && dnf -y install epel-release && reboot
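
After the reboot, it may be worth confirming that the blacklist took effect before installing the vGPU packages. The sketch below inspects the `lsmod` output for the nouveau module; the function name `nouveau_loaded` is illustrative.

```shell
# Return 0 if the lsmod-style text on stdin lists the nouveau module.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit (found ? 0 : 1) }'
}

if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_loaded; then
        echo "nouveau is still loaded; check /etc/modprobe.d/blacklist-nouveau.conf"
    else
        echo "nouveau is not loaded"
    fi
fi
```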

After the reboot of the VM, the Arbutus vGPU Cloud repository needs to be installed.

[root@almalinux9]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma9/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el9.noarch.rpm

The next step is to install the vGPU packages, which will install the required driver and user-space tools.

[root@almalinux9]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64

After a successful installation, nvidia-smi can be used to verify the proper functionality.

[root@almalinux9]# nvidia-smi 
Tue Apr 23 16:37:31 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
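
For scripted checks, the same information can be read in a machine-friendly form with the `--query-gpu` option of `nvidia-smi`. The following is a minimal sketch; the helper `memory_mib` is illustrative and assumes the CSV layout `name, NNNN MiB` that `nvidia-smi` prints for the two requested fields.

```shell
# Extract the memory size in MiB from a "name, NNNN MiB" CSV line on stdin.
memory_mib() {
    awk -F', ' '{ split($2, parts, " "); print parts[1] }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    line="$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader | head -n 1)"
    echo "vGPU: $line"
    echo "memory (MiB): $(echo "$line" | memory_mib)"
fi
```

With the vGPU shown above, the CSV line would be expected to read `GRID V100D-8C, 8192 MiB`.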

Preparation of a VM running AlmaLinux 8

Once the VM is available, make sure to update the OS to the latest available software, including the kernel. Then, reboot the VM to have the latest kernel running. To have access to the DKMS package, the EPEL repository is required.

[root@almalinux8]# dnf -y update && dnf -y install epel-release && reboot

After the reboot of the VM, the Arbutus vGPU Cloud repository needs to be installed.

[root@almalinux8]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma8/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el8.noarch.rpm

The next step is to install the vGPU packages, which will install the required driver and user-space tools.

[root@almalinux8]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64

After a successful installation, nvidia-smi can be used to verify the proper functionality.

[root@almalinux8]# nvidia-smi 
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Preparation of a VM running Debian 11

Ensure that the latest packages are installed and the system has been booted with the latest stable kernel, as DKMS will request the latest one available from the Debian repositories.

root@debian11:~# apt-get update && apt-get -y dist-upgrade && reboot
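
After the reboot, a quick way to check that the running kernel is the newest installed one is to compare `uname -r` with the kernel directories under `/lib/modules`. This is a sketch under the assumption that every installed kernel has a directory there (directories left over from removed kernels could give a false alarm); the function name `newest_version` is illustrative.

```shell
# Print the highest version from a list of version strings on stdin.
newest_version() {
    sort -V | tail -n 1
}

running="$(uname -r)"
if [ -d /lib/modules ]; then
    newest="$(ls /lib/modules | newest_version)"
    if [ "$running" = "$newest" ]; then
        echo "running the newest installed kernel: $running"
    else
        echo "running $running, but $newest is installed; a reboot may be needed"
    fi
fi
```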

After a successful reboot, the system should be running the latest available kernel. The repository can now be set up by installing the arbutus-cloud-repo package, which also contains the GPG key used to sign all packages.

root@debian11:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb11/pool/main/arbutus-cloud-repo_0.1_all.deb
root@debian11:~# apt-get install -y ./arbutus-cloud-repo_0.1_all.deb

Update the local apt cache and install the vGPU packages:

root@debian11:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
root@debian11:~# nvidia-smi
Tue Apr 23 18:55:18 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Preparation of a VM running Debian 12

Ensure that the latest packages are installed and the system has been booted with the latest stable kernel, as DKMS will request the latest one available from the Debian repositories.

root@debian12:~# apt-get update && apt-get -y dist-upgrade && reboot

After a successful reboot, the system should be running the latest available kernel. The repository can now be set up by installing the arbutus-cloud-repo package, which also contains the GPG key used to sign all packages.

root@debian12:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb12/pool/main/arbutus-cloud-repo_0.1+deb12_all.deb
root@debian12:~# apt-get install -y ./arbutus-cloud-repo_0.1+deb12_all.deb

Update the local apt cache and install the vGPU packages:

root@debian12:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
root@debian12:~# nvidia-smi
Tue Apr 23 18:55:18 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Preparation of a VM running Ubuntu 22

Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.

root@ubuntu22:~# apt-get update && apt-get -y dist-upgrade && reboot

After a successful reboot, the system should be running the latest available kernel. The repository can now be set up by installing the arbutus-cloud-repo package, which also contains the GPG key used to sign all packages.

root@ubuntu22:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt22/pool/main/arbutus-cloud-repo_0.1_all.deb
root@ubuntu22:~# apt-get install ./arbutus-cloud-repo_0.1_all.deb

Update the local apt cache and install the vGPU packages:

root@ubuntu22:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd

If your installation was successful, the vGPU will be accessible and licensed.

root@ubuntu22:~# nvidia-smi 
Wed Apr 24 14:37:52 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
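
The table above does not show the license state. With NVIDIA vGPU guest drivers, the detailed report from `nvidia-smi -q` typically contains a `License Status` line; the sketch below extracts it. The exact wording of that line may differ between driver versions, so treat the parsing as an assumption, and the helper names `license_status` and `is_licensed` as illustrative.

```shell
# Print the value of the first "License Status" line found on stdin.
license_status() {
    sed -n 's/^[[:space:]]*License Status[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Succeed if the given status string starts with "Licensed".
is_licensed() {
    case "$1" in
        Licensed*) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v nvidia-smi >/dev/null 2>&1; then
    status="$(nvidia-smi -q | license_status)"
    if is_licensed "$status"; then
        echo "vGPU is licensed: $status"
    else
        echo "vGPU license not confirmed: '$status'"
    fi
fi
```

The license can take a short time to be acquired after the driver starts, so an unlicensed result immediately after installation may be worth rechecking.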

Preparation of a VM running Ubuntu 20

Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.

root@ubuntu20:~# apt-get update && apt-get -y dist-upgrade && reboot

After a successful reboot, the system should be running the latest available kernel. The repository can now be set up by installing the arbutus-cloud-repo package, which also contains the GPG key used to sign all packages.

root@ubuntu20:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt20/pool/main/arbutus-cloud-repo_0.1ubuntu20_all.deb
root@ubuntu20:~# apt-get install ./arbutus-cloud-repo_0.1ubuntu20_all.deb

Update the local apt cache and install the vGPU packages:

root@ubuntu20:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd

If your installation was successful, the vGPU will be accessible and licensed.

root@ubuntu20:~# nvidia-smi 
Wed Apr 24 14:37:52 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+