Bureaucrats, cc_docs_admin, cc_staff
2,879
edits
mNo edit summary |
(update to more recent OS versions) Tag: Reverted |
||
Line 1: | Line 1: | ||
<languages /> | <languages /> | ||
<translate> | <translate> | ||
This page describes how to | |||
This | * allocate virtual GPU (vGPU) resources to a virtual machine (VM), | ||
* install the necessary drivers and | |||
* check whether the vGPU can be used. | |||
Access to repositories as well as to the vGPUs is currently only available within [https://arbutus.cloud.computecanada.ca Arbutus Cloud]. Please note that the documentation below only covers the vGPU driver installation. The [https://developer.nvidia.com/cuda-toolkit-archive CUDA toolkit] is not pre-installed but you can install it directly from NVIDIA or load it from [[Accessing_CVMFS|the CVMFS software stack]]. | |||
If you choose to install the toolkit directly from NVIDIA, please ensure that the vGPU driver is not overwritten with the one from the CUDA package. | |||
== Supported flavors == | == Supported flavors == | ||
<!--T:3--> | <!--T:3--> | ||
To use a | To use a vGPU within a VM, the instance needs to be deployed on one of the flavors listed below. The vGPU will be available to the operating system via the PCI bus. | ||
<!--T:4--> | <!--T:4--> | ||
* | * g1-8gb-c4-22gb | ||
* g1- | * g1-16gb-c8-40gb | ||
== | == Preparation of a VM running AlmaLinux 9 == | ||
Once the VM is available, make sure to update the OS to the latest available software, including the kernel. | |||
Then, reboot the VM to have the latest kernel running. | |||
To have access to the [https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support DKMS package], the [https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm EPEL repository] is required. | |||
< | AlmaLinux 9 has by default a faulty <code>nouveau</code> driver which crashes the kernel as soon as the <code>nvidia</code> driver is mounted. | ||
The VM needs a few extra steps to prevent the loading of the nouveau driver when the system boots. | |||
< | </translate> | ||
<pre> | <pre> | ||
[root@almalinux9]# echo -e "blacklist nouveau\noptions nouveau modeset=0" >/etc/modprobe.d/blacklist-nouveau.conf | |||
[root@almalinux9]# dracut -fv --omit-drivers nouveau | |||
[root@almalinux9]# dnf -y update && dnf -y install epel-release && reboot | |||
</pre> | </pre> | ||
<translate> | |||
===== | After the reboot of the VM, the Arbutus vGPU Cloud repository needs to be installed. | ||
</translate> | |||
<pre> | |||
[root@almalinux9]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma9/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el9.noarch.rpm</pre> | |||
<translate> | |||
The next step is to install the vGPU packages, which will install the required driver and user-space tools. | |||
</translate> | |||
<pre> | |||
[root@almalinux9]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64 | |||
</pre> | |||
<translate> | |||
After a successful installation, <code>nvidia-smi</code> can be used to verify the proper functionality. | |||
</translate> | |||
<pre> | |||
[root@almalinux9]# nvidia-smi | |||
Tue Apr 23 16:37:31 2024 | |||
+-----------------------------------------------------------------------------------------+ | |||
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 | | |||
|-----------------------------------------+------------------------+----------------------+ | |||
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | |||
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | |||
| | | MIG M. | | |||
|=========================================+========================+======================| | |||
| 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 | | |||
| N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default | | |||
| | | N/A | | |||
+-----------------------------------------+------------------------+----------------------+ | |||
+-----------------------------------------------------------------------------------------+ | |||
| Processes: | | |||
| GPU GI CI PID Type Process name GPU Memory | | |||
| ID ID Usage | | |||
|=========================================================================================| | |||
| No running processes found | | |||
+-----------------------------------------------------------------------------------------+ | |||
</pre> | |||
<translate> | |||
== Preparation of a VM running AlmaLinux 8 == | |||
Once the VM is available, make sure to update the OS to the latest available software, including the kernel. Then, reboot the VM to have the latest kernel running. | |||
To have access to the [https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support DKMS package], the EPEL repository is required. | |||
</translate> | |||
<pre> | |||
[root@vgpu almalinux]# dnf -y update && dnf -y install epel-release && reboot | |||
</pre> | |||
<translate> | |||
After the reboot of the VM, the Arbutus vGPU Cloud repository needs to be installed. | |||
</translate> | |||
<pre> | |||
[root@almalinux8]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma8/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el8.noarch.rpm | |||
</pre> | |||
<translate> | |||
The next step is to install the vGPU packages, which will install the required driver and user-space tools. | |||
</translate> | |||
<pre> | |||
[root@vgpu almalinux]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64 | |||
</pre> | |||
<translate> | |||
After a successful installation, <code>nvidia-smi</code> can be used to verify the proper functionality. | |||
</translate> | |||
<pre> | |||
[root@almalinux8]# nvidia-smi | |||
+-----------------------------------------------------------------------------------------+ | |||
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 | | |||
|-----------------------------------------+------------------------+----------------------+ | |||
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | |||
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | |||
| | | MIG M. | | |||
|=========================================+========================+======================| | |||
| 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 | | |||
| N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default | | |||
| | | N/A | | |||
+-----------------------------------------+------------------------+----------------------+ | |||
+-----------------------------------------------------------------------------------------+ | |||
| Processes: | | |||
| GPU GI CI PID Type Process name GPU Memory | | |||
| ID ID Usage | | |||
|=========================================================================================| | |||
| No running processes found | | |||
+-----------------------------------------------------------------------------------------+ | |||
</pre> | |||
<translate> | |||
== Preparation of a VM running Debian 11 == | |||
Ensure that the latest packages are installed and the system has been booted with the latest stable kernel, as <b>DKMS</b> will request the latest one available from the Debian repositories. | |||
</translate> | |||
<pre> | |||
root@debian11:~# apt-get update && apt-get -y dist-upgrade && reboot | |||
</pre> | |||
<translate> | |||
After a successful reboot, the system should have the latest available kernel running and the repository can be installed, by installing the <code>arbutus-cloud-repo</code> package. | |||
This package also contains the gpg key all packages are signed with. | |||
< | </translate> | ||
<pre> | <pre> | ||
root@ | root@debian11:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb11/pool/main/arbutus-cloud-repo_0.1_all.deb | ||
root@debian11:~# apt-get install -y ./arbutus-cloud-repo_0.1_all.deb | |||
</pre> | </pre> | ||
<translate> | |||
Update the local apt cache and install the vGPU packages: | |||
< | </translate> | ||
<pre> | <pre> | ||
root@ | root@debian11:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd | ||
</pre> | </pre> | ||
<pre> | <pre> | ||
root@ | root@debian11:~# nvidia-smi | ||
Tue Apr 23 18:55:18 2024 | |||
+-----------------------------------------------------------------------------------------+ | |||
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 | | |||
|-----------------------------------------+------------------------+----------------------+ | |||
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | |||
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | |||
| | | MIG M. | | |||
|=========================================+========================+======================| | |||
| 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 | | |||
| N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default | | |||
| | | N/A | | |||
+-----------------------------------------+------------------------+----------------------+ | |||
+-----------------------------------------------------------------------------------------+ | |||
| Processes: | | |||
| GPU GI CI PID Type Process name GPU Memory | | |||
| ID ID Usage | | |||
|=========================================================================================| | |||
| No running processes found | | |||
+-----------------------------------------------------------------------------------------+ | |||
</pre> | </pre> | ||
<translate> | |||
== Preparation of a VM running Debian 12 == | |||
Ensure that the latest packages are installed and the system has been booted with the latest stable kernel, as <b>DKMS</b> will request the latest one available from the Debian repositories. | |||
</translate> | |||
<pre> | <pre> | ||
root@ | root@debian12:~# apt-get update && apt-get -y dist-upgrade && reboot | ||
</pre> | </pre> | ||
<translate> | |||
< | After a successful reboot, the system should have the latest available kernel running and the repository can be installed, by installing the <code>arbutus-cloud-repo</code> package. | ||
This package also contains the gpg key all packages are signed with. | |||
</translate> | |||
<pre> | |||
root@debian12:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb12/pool/main/arbutus-cloud-repo_0.1+deb12_all.deb | |||
root@debian12:~# apt-get install -y ./arbutus-cloud-repo_0.1+deb12_all.deb | |||
</pre> | |||
<translate> | |||
Update the local apt cache and install the vGPU packages: | |||
< | </translate> | ||
<pre> | |||
root@debian12:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd | |||
< | </pre> | ||
<pre> | |||
root@debian12:~# nvidia-smi | |||
Tue Apr 23 18:55:18 2024 | |||
+-----------------------------------------------------------------------------------------+ | |||
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 | | |||
|-----------------------------------------+------------------------+----------------------+ | |||
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | |||
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | |||
| | | MIG M. | | |||
|=========================================+========================+======================| | |||
| 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 | | |||
| N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default | | |||
| | | N/A | | |||
+-----------------------------------------+------------------------+----------------------+ | |||
+-----------------------------------------------------------------------------------------+ | |||
| Processes: | | |||
| GPU GI CI PID Type Process name GPU Memory | | |||
| ID ID Usage | | |||
|=========================================================================================| | |||
| No running processes found | | |||
+-----------------------------------------------------------------------------------------+ | |||
</pre> | |||
<translate> | |||
==== | == Preparation of a VM running Ubuntu 22 == | ||
Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running. | |||
< | </translate> | ||
<pre> | <pre> | ||
root@ubuntu22:~# apt-get update && apt-get -y dist-upgrade && reboot | |||
</pre> | </pre> | ||
<translate> | |||
After a successful reboot, the system should have the latest available kernel running. | |||
Now the repository can be installed by installing the <code>arbutus-cloud-repo</code> package. | |||
This package also contains the gpg key all packages are signed with. | |||
</translate> | |||
<pre> | <pre> | ||
root@ubuntu22:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt22/pool/main/arbutus-cloud-repo_0.1_all.deb | |||
root@ubuntu22:~# apt-get install ./arbutus-cloud-repo_0.1_all.deb | |||
</pre> | </pre> | ||
<translate> | |||
Update the local apt cache and install the vGPU packages: | |||
</translate> | |||
<pre> | <pre> | ||
root@ubuntu22:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd | |||
</pre> | |||
<translate> | |||
If your installation was successful, the vGPU will be accessible and licensed. | |||
</pre> | </translate> | ||
<pre> | |||
root@ubuntu22:~# nvidia-smi | |||
Wed Apr 24 14:37:52 2024 | |||
+-----------------------------------------------------------------------------------------+ | |||
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 | | |||
|-----------------------------------------+------------------------+----------------------+ | |||
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | |||
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | |||
| | | MIG M. | | |||
|=========================================+========================+======================| | |||
| 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 | | |||
| N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default | | |||
| | | N/A | | |||
+-----------------------------------------+------------------------+----------------------+ | |||
+-----------------------------------------------------------------------------------------+ | |||
| Processes: | | |||
| GPU GI CI PID Type Process name GPU Memory | | |||
| ID ID Usage | | |||
|=========================================================================================| | |||
| No running processes found | | |||
+-----------------------------------------------------------------------------------------+ | |||
</pre> | |||
<translate> | |||
== Preparation of a VM running Ubuntu 20 == | |||
Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running. | |||
</translate> | |||
<pre> | |||
root@ubuntu20:~# apt-get update && apt-get -y dist-upgrade && reboot | |||
</pre> | |||
<translate> | |||
After a successful reboot, the system should have the latest available kernel running. | |||
Now the repository can be installed by installing the <code>arbutus-cloud-repo</code> package. | |||
This package also contains the gpg key all packages are signed with. | |||
</translate> | |||
<pre> | |||
root@ubuntu20:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt20/pool/main/arbutus-cloud-repo_0.1ubuntu20_all.deb | |||
root@ubuntu20:~# apt-get install ./arbutus-cloud-repo_0.1ubuntu20_all.deb | |||
</pre> | |||
<translate> | |||
Update the local apt cache and install the vGPU packages: | |||
</translate> | |||
<pre> | <pre> | ||
root@ubuntu20:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd | |||
</pre> | </pre> | ||
<translate> | |||
If your installation was successful, the vGPU will be accessible and licensed. | |||
</translate> | </translate> | ||
<pre> | |||
root@ubuntu20:~# nvidia-smi | |||
Wed Apr 24 14:37:52 2024 | |||
+-----------------------------------------------------------------------------------------+ | |||
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 | | |||
|-----------------------------------------+------------------------+----------------------+ | |||
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | |||
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | |||
| | | MIG M. | | |||
|=========================================+========================+======================| | |||
| 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 | | |||
| N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default | | |||
| | | N/A | | |||
+-----------------------------------------+------------------------+----------------------+ | |||
+-----------------------------------------------------------------------------------------+ | |||
| Processes: | | |||
| GPU GI CI PID Type Process name GPU Memory | | |||
| ID ID Usage | | |||
|=========================================================================================| | |||
| No running processes found | | |||
+-----------------------------------------------------------------------------------------+ | |||
</pre> | |||
[[Category:Cloud]] | [[Category:Cloud]] |