|
|
Line 1: |
Line 1: |
| | |
| <languages /> | | <languages /> |
| <translate> | | <translate> |
|
| |
|
| This page describes how to | | <!--T:2--> |
| * allocate virtual GPU (vGPU) resources to a virtual machine (VM),
| | This guide describes how to allocate GPU resources to a virtual machine (VM), install the necessary drivers, and check whether the GPU can be used. |
| * install the necessary drivers and
| |
| * check whether the vGPU can be used.
| |
| Access to repositories as well as to the vGPUs is currently only available within [https://arbutus.cloud.computecanada.ca Arbutus Cloud]. Please note that the documentation below only covers the vGPU driver installation. The [https://developer.nvidia.com/cuda-toolkit-archive CUDA toolkit] is not pre-installed but you can install it directly from NVIDIA or load it from [[Accessing_CVMFS|the CVMFS software stack]].
| |
| If you choose to install the toolkit directly from NVIDIA, please ensure that the vGPU driver is not overwritten with the one from the CUDA package.
| |
|
| |
|
| == Supported flavors == | | == Supported flavors == <!--T:23--> |
|
| |
|
| <!--T:3--> | | <!--T:3--> |
| To use a vGPU within a VM, the instance needs to be deployed on one of the flavors listed below. The vGPU will be available to the operating system via the PCI bus. | | To use a GPU within a VM, the instance needs to be deployed on one of the flavors listed below. The GPU will be available to the operating system via the PCI bus. |
|
| |
|
| <!--T:4--> | | <!--T:4--> |
| * g1-8gb-c4-22gb | | * g2-c24-112gb-500 |
| * g1-16gb-c8-40gb | | * g1-c14-56gb-500 |
| | * g1-c14-56gb |
|
| |
|
| == Preparation of a VM running AlmaLinux 9 == | | == Preparing a Debian 10 instance == <!--T:5--> |
|
| |
|
| Once the VM is available, make sure to update the OS to the latest available software, including the kernel.
| | <!--T:24--> |
| Then, reboot the VM to have the latest kernel running.
| | To use the GPU via the PCI bus, the proprietary NVIDIA drivers are required. Due to Debian's policy, the drivers are available from the non-free pool only. |
|
| |
|
| To have access to the [https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support DKMS package], the [https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm EPEL repository] is required.
| | ===== Enable the non-free pool ===== <!--T:6--> |
|
| |
|
| AlmaLinux 9 ships by default with a faulty <code>nouveau</code> driver, which crashes the kernel as soon as the <code>nvidia</code> driver is loaded.
| | <!--T:25--> |
| The VM needs a few extra steps to prevent the <code>nouveau</code> driver from loading when the system boots.
| | Log in using ssh and add the lines below to ''/etc/apt/sources.list'', if they are not already there. |
|
| |
|
| </translate> | | <!--T:7--> |
| <pre> | | <pre> |
| [root@almalinux9]# echo -e "blacklist nouveau\noptions nouveau modeset=0" >/etc/modprobe.d/blacklist-nouveau.conf
| | deb http://deb.debian.org/debian buster main contrib non-free |
| [root@almalinux9]# dracut -fv --omit-drivers nouveau
| | deb http://security.debian.org/ buster/updates main contrib non-free |
| [root@almalinux9]# dnf -y update && dnf -y install epel-release && reboot
| | deb http://deb.debian.org/debian buster-updates main contrib non-free |
| </pre> | | </pre> |
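Whether the <code>nouveau</code> blacklist written by the AlmaLinux 9 commands above is in place can be checked after the reboot. The following is a minimal sketch and not part of the original procedure: the function names <code>nouveau_blacklisted</code> and <code>nouveau_loaded</code> are hypothetical, and the default path is assumed to be the file created by the <code>echo</code> command above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given modprobe configuration file
# (default: the file written by the echo command above) contains a
# "blacklist nouveau" line.
nouveau_blacklisted() {
    conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    [ -r "$conf" ] &&
        grep -Eq '^[[:space:]]*blacklist[[:space:]]+nouveau[[:space:]]*$' "$conf"
}

# Hypothetical helper: succeeds if a module named nouveau is listed in the
# given module list (default: /proc/modules, i.e. the running kernel).
nouveau_loaded() {
    grep -q '^nouveau ' "${1:-/proc/modules}"
}
```

A possible use is <code>nouveau_blacklisted && ! nouveau_loaded && echo "nouveau is disabled"</code>, run as root before the vGPU packages are installed.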
| <translate>
| |
|
| |
|
| After the reboot of the VM, the Arbutus vGPU Cloud repository needs to be installed.
| | ===== Install the NVIDIA driver ===== <!--T:8--> |
|
| |
|
| </translate> | | <!--T:26--> |
| <pre> | | The following command: |
| [root@almalinux9]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma9/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el9.noarch.rpm
| </pre>
| | * updates the <code>apt</code> cache, so that <code>apt</code> will be aware of the new software pool sections, |
| <translate> | | * updates the OS to the latest software versions, and |
| | * installs kernel headers, an NVIDIA driver, and <code>pciutils</code>, which will be required to list the devices connected to the PCI bus. |
|
| |
|
| The next step is to install the vGPU packages, which will install the required driver and user-space tools.
| | <!--T:9--> |
| | |
| </translate>
| |
| <pre> | | <pre> |
| [root@almalinux9]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64
| | root@gpu2:~# apt-get update && apt-get -y dist-upgrade && apt-get -y install pciutils linux-headers-`uname -r` linux-headers-amd64 nvidia-driver |
| </pre> | | </pre> |
| <translate>
| |
|
| |
|
| After a successful installation, <code>nvidia-smi</code> can be used to verify that the vGPU is working.
| | <!--T:27--> |
| | If this command finishes successfully, the NVIDIA driver will have been compiled and loaded. |
|
| |
|
| </translate> | | <!--T:10--> |
| | * Check if the GPU is exposed on the PCI bus |
| <pre> | | <pre> |
| [root@almalinux9]# nvidia-smi
| | root@gpu2:~# lspci -vk |
| Tue Apr 23 16:37:31 2024
| | [...] |
| +-----------------------------------------------------------------------------------------+
| | 00:05.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1) |
| | NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
| | Subsystem: NVIDIA Corporation GK210GL [Tesla K80] |
| |-----------------------------------------+------------------------+----------------------+
| | Physical Slot: 5 |
| | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| | Flags: bus master, fast devsel, latency 0, IRQ 11 |
| | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | Memory at fd000000 (32-bit, non-prefetchable) [size=16M] |
| | | | MIG M. |
| | Memory at 1000000000 (64-bit, prefetchable) [size=16G] |
| |=========================================+========================+======================|
| | Memory at 1400000000 (64-bit, prefetchable) [size=32M] |
| | 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 |
| | Capabilities: [60] Power Management version 3 |
| | N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default |
| | Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+ |
| | | | N/A |
| | Capabilities: [78] Express Endpoint, MSI 00 |
| +-----------------------------------------+------------------------+----------------------+
| | Kernel driver in use: nvidia |
|
| | Kernel modules: nvidia |
| +-----------------------------------------------------------------------------------------+
| | [...] |
| | Processes: |
| |
| | GPU GI CI PID Type Process name GPU Memory |
| |
| | ID ID Usage |
| |
| |=========================================================================================|
| |
| | No running processes found |
| |
| +-----------------------------------------------------------------------------------------+
| |
| </pre> | | </pre> |
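For scripted checks, the values shown in the <code>nvidia-smi</code> table can also be read in machine-readable form. The sketch below assumes that <code>nvidia-smi</code> is on the <code>PATH</code> and relies on its <code>--query-gpu</code> and <code>--format=csv,noheader</code> options; the function names <code>gpu_summary</code> and <code>driver_version</code> are hypothetical.

```shell
#!/bin/sh
# Print name, total memory and driver version of each GPU as one
# comma-separated line per GPU (requires a working NVIDIA driver).
gpu_summary() {
    nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader
}

# Hypothetical helper: read such comma-separated lines on standard input
# and print only the third field, the driver version.
driver_version() {
    awk -F', *' '{ print $3 }'
}
```

On a VM with a working driver, <code>gpu_summary | driver_version</code> should print the driver version of each visible GPU, which can then be compared against the version expected from the repository.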
| <translate>
| |
|
| |
|
| == Preparation of a VM running AlmaLinux 8 ==
| | <!--T:11--> |
| | | * Check that the <code>nvidia</code> kernel module is loaded |
| Once the VM is available, make sure to update the OS to the latest available software, including the kernel. Then, reboot the VM to have the latest kernel running.
| |
| To have access to the [https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support DKMS package], the EPEL repository is required.
| |
| | |
| </translate> | |
| <pre> | | <pre> |
| [root@vgpu almalinux]# dnf -y update && dnf -y install epel-release && reboot
| | root@gpu2:~# lsmod | grep nvidia |
| | nvidia 17936384 0 |
| | nvidia_drm 16384 0 |
| </pre> | | </pre> |
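Note that <code>lsmod | grep nvidia</code> also matches related modules such as <code>nvidia_drm</code>. Where a script has to test for the <code>nvidia</code> module itself, a stricter check can be sketched as follows. It reads <code>/proc/modules</code>, the file whose contents <code>lsmod</code> formats; the function name <code>nvidia_module_loaded</code> is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if a module named exactly "nvidia"
# appears in the first column of the given module list
# (default: /proc/modules of the running kernel).
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }' "${1:-/proc/modules}"
}
```

For example, <code>nvidia_module_loaded || echo "nvidia module is not loaded"</code> reports a missing module without being misled by <code>nvidia_drm</code> alone.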
| <translate>
| |
|
| |
| After the reboot of the VM, the Arbutus vGPU Cloud repository needs to be installed.
| |
|
| |
|
| </translate> | | <!--T:12--> |
| | * Start <code>nvidia-persistenced</code>, which will create the necessary device files and make the GPU accessible in user space. |
| <pre> | | <pre> |
| [root@almalinux8]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma8/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el8.noarch.rpm
| | root@gpu2:~# systemctl restart nvidia-persistenced |
| </pre> | | root@gpu2:~# ls -al /dev/nvidia* |
| <translate>
| | crw-rw-rw- 1 root root 195, 0 Mar 6 18:55 /dev/nvidia0 |
| | crw-rw-rw- 1 root root 195, 255 Mar 6 18:55 /dev/nvidiactl |
| | crw-rw-rw- 1 root root 195, 254 Mar 6 18:55 /dev/nvidia-modeset |
| | </pre> |
|
| |
|
| The next step is to install the vGPU packages, which will install the required driver and user-space tools.
| | <!--T:14--> |
| </translate>
| | The GPU is now available within the user space and can be used. |
| <pre> | |
| [root@vgpu almalinux]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64
| |
| </pre>
| |
| <translate>
| |
|
| |
|
| After a successful installation, <code>nvidia-smi</code> can be used to verify that the vGPU is working.
| | == Preparing a CentOS 7 instance == <!--T:15--> |
|
| |
|
| </translate> | | <!--T:28--> |
| <pre>
| | NVIDIA provides repositories for various distributions, so the required software can be installed and maintained through these repositories. |
| [root@almalinux8]# nvidia-smi
| |
| +-----------------------------------------------------------------------------------------+
| |
| | NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
| |
| |-----------------------------------------+------------------------+----------------------+
| |
| | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| |
| | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| |
| | | | MIG M. |
| |
| |=========================================+========================+======================|
| |
| | 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 |
| |
| | N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default |
| |
| | | | N/A |
| |
| +-----------------------------------------+------------------------+----------------------+
| |
|
| |
| +-----------------------------------------------------------------------------------------+
| |
| | Processes: |
| |
| | GPU GI CI PID Type Process name GPU Memory |
| |
| | ID ID Usage |
| |
| |=========================================================================================|
| |
| | No running processes found |
| |
| +-----------------------------------------------------------------------------------------+
| |
| </pre>
| |
| <translate>
| |
| | |
| == Preparation of a VM running Debian 11 ==
| |
| Ensure that the latest packages are installed and the system has been booted with the latest stable kernel, as <b>DKMS</b> will request the latest one available from the Debian repositories.
| |
|
| |
|
| </translate> | | <!--T:16--> |
| <pre> | | To compile the module sources from the NVIDIA repository, it is necessary to install <code>dkms</code>. |
| root@debian11:~# apt-get update && apt-get -y dist-upgrade && reboot
| | <code>dkms</code> automatically rebuilds the modules on kernel updates, which ensures that the GPU keeps working after any update of the OS. |
| </pre> | | <code>dkms</code> is provided in the EPEL repository. |
| <translate>
| | Kernel headers and the kernel source need to be installed before the NVIDIA driver can be set up. |
|
| |
|
| After a successful reboot, the system should be running the latest available kernel. The repository can then be set up by installing the <code>arbutus-cloud-repo</code> package.
| | ===== Enable the EPEL repository and install needed software ===== <!--T:17--> |
| This package also contains the GPG key with which all packages are signed.
| |
|
| |
|
| </translate> | | <!--T:29--> |
| <pre> | | <pre> |
| root@debian11:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb11/pool/main/arbutus-cloud-repo_0.1_all.deb | | [root@gpu-centos centos]# yum -y update && reboot |
| root@debian11:~# apt-get install -y ./arbutus-cloud-repo_0.1_all.deb
| | [root@gpu-centos centos]# yum -y install epel-release && yum -y install dkms kernel-devel-$(uname -r) kernel-headers-$(uname -r) |
| </pre> | | </pre> |
| <translate>
| |
|
| |
|
| Update the local apt cache and install the vGPU packages:
| | ===== Add the NVIDIA repository and install the driver package ===== <!--T:18--> |
| | |
| </translate> | |
| <pre>
| |
| root@debian11:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
| |
| </pre>
| |
|
| |
|
| | <!--T:30--> |
| | Install the <code>yum</code> repository: |
| <pre> | | <pre> |
| root@debian11:~# nvidia-smi | | [root@gpu-centos centos]# yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo |
| Tue Apr 23 18:55:18 2024
| | [root@gpu-centos centos]# yum install -y cuda-drivers |
| +-----------------------------------------------------------------------------------------+
| |
| | NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
| |
| |-----------------------------------------+------------------------+----------------------+
| |
| | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| |
| | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| |
| | | | MIG M. |
| |
| |=========================================+========================+======================|
| |
| | 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 |
| |
| | N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default |
| |
| | | | N/A |
| |
| +-----------------------------------------+------------------------+----------------------+
| |
|
| |
| +-----------------------------------------------------------------------------------------+
| |
| | Processes: |
| |
| | GPU GI CI PID Type Process name GPU Memory |
| |
| | ID ID Usage |
| |
| |=========================================================================================|
| |
| | No running processes found |
| |
| +-----------------------------------------------------------------------------------------+
| |
| </pre> | | </pre> |
| <translate>
| |
|
| |
| == Preparation of a VM running Debian 12 ==
| |
| Ensure that the latest packages are installed and the system has been booted with the latest stable kernel, as <b>DKMS</b> will request the latest one available from the Debian repositories.
| |
|
| |
|
| </translate> | | <!--T:19--> |
| | NVIDIA uses its own GPG key to sign its packages. <code>yum</code> will ask to autoimport it. Reply "y" for "yes" when prompted. |
| <pre> | | <pre> |
| root@debian12:~# apt-get update && apt-get -y dist-upgrade && reboot
| | Retrieving key from http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub |
| | Importing GPG key 0x7FA2AF80: |
| | Userid : "cudatools <cudatools@nvidia.com>" |
| | Fingerprint: ae09 fe4b bd22 3a84 b2cc fce3 f60f 4b3d 7fa2 af80 |
| | From : http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub |
| | Is this ok [y/N]: y |
| </pre> | | </pre> |
| <translate>
| |
|
| |
| After a successful reboot, the system should be running the latest available kernel. The repository can then be set up by installing the <code>arbutus-cloud-repo</code> package.
| |
| This package also contains the GPG key with which all packages are signed.
| |
|
| |
| </translate>
| |
| <pre>
| |
| root@debian12:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb12/pool/main/arbutus-cloud-repo_0.1+deb12_all.deb
| |
| root@debian12:~# apt-get install -y ./arbutus-cloud-repo_0.1+deb12_all.deb
| |
| </pre>
| |
| <translate>
| |
|
| |
| Update the local apt cache and install the vGPU packages:
| |
|
| |
|
| </translate> | | <!--T:21--> |
| | After installation, reboot the VM to properly load the module and create the NVIDIA device files. |
| <pre> | | <pre> |
| root@debian12:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd | | [root@gpu-centos ~]# ls -al /dev/nvidia* |
| | crw-rw-rw-. 1 root root 195, 0 Mar 10 20:35 /dev/nvidia0 |
| | crw-rw-rw-. 1 root root 195, 255 Mar 10 20:35 /dev/nvidiactl |
| | crw-rw-rw-. 1 root root 195, 254 Mar 10 20:35 /dev/nvidia-modeset |
| | crw-rw-rw-. 1 root root 241, 0 Mar 10 20:35 /dev/nvidia-uvm |
| | crw-rw-rw-. 1 root root 241, 1 Mar 10 20:35 /dev/nvidia-uvm-tools |
| </pre> | | </pre> |
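The presence of the device files can also be tested in a script instead of reading the <code>ls</code> listing by eye. The following is a minimal sketch for a single-GPU VM: the function name <code>missing_nvidia_nodes</code> is hypothetical, the list of node names is taken from the CentOS 7 listing above, and the check only tests that each path exists, not that it is a character device.

```shell
#!/bin/sh
# Hypothetical helper: print every expected NVIDIA device node that does
# not exist below the given directory (default: /dev). Returns 0 if all
# nodes exist and 1 if at least one is missing.
missing_nvidia_nodes() {
    dir="${1:-/dev}"
    status=0
    # Node names as they appear in the listing above (one GPU assumed).
    for node in nvidia0 nvidiactl nvidia-modeset nvidia-uvm nvidia-uvm-tools; do
        if [ ! -e "$dir/$node" ]; then
            echo "$dir/$node"
            status=1
        fi
    done
    return "$status"
}
```

Calling <code>missing_nvidia_nodes</code> without arguments checks <code>/dev</code>; any path it prints points to a device file that has not been created yet, for instance because the VM has not been rebooted.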
|
| |
|
| <pre> | | <!--T:22--> |
| root@debian12:~# nvidia-smi
| | The GPU is now accessible via any user space tool. |
| Tue Apr 23 18:55:18 2024
| |
| +-----------------------------------------------------------------------------------------+
| |
| | NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
| |
| |-----------------------------------------+------------------------+----------------------+
| |
| | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| |
| | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| |
| | | | MIG M. |
| |
| |=========================================+========================+======================|
| |
| | 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 |
| |
| | N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default |
| |
| | | | N/A |
| |
| +-----------------------------------------+------------------------+----------------------+
| |
|
| |
| +-----------------------------------------------------------------------------------------+
| |
| | Processes: |
| |
| | GPU GI CI PID Type Process name GPU Memory |
| |
| | ID ID Usage |
| |
| |=========================================================================================|
| |
| | No running processes found |
| |
| +-----------------------------------------------------------------------------------------+
| |
| </pre>
| |
| <translate>
| |
| | |
| == Preparation of a VM running Ubuntu 22 ==
| |
| Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.
| |
|
| |
|
| </translate> | | </translate> |
| <pre>
| |
| root@ubuntu22:~# apt-get update && apt-get -y dist-upgrade && reboot
| |
| </pre>
| |
| <translate>
| |
|
| |
| After a successful reboot, the system should have the latest available kernel running.
| |
| Now the repository can be installed by installing the <code>arbutus-cloud-repo</code> package.
| |
| This package also contains the GPG key with which all packages are signed.
| |
|
| |
| </translate>
| |
| <pre>
| |
| root@ubuntu22:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt22/pool/main/arbutus-cloud-repo_0.1_all.deb
| |
| root@ubuntu22:~# apt-get install ./arbutus-cloud-repo_0.1_all.deb
| |
| </pre>
| |
| <translate>
| |
|
| |
| Update the local apt cache and install the vGPU packages:
| |
| </translate>
| |
| <pre>
| |
| root@ubuntu22:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
| |
| </pre>
| |
| <translate>
| |
|
| |
| If your installation was successful, the vGPU will be accessible and licensed.
| |
|
| |
| </translate>
| |
| <pre>
| |
| root@ubuntu22:~# nvidia-smi
| |
| Wed Apr 24 14:37:52 2024
| |
| +-----------------------------------------------------------------------------------------+
| |
| | NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
| |
| |-----------------------------------------+------------------------+----------------------+
| |
| | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| |
| | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| |
| | | | MIG M. |
| |
| |=========================================+========================+======================|
| |
| | 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 |
| |
| | N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default |
| |
| | | | N/A |
| |
| +-----------------------------------------+------------------------+----------------------+
| |
|
| |
| +-----------------------------------------------------------------------------------------+
| |
| | Processes: |
| |
| | GPU GI CI PID Type Process name GPU Memory |
| |
| | ID ID Usage |
| |
| |=========================================================================================|
| |
| | No running processes found |
| |
| +-----------------------------------------------------------------------------------------+
| |
| </pre>
| |
| <translate>
| |
|
| |
| == Preparation of a VM running Ubuntu 20 ==
| |
| Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.
| |
|
| |
| </translate>
| |
| <pre>
| |
| root@ubuntu20:~# apt-get update && apt-get -y dist-upgrade && reboot
| |
| </pre>
| |
| <translate>
| |
|
| |
| After a successful reboot, the system should have the latest available kernel running.
| |
| Now the repository can be installed by installing the <code>arbutus-cloud-repo</code> package.
| |
| This package also contains the GPG key with which all packages are signed.
| |
|
| |
| </translate>
| |
| <pre>
| |
| root@ubuntu20:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt20/pool/main/arbutus-cloud-repo_0.1ubuntu20_all.deb
| |
| root@ubuntu20:~# apt-get install ./arbutus-cloud-repo_0.1ubuntu20_all.deb
| |
| </pre>
| |
| <translate>
| |
|
| |
| Update the local apt cache and install the vGPU packages:
| |
| </translate>
| |
| <pre>
| |
| root@ubuntu20:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
| |
| </pre>
| |
| <translate>
| |
|
| |
| If your installation was successful, the vGPU will be accessible and licensed.
| |
|
| |
| </translate>
| |
| <pre>
| |
| root@ubuntu20:~# nvidia-smi
| |
| Wed Apr 24 14:37:52 2024
| |
| +-----------------------------------------------------------------------------------------+
| |
| | NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
| |
| |-----------------------------------------+------------------------+----------------------+
| |
| | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| |
| | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| |
| | | | MIG M. |
| |
| |=========================================+========================+======================|
| |
| | 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 |
| |
| | N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default |
| |
| | | | N/A |
| |
| +-----------------------------------------+------------------------+----------------------+
| |
|
| |
| +-----------------------------------------------------------------------------------------+
| |
| | Processes: |
| |
| | GPU GI CI PID Type Process name GPU Memory |
| |
| | ID ID Usage |
| |
| |=========================================================================================|
| |
| | No running processes found |
| |
| +-----------------------------------------------------------------------------------------+
| |
| </pre>
| |
|
| |
| [[Category:Cloud]] | | [[Category:Cloud]] |