<languages />

This page describes how to
* allocate virtual GPU (vGPU) resources to a virtual machine (VM),
* install the necessary drivers and
* check whether the vGPU can be used.

The vGPU repositories and the vGPUs themselves are currently available only on [https://arbutus.cloud.computecanada.ca Arbutus Cloud]. The instructions below cover only the installation of the vGPU driver. The [https://developer.nvidia.com/cuda-toolkit-archive CUDA toolkit] is not pre-installed, but you can install it directly from NVIDIA or load it from [[Accessing_CVMFS|the CVMFS software stack]].

If you install the toolkit directly from NVIDIA, make sure that the vGPU driver is not overwritten by the driver bundled with the CUDA package.

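The <code>CUDA Version</code> field in the <code>nvidia-smi</code> header is the newest CUDA version that the installed driver supports. As a general rule, the toolkit you load or install should therefore not be newer than that value. The sketch below compares the two. It assumes GNU <code>sort -V</code> and the usual <code>nvcc --version</code> output format. The function names are hypothetical and are not part of any package.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing the CUDA toolkit version with the
# highest CUDA version supported by the installed driver.

# Print the "CUDA Version" value from nvidia-smi output read on stdin.
driver_cuda_version() {
  sed -n 's/.*CUDA Version: \([0-9.]*\).*/\1/p' | head -n1
}

# Print the release number from "nvcc --version" output read on stdin,
# i.e. X.Y from a line of the form "Cuda compilation tools, release X.Y, VX.Y.Z".
toolkit_cuda_version() {
  sed -n 's/.*release \([0-9.]*\),.*/\1/p' | head -n1
}

# Succeed if version $1 is lower than or equal to version $2 (GNU version sort).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}
```

For example, <code>version_le "$(nvcc --version | toolkit_cuda_version)" "$(nvidia-smi | driver_cuda_version)" && echo "toolkit is supported by the driver"</code>.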
== Supported flavors ==

<div class="mw-translate-fuzzy">
</div>

== Preparation of a VM running AlmaLinux 9 ==

Once the VM is available, update the OS to the latest available packages, including the kernel. Then reboot the VM so that the latest kernel is running.

The [https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support DKMS package] is provided by the [https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm EPEL repository], so EPEL must be enabled.

By default, AlmaLinux 9 ships a faulty <code>nouveau</code> driver that crashes the kernel as soon as the <code>nvidia</code> driver is loaded. A few extra steps are therefore needed to prevent <code>nouveau</code> from loading when the system boots:
<pre>
[root@almalinux9]# echo -e "blacklist nouveau\noptions nouveau modeset=0" >/etc/modprobe.d/blacklist-nouveau.conf
[root@almalinux9]# dracut -fv --omit-drivers nouveau
[root@almalinux9]# dnf -y update && dnf -y install epel-release && reboot
</pre>
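After the VM has rebooted, you can check that the blacklist took effect: <code>lsmod</code> should no longer list <code>nouveau</code>. The minimal sketch below wraps that check in a function. The name <code>nouveau_loaded</code> is hypothetical. The function reads <code>lsmod</code> output on standard input and succeeds if <code>nouveau</code> is still loaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the lsmod listing on stdin
# contains the nouveau module, fails otherwise.
nouveau_loaded() {
  # lsmod prints a "Module Size Used by" header, then one module per line;
  # the module name is the first whitespace-separated field.
  awk 'NR > 1 && $1 == "nouveau" { found = 1 } END { exit !found }'
}
```

For example, <code>lsmod | nouveau_loaded && echo "nouveau is still loaded"</code> prints a warning only if the blacklist did not take effect. In that case, check <code>/etc/modprobe.d/blacklist-nouveau.conf</code> and rerun <code>dracut</code>.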

After the VM has rebooted, install the Arbutus Cloud vGPU repository:
<pre>
[root@almalinux9]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma9/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el9.noarch.rpm
</pre>

Next, install the vGPU packages, which provide the required driver and user-space tools:
<pre>
[root@almalinux9]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64
</pre>
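The kernel module relies on DKMS, which is why EPEL is needed. Before moving on, you may want to confirm that a module was built for the kernel that is currently running. The sketch below assumes the usual <code>dkms status</code> line format, <code>module/version, kernel, arch: installed</code>. Older DKMS releases separate module and version with a comma instead. It also assumes that <code>nvidia-vgpu-kmod</code> registers its module with DKMS. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dkms_built_for KERNEL_RELEASE
# Reads "dkms status" output on stdin and succeeds if some module is
# reported as installed for KERNEL_RELEASE.
dkms_built_for() {
  local kernel="$1"
  # Keep only the lines for this kernel, then look for the "installed" state.
  grep -F -- ", ${kernel}, " | grep -q -- ': installed'
}
```

For example, <code>dkms status | dkms_built_for "$(uname -r)" || echo "no DKMS module built for the running kernel"</code>.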

After a successful installation, use <code>nvidia-smi</code> to verify that the vGPU is working:
<pre>
[root@almalinux9]# nvidia-smi
Tue Apr 23 16:37:31 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>
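In scripts, the same check can be done non-interactively with <code>nvidia-smi --query-gpu=name,driver_version --format=csv,noheader</code>. This command prints one line per GPU, such as <code>GRID V100D-8C, 550.54.15</code>. The sketch below reads that output on standard input. It succeeds only if a GPU whose name starts with <code>GRID</code>, as in the output above, reports the expected driver version. The function name is hypothetical. Both the name prefix and the version you pass are assumptions to adapt to your flavor.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_vgpu EXPECTED_DRIVER_VERSION
# Reads "name, driver_version" lines (nvidia-smi --query-gpu=name,driver_version
# --format=csv,noheader) on stdin. Succeeds if at least one GPU is a GRID vGPU
# profile running the expected driver version.
check_vgpu() {
  local expected="$1" name version ok=1
  while IFS=',' read -r name version; do
    version="${version# }"   # drop the space that follows the comma
    case "$name" in
      GRID*) [ "$version" = "$expected" ] && ok=0 ;;  # assumption: vGPU names start with "GRID"
    esac
  done
  return "$ok"
}
```

For example, <code>nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | check_vgpu 550.54.15</code>.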

== Preparation of a VM running AlmaLinux 8 ==

Once the VM is available, update the OS to the latest available packages, including the kernel. Then reboot the VM so that the latest kernel is running.

The [https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support DKMS package] is provided by the EPEL repository, so EPEL must be enabled:
<pre>
[root@almalinux8]# dnf -y update && dnf -y install epel-release && reboot
</pre>
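After the reboot, you can confirm that the newest installed kernel is the one running by comparing <code>uname -r</code> with the installed kernel packages. This is a minimal sketch. It assumes that the <code>kernel</code> package releases, formatted with <code>rpm --queryformat</code> as in the usage example, match the <code>uname -r</code> format. It also assumes that GNU <code>sort -V</code> orders them correctly. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: running_newest RUNNING_RELEASE
# Reads installed kernel releases (one per line) on stdin and succeeds if
# RUNNING_RELEASE is the highest of them by version sort, i.e. no reboot
# is needed to switch to a newer kernel.
running_newest() {
  local newest
  newest="$(sort -V | tail -n1)"
  [ "$newest" = "$1" ]
}
```

For example, <code>rpm -q kernel --queryformat '%{VERSION}-%{RELEASE}.%{ARCH}\n' | running_newest "$(uname -r)" || echo "reboot required"</code>.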

After the VM has rebooted, install the Arbutus Cloud vGPU repository:
<pre>
[root@almalinux8]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma8/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el8.noarch.rpm
</pre>
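The AlmaLinux 8 and 9 repository packages differ only in the release-specific parts of their URLs. If you script the setup, a helper such as the hypothetical one below can pick the right package from the <code>VERSION_ID</code> field of <code>/etc/os-release</code>. It only knows the two URLs given on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: arbutus_vgpu_repo_rpm VERSION_ID
# Prints the Arbutus vGPU repository RPM URL for an AlmaLinux release, using
# the URLs documented on this page. VERSION_ID may include a minor version
# (e.g. "9.3"); only the major version is used.
arbutus_vgpu_repo_rpm() {
  local major="${1%%.*}"
  local base="http://repo.arbutus.cloud.computecanada.ca/pulp/repos"
  case "$major" in
    8|9)
      echo "${base}/alma${major}/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el${major}.noarch.rpm"
      ;;
    *)
      echo "unsupported AlmaLinux release: $1" >&2
      return 1
      ;;
  esac
}
```

For example, <code>. /etc/os-release && dnf install "$(arbutus_vgpu_repo_rpm "$VERSION_ID")"</code>.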

Next, install the vGPU packages, which provide the required driver and user-space tools:
<pre>
[root@almalinux8]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64
</pre>

After a successful installation, use <code>nvidia-smi</code> to verify that the vGPU is working:
<pre>
[root@almalinux8]# nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>

== Preparation of a VM running Debian 11 ==

Make sure that the latest packages are installed and that the system has been booted with the latest stable kernel, because <b>DKMS</b> will request the latest kernel available from the Debian repositories.
<pre>
root@debian11:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>
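DKMS builds the module against the headers of the running kernel. After the reboot, you may therefore want to check that a <code>linux-headers</code> package matching <code>uname -r</code> is installed. The vGPU packages may well pull it in themselves, so this is only a check. The sketch below interprets the <code>Status</code> field printed by <code>dpkg-query</code>, which reads <code>install ok installed</code> for a fully installed package. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: deb_status_installed
# Reads the output of   dpkg-query -W -f='${Status}' PACKAGE   on stdin and
# succeeds only if the package is fully installed.
deb_status_installed() {
  [ "$(cat)" = "install ok installed" ]
}
```

For example, <code>dpkg-query -W -f='${Status}' "linux-headers-$(uname -r)" 2>/dev/null | deb_status_installed || echo "headers for the running kernel are missing"</code>.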

After a successful reboot, the latest available kernel should be running. Now install the repository by installing the <code>arbutus-cloud-repo</code> package, which also contains the GPG key that all packages are signed with:
<pre>
root@debian11:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb11/pool/main/arbutus-cloud-repo_0.1_all.deb
root@debian11:~# apt-get install -y ./arbutus-cloud-repo_0.1_all.deb
</pre>

Update the local apt cache and install the vGPU packages:
<pre>
root@debian11:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>

After a successful installation, use <code>nvidia-smi</code> to verify that the vGPU is working:
<pre>
root@debian11:~# nvidia-smi
Tue Apr 23 18:55:18 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>

== Preparation of a VM running Debian 12 ==

Make sure that the latest packages are installed and that the system has been booted with the latest stable kernel, because <b>DKMS</b> will request the latest kernel available from the Debian repositories.
<pre>
root@debian12:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>

After a successful reboot, the latest available kernel should be running. Now install the repository by installing the <code>arbutus-cloud-repo</code> package, which also contains the GPG key that all packages are signed with:
<pre>
root@debian12:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb12/pool/main/arbutus-cloud-repo_0.1+deb12_all.deb
root@debian12:~# apt-get install -y ./arbutus-cloud-repo_0.1+deb12_all.deb
</pre>

Update the local apt cache and install the vGPU packages:
<pre>
root@debian12:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>

After a successful installation, use <code>nvidia-smi</code> to verify that the vGPU is working:
<pre>
root@debian12:~# nvidia-smi
Tue Apr 23 18:55:18 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>

== Preparation of a VM running Ubuntu 22 ==

Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.
<pre>
root@ubuntu22:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>

After a successful reboot, the latest available kernel should be running. Now install the repository by installing the <code>arbutus-cloud-repo</code> package, which also contains the GPG key that all packages are signed with:
<pre>
root@ubuntu22:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt22/pool/main/arbutus-cloud-repo_0.1_all.deb
root@ubuntu22:~# apt-get install ./arbutus-cloud-repo_0.1_all.deb
</pre>

Update the local apt cache and install the vGPU packages:
<pre>
root@ubuntu22:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>

If the installation was successful, the vGPU is accessible and licensed, which you can check with <code>nvidia-smi</code>:
<pre>
root@ubuntu22:~# nvidia-smi
Wed Apr 24 14:37:52 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>
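You can also check the license explicitly. On a vGPU guest, the detailed query <code>nvidia-smi -q</code> includes a <code>License Status</code> line. The sketch below assumes that this line has the form <code>License Status : Licensed ...</code>; the exact wording may vary between driver releases. The function reads the query output on standard input, and its name is hypothetical. If the status is not <code>Licensed</code>, a natural first place to look is the <code>nvidia-gridd</code> licensing service. That service is presumably provided by the <code>nvidia-vgpu-gridd</code> package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: vgpu_licensed
# Reads "nvidia-smi -q" output on stdin and succeeds if a "License Status"
# line reports "Licensed". Assumes the "License Status : <value>" layout.
vgpu_licensed() {
  awk -F' : ' '
    $1 ~ /License Status/ && $2 ~ /^Licensed/ { found = 1 }
    END { exit !found }
  '
}
```

For example, <code>nvidia-smi -q | vgpu_licensed || systemctl status nvidia-gridd</code>.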

== Preparation of a VM running Ubuntu 20 ==

Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.
<pre>
root@ubuntu20:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>

After a successful reboot, the latest available kernel should be running. Now install the repository by installing the <code>arbutus-cloud-repo</code> package, which also contains the GPG key that all packages are signed with:
<pre>
root@ubuntu20:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt20/pool/main/arbutus-cloud-repo_0.1ubuntu20_all.deb
root@ubuntu20:~# apt-get install ./arbutus-cloud-repo_0.1ubuntu20_all.deb
</pre>
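The <code>arbutus-cloud-repo</code> package has a different URL for each Debian and Ubuntu release covered on this page. If you automate the setup, a helper like the hypothetical one below can select the URL from the <code>ID</code> and <code>VERSION_ID</code> fields of <code>/etc/os-release</code>. It only knows the four URLs listed on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: arbutus_repo_deb ID VERSION_ID
# Prints the URL of the arbutus-cloud-repo package for the given
# /etc/os-release ID and VERSION_ID, using the URLs listed on this page.
arbutus_repo_deb() {
  local base="http://repo.arbutus.cloud.computecanada.ca/pulp/deb"
  case "$1:${2%%.*}" in
    debian:11) echo "${base}/deb11/pool/main/arbutus-cloud-repo_0.1_all.deb" ;;
    debian:12) echo "${base}/deb12/pool/main/arbutus-cloud-repo_0.1+deb12_all.deb" ;;
    ubuntu:20) echo "${base}/ubnt20/pool/main/arbutus-cloud-repo_0.1ubuntu20_all.deb" ;;
    ubuntu:22) echo "${base}/ubnt22/pool/main/arbutus-cloud-repo_0.1_all.deb" ;;
    *) echo "unsupported release: $1 $2" >&2; return 1 ;;
  esac
}
```

For example, <code>. /etc/os-release && url="$(arbutus_repo_deb "$ID" "$VERSION_ID")" && wget "$url" && apt-get install -y "./${url##*/}"</code>.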

Update the local apt cache and install the vGPU packages:
<pre>
root@ubuntu20:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>

If the installation was successful, the vGPU is accessible and licensed, which you can check with <code>nvidia-smi</code>:
<pre>
root@ubuntu20:~# nvidia-smi
Wed Apr 24 14:37:52 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>

[[Category:Cloud]]