<languages />

This page describes how to
* allocate virtual GPU (vGPU) resources to a virtual machine (VM),
* install the necessary drivers, and
* check whether the vGPU can be used.

The repositories and the vGPUs are currently available only on [https://arbutus.cloud.computecanada.ca Arbutus Cloud]. Note that the documentation below covers only the installation of the vGPU driver. The [https://developer.nvidia.com/cuda-toolkit-archive CUDA toolkit] is not pre-installed; you can install it directly from NVIDIA or load it from [[Accessing_CVMFS|the CVMFS software stack]].

If you install the toolkit directly from NVIDIA, make sure that the vGPU driver is not overwritten by the driver bundled with the CUDA package.
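
Here is a minimal sketch of one way to do this, assuming you use NVIDIA's runfile installer. With <code>--silent</code>, the runfile installs only the components named on the command line, so passing <code>--toolkit</code> without <code>--driver</code> leaves the vGPU driver untouched. The function name, the runfile name and the installation prefix are placeholders. With <code>DRY_RUN=1</code>, the function only prints the command.

```shell
#!/bin/sh
# Sketch: install only the CUDA toolkit from an NVIDIA runfile so that the
# vGPU driver (nvidia-vgpu-kmod) is not replaced. The caller supplies the
# runfile and the prefix; the values in the example below are placeholders.

install_cuda_toolkit_only() {
    local runfile="$1" prefix="$2"
    # In --silent mode only the components named explicitly are installed,
    # so --toolkit on its own skips the driver bundled in the runfile.
    set -- sh "$runfile" --silent --toolkit "--toolkitpath=$prefix"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"    # print the command instead of running it
    else
        "$@"         # run as root when the prefix is a system directory
    fi
}

# Example (hypothetical file name and prefix):
#   DRY_RUN=1 install_cuda_toolkit_only cuda_12.4.0_550.54.14_linux.run /opt/cuda-12.4
```

NVIDIA's package repositories follow the same idea. A toolkit-only package such as <code>cuda-toolkit-12-4</code> does not pull in a driver, whereas the <code>cuda</code> and <code>cuda-drivers</code> meta-packages do.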

== Supported flavors ==

To use a GPU in a VM, the VM must be deployed with one of the flavors listed below. The operating system accesses the GPU through the PCI bus.

* g2-c24-112gb-500
* g1-c14-56gb-500
* g1-c14-56gb
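
The guest sees the vGPU as a PCI device. A quick sanity check before you install any driver is therefore to look for a device with NVIDIA's PCI vendor ID, <code>0x10de</code>. The sketch below reads the <code>vendor</code> files under <code>/sys/bus/pci/devices</code>. If <code>pciutils</code> is installed, <code>lspci -nn | grep 10de</code> gives the same answer. The function name is ours.

```shell
#!/bin/sh
# Sketch: list PCI devices whose vendor ID is NVIDIA's (0x10de).
# The sysfs directory can be passed as an argument (useful for testing);
# it defaults to /sys/bus/pci/devices.

find_nvidia_pci() {
    local dir="${1:-/sys/bus/pci/devices}" dev found=1
    for dev in "$dir"/*; do
        # Every device directory has a "vendor" file such as "0x10de".
        [ -r "$dev/vendor" ] || continue
        if [ "$(cat "$dev/vendor")" = "0x10de" ]; then
            basename "$dev"    # PCI address, e.g. 0000:00:06.0
            found=0
        fi
    done
    return "$found"
}

# On the VM:
#   find_nvidia_pci || echo "no NVIDIA device visible; check the VM flavor"
```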

== Preparation of a VM running AlmaLinux 9 ==

Once the VM is available, update the OS to the latest available software, including the kernel. Then reboot the VM so that it runs the latest kernel.

The [https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm EPEL repository] is required to get the [https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support DKMS package].

By default, AlmaLinux 9 uses a faulty <code>nouveau</code> driver, which crashes the kernel as soon as the <code>nvidia</code> driver is loaded. A few extra steps therefore prevent <code>nouveau</code> from loading when the system boots: blacklist the module, rebuild the initramfs without it, then update the system, install EPEL, and reboot.

<pre>
[root@almalinux9]# echo -e "blacklist nouveau\noptions nouveau modeset=0" >/etc/modprobe.d/blacklist-nouveau.conf
[root@almalinux9]# dracut -fv --omit-drivers nouveau
[root@almalinux9]# dnf -y update && dnf -y install epel-release && reboot
</pre>
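
After the reboot, you can confirm that the steps above took effect. The following sketch checks that both directives were written to the blacklist file and that <code>nouveau</code> is absent from <code>/proc/modules</code>, the file <code>lsmod</code> reads. The function names are ours, and the file paths can be overridden for testing.

```shell
#!/bin/sh
# Sketch: confirm that nouveau is blacklisted and not loaded after the reboot.

nouveau_blacklisted() {
    local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    # Both lines written by the echo command above must be present verbatim.
    grep -qx 'blacklist nouveau' "$conf" 2>/dev/null &&
        grep -qx 'options nouveau modeset=0' "$conf" 2>/dev/null
}

module_loaded() {
    local name="$1" modules="${2:-/proc/modules}"
    # /proc/modules has one loaded module per line, module name first.
    grep -q "^$name " "$modules"
}

check_nouveau() {
    if ! nouveau_blacklisted "$1"; then
        echo "nouveau is not blacklisted"
        return 1
    fi
    if module_loaded nouveau "$2"; then
        echo "nouveau is still loaded"
        return 1
    fi
    echo "ok: nouveau is blacklisted and not loaded"
}

# On the VM:
#   check_nouveau
```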

After the VM has rebooted, install the Arbutus vGPU Cloud repository:
<pre>
[root@almalinux9]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma9/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el9.noarch.rpm
</pre>

Next, install the vGPU packages, which provide the required driver and user-space tools:
<pre>
[root@almalinux9]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64
</pre>
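
To confirm that all three packages were installed, a small loop over <code>rpm -q</code> is enough. In this sketch, <code>PKG_QUERY</code> is a hypothetical override hook, so the same function can use <code>dpkg -s</code> on the Debian and Ubuntu VMs.

```shell
#!/bin/sh
# Sketch: report which vGPU packages from the command above are missing.
# PKG_QUERY defaults to "rpm -q"; on Debian/Ubuntu use PKG_QUERY="dpkg -s".

VGPU_PACKAGES="nvidia-vgpu-gridd nvidia-vgpu-tools nvidia-vgpu-kmod"

missing_vgpu_packages() {
    local query="${PKG_QUERY:-rpm -q}" missing="" pkg
    for pkg in $VGPU_PACKAGES; do
        # $query is left unquoted on purpose so "rpm -q" splits into command and flag.
        $query "$pkg" >/dev/null 2>&1 || missing="$missing $pkg"
    done
    echo "${missing# }"    # empty output means every package is installed
    [ -z "$missing" ]
}

# On the VM:
#   missing_vgpu_packages && echo "all vGPU packages installed"
```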

After a successful installation, use <code>nvidia-smi</code> to verify that the vGPU works:
<pre>
[root@almalinux9]# nvidia-smi
Tue Apr 23 16:37:31 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>

== Preparation of a VM running AlmaLinux 8 ==

Once the VM is available, update the OS to the latest available software, including the kernel. Then reboot the VM so that it runs the latest kernel.

The EPEL repository is required to get the [https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support DKMS package].

<pre>
[root@almalinux8]# dnf -y update && dnf -y install epel-release && reboot
</pre>
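
If you want to confirm that the reboot really brought up the newest kernel, compare the running kernel with the newest installed one. This sketch assumes GNU <code>sort -V</code> for version ordering and takes the list of installed kernels from <code>rpm</code>. If it is available, <code>dnf needs-restarting -r</code> from <code>dnf-plugins-core</code> gives a similar answer. The function names are ours.

```shell
#!/bin/sh
# Sketch: decide whether a reboot is still needed after "dnf -y update".
# Installed kernels are read on stdin, one VERSION-RELEASE.ARCH per line,
# which matches the format of "uname -r" on AlmaLinux.

newest_kernel() {
    # GNU sort -V compares version strings field by field, numerically.
    sort -V | tail -n 1
}

needs_reboot() {
    local running="$1" newest
    newest=$(newest_kernel)
    [ -n "$newest" ] && [ "$running" != "$newest" ]
}

# On the VM:
#   rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' | needs_reboot "$(uname -r)" && reboot
```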

After the VM has rebooted, install the Arbutus vGPU Cloud repository:
<pre>
[root@almalinux8]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma8/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el8.noarch.rpm
</pre>

Next, install the vGPU packages, which provide the required driver and user-space tools:
<pre>
[root@almalinux8]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64
</pre>

After a successful installation, use <code>nvidia-smi</code> to verify that the vGPU works:
<pre>
[root@almalinux8]# nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>

== Preparation of a VM running Debian 11 ==

Make sure that the latest packages are installed and that the system has been booted with the latest stable kernel, because <b>DKMS</b> will request the latest kernel available from the Debian repositories.

<pre>
root@debian11:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>
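
DKMS builds modules against the headers of the running kernel, which it finds through <code>/lib/modules/$(uname -r)/build</code>. If the system was not rebooted into the latest kernel, those headers may be missing and the build fails. The sketch below checks for them. The function name is ours. Installing <code>linux-headers-$(uname -r)</code> by hand should only be needed if the vGPU packages do not pull in headers themselves.

```shell
#!/bin/sh
# Sketch: check that headers for a kernel release are installed.
# /lib/modules/<release>/build is the standard headers location; on Debian it
# is a symlink into /usr/src/linux-headers-<release>.

headers_present() {
    local release="${1:-$(uname -r)}" root="${2:-/lib/modules}"
    # -e follows the symlink, so a dangling link (headers removed) counts as missing.
    [ -e "$root/$release/build/Makefile" ]
}

# On the VM:
#   headers_present || echo "no headers for $(uname -r); reboot into the latest kernel"
```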

After a successful reboot, the system should be running the latest available kernel. Install the repository by installing the <code>arbutus-cloud-repo</code> package, which also contains the GPG key used to sign all the packages.

<pre>
root@debian11:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb11/pool/main/arbutus-cloud-repo_0.1_all.deb
root@debian11:~# apt-get install -y ./arbutus-cloud-repo_0.1_all.deb
</pre>

Update the local apt cache and install the vGPU packages:

<pre>
root@debian11:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>

If the installation was successful, <code>nvidia-smi</code> shows the vGPU:

<pre>
root@debian11:~# nvidia-smi
Tue Apr 23 18:55:18 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>

== Preparation of a VM running Debian 12 ==

Make sure that the latest packages are installed and that the system has been booted with the latest stable kernel, because <b>DKMS</b> will request the latest kernel available from the Debian repositories.

<pre>
root@debian12:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>

After a successful reboot, the system should be running the latest available kernel. Install the repository by installing the <code>arbutus-cloud-repo</code> package, which also contains the GPG key used to sign all the packages.

<pre>
root@debian12:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb12/pool/main/arbutus-cloud-repo_0.1+deb12_all.deb
root@debian12:~# apt-get install -y ./arbutus-cloud-repo_0.1+deb12_all.deb
</pre>

Update the local apt cache and install the vGPU packages:

<pre>
root@debian12:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>

If the installation was successful, <code>nvidia-smi</code> shows the vGPU:

<pre>
root@debian12:~# nvidia-smi
Tue Apr 23 18:55:18 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>

== Preparation of a VM running Ubuntu 22 ==

Make sure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.

<pre>
root@ubuntu22:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>
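
On Ubuntu, packages that need a restart, such as a new kernel, create the file <code>/var/run/reboot-required</code>. The sketch below reboots only in that case, as an alternative to the unconditional <code>reboot</code> above. It also lists the packages that requested the restart when <code>/var/run/reboot-required.pkgs</code> exists. The function name is ours.

```shell
#!/bin/sh
# Sketch: succeed (and list the triggering packages) if Ubuntu has flagged
# that a reboot is required. The flag path can be overridden for testing.

reboot_required() {
    local flag="${1:-/var/run/reboot-required}"
    if [ -f "$flag" ]; then
        # The companion .pkgs file, when present, names the packages involved.
        [ -f "$flag.pkgs" ] && sort -u "$flag.pkgs"
        return 0
    fi
    return 1
}

# On the VM:
#   apt-get update && apt-get -y dist-upgrade && reboot_required && reboot
```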

After a successful reboot, the system should be running the latest available kernel. Now install the repository by installing the <code>arbutus-cloud-repo</code> package, which also contains the GPG key used to sign all the packages.

<pre>
root@ubuntu22:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt22/pool/main/arbutus-cloud-repo_0.1_all.deb
root@ubuntu22:~# apt-get install ./arbutus-cloud-repo_0.1_all.deb
</pre>

Update the local apt cache and install the vGPU packages:
<pre>
root@ubuntu22:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>

If the installation was successful, the vGPU is accessible and licensed:

<pre>
root@ubuntu22:~# nvidia-smi
Wed Apr 24 14:37:52 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>

== Preparation of a VM running Ubuntu 20 ==

Make sure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.

<pre>
root@ubuntu20:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>

After a successful reboot, the system should be running the latest available kernel. Now install the repository by installing the <code>arbutus-cloud-repo</code> package, which also contains the GPG key used to sign all the packages.

<pre>
root@ubuntu20:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt20/pool/main/arbutus-cloud-repo_0.1ubuntu20_all.deb
root@ubuntu20:~# apt-get install ./arbutus-cloud-repo_0.1ubuntu20_all.deb
</pre>

Update the local apt cache and install the vGPU packages:
<pre>
root@ubuntu20:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>

If the installation was successful, the vGPU is accessible and licensed:

<pre>
root@ubuntu20:~# nvidia-smi
Wed Apr 24 14:37:52 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:06.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</pre>

[[Category:Cloud]]

== Preparation of a VM running Debian 10 ==

Using a GPU through the PCI bus requires the drivers provided by NVIDIA. As Debian requires, these drivers must come from the ''non-free'' section.

===== The ''non-free'' section =====

Log in with SSH and, if they are not already present, add the following lines to ''/etc/apt/sources.list'':

<pre>
deb http://deb.debian.org/debian buster main contrib non-free
deb http://security.debian.org/ buster/updates main contrib non-free
deb http://deb.debian.org/debian buster-updates main contrib non-free
</pre>

===== Install the NVIDIA driver =====

The following command
* updates the <code>apt</code> cache so that <code>apt</code> takes the new sections into account;
* updates the operating system;
* installs the kernel headers, an NVIDIA driver, and <code>pciutils</code>, which is needed to list the devices on the PCI bus.

<pre>
root@gpu2:~# apt-get update && apt-get -y dist-upgrade && apt-get -y install pciutils linux-headers-`uname -r` linux-headers-amd64 nvidia-driver
</pre>

If the command completes without errors, the NVIDIA driver has been compiled and loaded.

* Check that the GPU is visible on the PCI bus.
<pre>
root@gpu2:~# lspci -vk
[...]
00:05.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
	Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
	Physical Slot: 5
	Flags: bus master, fast devsel, latency 0, IRQ 11
	Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 1000000000 (64-bit, prefetchable) [size=16G]
	Memory at 1400000000 (64-bit, prefetchable) [size=32M]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Kernel driver in use: nvidia
	Kernel modules: nvidia
[...]
</pre>

* Check that the <code>nvidia</code> kernel module is loaded.
<pre>
root@gpu2:~# lsmod | grep nvidia
nvidia              17936384  0
nvidia_drm             16384  0
</pre>

* Start <code>nvidia-persistenced</code> to create the driver's device files and make the GPU accessible from user space.
<pre>
root@gpu2:~# systemctl restart nvidia-persistenced
root@gpu2:~# ls -al /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Mar  6 18:55 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Mar  6 18:55 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Mar  6 18:55 /dev/nvidia-modeset
</pre>

The GPU is now available.
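
You can combine the last two checks into one script. This sketch reads <code>/proc/modules</code>, the file <code>lsmod</code> reads, and then lists the <code>/dev/nvidia*</code> character devices. The function name and the optional path arguments are ours.

```shell
#!/bin/sh
# Sketch: succeed if the nvidia module is loaded and at least one
# /dev/nvidia* character device exists; print the devices found.

nvidia_ready() {
    local modules="${1:-/proc/modules}" devdir="${2:-/dev}" dev found=1
    if ! grep -q '^nvidia ' "$modules"; then
        echo "nvidia module not loaded"
        return 1
    fi
    for dev in "$devdir"/nvidia*; do
        # Count only character devices; if nothing matches, the unexpanded
        # pattern is returned and fails this test.
        if [ -c "$dev" ]; then
            echo "$dev"
            found=0
        fi
    done
    [ "$found" -eq 0 ] || echo "no /dev/nvidia* device files found"
    return "$found"
}

# On the VM:
#   nvidia_ready
```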

== Preparation of a VM running CentOS 7 ==

NVIDIA provides repositories for several distributions, so you can install the required software from these repositories and keep it up to date.

To compile the module sources from the NVIDIA repository, <code>dkms</code> must be installed. It automatically rebuilds the modules whenever the kernel is updated, so that the GPU keeps working after an OS update. <code>dkms</code> is available from the EPEL repository. The kernel headers and sources must be installed before the NVIDIA driver is set up.

===== Install the EPEL repository and the required software =====
<pre>
[root@gpu-centos centos]# yum -y update && reboot
[root@gpu-centos centos]# yum -y install epel-release && yum -y install dkms kernel-devel-$(uname -r) kernel-headers-$(uname -r)
</pre>

===== Install the NVIDIA repository and the driver package =====

Add the NVIDIA <code>yum</code> repository, then install the driver:
<pre>
[root@gpu-centos centos]# yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
[root@gpu-centos centos]# yum install -y cuda-drivers
</pre>

NVIDIA signs its packages with its own GPG key. When <code>yum</code> asks whether to import the key, answer <code>y</code> (yes).
<pre>
Retrieving key from http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub
Importing GPG key 0x7FA2AF80:
 Userid     : "cudatools <cudatools@nvidia.com>"
 Fingerprint: ae09 fe4b bd22 3a84 b2cc fce3 f60f 4b3d 7fa2 af80
 From       : http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub
Is this ok [y/N]: y
</pre>
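
For OpenPGP v4 keys, the short key ID that <code>yum</code> prints (here <code>0x7FA2AF80</code>) is the last eight hexadecimal digits of the fingerprint. The sketch below checks that the two values shown are consistent before you answer <code>y</code>. It does not prove the key is authentic, so also compare the fingerprint with one published by NVIDIA. The function name is ours.

```shell
#!/bin/sh
# Sketch: succeed if the short key ID ($2) matches the end of the fingerprint ($1).

key_id_matches() {
    local fpr keyid
    # Normalize: remove spaces, drop a leading 0x, use lowercase hex digits.
    fpr=$(printf '%s' "$1" | tr -d ' ' | tr 'A-F' 'a-f')
    keyid=$(printf '%s' "$2" | sed 's/^0[xX]//' | tr 'A-F' 'a-f')
    [ -n "$keyid" ] || return 1
    case "$fpr" in
        *"$keyid") return 0 ;;
        *) return 1 ;;
    esac
}

# Example with the values from the prompt above:
#   key_id_matches "ae09 fe4b bd22 3a84 b2cc fce3 f60f 4b3d 7fa2 af80" 0x7FA2AF80 && echo consistent
```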

After the installation, reboot the instance so that the module is loaded correctly and the NVIDIA device files are created.
<pre>
[root@gpu-centos ~]# ls -al /dev/nvidia*
crw-rw-rw-. 1 root root 195,   0 Mar 10 20:35 /dev/nvidia0
crw-rw-rw-. 1 root root 195, 255 Mar 10 20:35 /dev/nvidiactl
crw-rw-rw-. 1 root root 195, 254 Mar 10 20:35 /dev/nvidia-modeset
crw-rw-rw-. 1 root root 241,   0 Mar 10 20:35 /dev/nvidia-uvm
crw-rw-rw-. 1 root root 241,   1 Mar 10 20:35 /dev/nvidia-uvm-tools
</pre>

The GPU is now accessible to all tools in user space.