<languages />

This page describes how to
* allocate virtual GPU (vGPU) resources to a virtual machine (VM),
* install the necessary drivers, and
* check whether the vGPU can be used.

Access to the repositories and to the vGPUs is currently available only within [https://arbutus.cloud.computecanada.ca Arbutus Cloud]. Note that the documentation below covers only the vGPU driver installation. The [https://developer.nvidia.com/cuda-toolkit-archive CUDA toolkit] is not pre-installed, but you can install it directly from NVIDIA or load it from [[Accessing_CVMFS|the CVMFS software stack]].

If you choose to install the toolkit directly from NVIDIA, make sure that the vGPU driver is not overwritten by the driver included in the CUDA packages.
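If you install CUDA from NVIDIA's repositories, one way to follow the advice above is to check the package names before installing them. The function below is a minimal sketch with a hypothetical name; it is not part of the Arbutus packages. Its patterns are an assumption based on NVIDIA's meta-package naming, in which <code>cuda</code>, <code>cuda-X-Y</code> and <code>cuda-drivers</code> bring in a driver while <code>cuda-toolkit-X-Y</code> does not; check NVIDIA's CUDA installation guide for the current list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first package in the argument list that is
# likely to install an NVIDIA driver (and could replace the vGPU driver).
# Returns 1 if none of the packages matches.
# The patterns are assumptions based on NVIDIA's meta-package naming.
pulls_in_driver() {
  local pkg
  for pkg in "$@"; do
    case "$pkg" in
      cuda | cuda-[0-9]* | cuda-drivers* | nvidia-driver*)
        echo "$pkg"
        return 0
        ;;
    esac
  done
  return 1
}

# Example: stop a script instead of installing a driver-carrying package.
#   pulls_in_driver cuda-toolkit-12-4 && exit 1
```

With these patterns, <code>pulls_in_driver cuda-toolkit-12-4</code> prints nothing and returns 1, while <code>pulls_in_driver cuda-drivers</code> prints <code>cuda-drivers</code>.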

== Supported flavors ==

To use a vGPU within a VM, the instance needs to be deployed on one of the flavors listed below. The vGPU will be available to the operating system via the PCI bus.

* g1-8gb-c4-22gb
* g1-16gb-c8-40gb

== Preparation of a VM running AlmaLinux 9 ==

Once the VM is available, update the OS to the latest available software, including the kernel. Then reboot the VM so that the latest kernel is running.

The [https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support DKMS package] is provided by the [https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm EPEL repository], so EPEL must be enabled.

By default, AlmaLinux 9 ships with a faulty <code>nouveau</code> driver that crashes the kernel as soon as the <code>nvidia</code> driver is loaded. A few extra steps are therefore needed to prevent <code>nouveau</code> from being loaded when the system boots: the commands below blacklist <code>nouveau</code>, rebuild the initramfs without it, update the system, enable EPEL, and reboot.

<pre>
[root@almalinux9]# echo -e "blacklist nouveau\noptions nouveau modeset=0" >/etc/modprobe.d/blacklist-nouveau.conf
[root@almalinux9]# dracut -fv --omit-drivers nouveau
[root@almalinux9]# dnf -y update && dnf -y install epel-release && reboot
</pre>
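The blacklist step can also be kept as a small reusable function. This is a minimal sketch with a hypothetical name; it writes the same two lines with <code>printf</code>, which expands <code>\n</code> consistently across shells, whereas the behavior of <code>echo -e</code> differs between shells. The optional argument is an addition made here only so that the target file can be overridden for testing.

```shell
#!/usr/bin/env bash
# Illustrative helper: write the nouveau blacklist described above.
# The optional argument overrides the target file (useful for testing).
write_nouveau_blacklist() {
  local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
  # printf interprets "\n" the same way in every POSIX shell,
  # unlike "echo -e", whose behavior differs between shells.
  printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$conf"
}

# On the VM, as root, followed by the initramfs rebuild from this guide:
#   write_nouveau_blacklist
#   dracut -fv --omit-drivers nouveau
```

Because the redirection truncates the file, running the function twice still leaves exactly the same two lines.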

After the reboot of the VM, the Arbutus vGPU Cloud repository needs to be installed.

<pre>
[root@almalinux9]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma9/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el9.noarch.rpm
</pre>
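Each release on this page uses a different repository package. As a convenience, the sketch below gathers the URLs listed in the individual sections into one hypothetical helper, <code>arbutus_repo_url</code>, which selects the URL from the <code>ID</code> and <code>VERSION_ID</code> fields of <code>/etc/os-release</code>. It is not part of the Arbutus tooling and only knows the releases documented here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Arbutus repository package URL that this
# guide lists for the release described by an os-release file.
arbutus_repo_url() {
  local os_release="${1:-/etc/os-release}"
  local base="http://repo.arbutus.cloud.computecanada.ca/pulp"
  local id major
  # os-release is a shell-compatible KEY=value file; source it in a subshell
  # so that its variables do not leak into the caller, and keep only the
  # major version (e.g. "9.3" -> "9", "22.04" -> "22").
  read -r id major < <(. "$os_release" && echo "$ID ${VERSION_ID%%.*}")
  case "$id:$major" in
    almalinux:9) echo "$base/repos/alma9/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el9.noarch.rpm" ;;
    almalinux:8) echo "$base/repos/alma8/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el8.noarch.rpm" ;;
    debian:11) echo "$base/deb/deb11/pool/main/arbutus-cloud-repo_0.1_all.deb" ;;
    debian:12) echo "$base/deb/deb12/pool/main/arbutus-cloud-repo_0.1+deb12_all.deb" ;;
    ubuntu:22) echo "$base/deb/ubnt22/pool/main/arbutus-cloud-repo_0.1_all.deb" ;;
    ubuntu:20) echo "$base/deb/ubnt20/pool/main/arbutus-cloud-repo_0.1ubuntu20_all.deb" ;;
    *)
      echo "unsupported release: $id $major" >&2
      return 1
      ;;
  esac
}
```

On AlmaLinux you could then run <code>dnf install "$(arbutus_repo_url)"</code>; on Debian and Ubuntu, <code>url="$(arbutus_repo_url)" && wget "$url" && apt-get install -y "./${url##*/}"</code>.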

The next step is to install the vGPU packages, which will install the required driver and user-space tools.

<pre>
[root@almalinux9]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64
</pre>
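Because the kernel module is built with DKMS (hence the EPEL requirement above), one way to confirm the build is to look for the running kernel in the output of <code>dkms status</code>. The function below is a minimal sketch with a hypothetical name. The DKMS module name <code>nvidia</code> is an assumption, so check the first column of <code>dkms status</code> on your VM; the function accepts both the <code>name/version, kernel, arch: installed</code> format of recent DKMS releases and the older fully comma-separated one.

```shell
#!/usr/bin/env bash
# Succeed if "dkms status" output on stdin shows MODULE installed for KERNEL.
# Accepts "name/version, kernel, arch: installed" (DKMS 3.x) and
# "name, version, kernel, arch: installed" (DKMS 2.x).
dkms_installed_for() {
  local module="$1" kernel="$2"
  awk -v m="$module" -v k="$kernel" '
    index($0, m "/") == 1 || index($0, m ", ") == 1 {
      if (index($0, ", " k ", ") > 0 && $NF == "installed") found = 1
    }
    END { exit (!found) }'
}

# Usage on the VM (the module name "nvidia" is an assumption):
#   dkms status | dkms_installed_for nvidia "$(uname -r)" && echo "module built"
```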

After a successful installation, you can use <code>nvidia-smi</code> to verify that the vGPU is working properly.

<pre>
[root@almalinux9]# nvidia-smi
Tue Apr 23 16:37:31 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 |
| N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
</pre>
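For scripts, <code>nvidia-smi</code> can also print selected fields in CSV form with <code>--query-gpu</code> and <code>--format=csv,noheader</code>. The sketch below, with a hypothetical function name, condenses that output to one line per GPU and fails if no GPU is reported. Assuming <code>nvidia-smi</code> reports the same values as in the table above, it would print <code>GRID V100D-8C (8192 MiB, driver 550.54.15)</code>.

```shell
#!/usr/bin/env bash
# Summarize the output of
#   nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader
# read from stdin, e.g. "GRID V100D-8C, 8192 MiB, 550.54.15".
# Prints one line per GPU and returns non-zero if no GPU was listed.
summarize_gpus() {
  awk -F', *' '
    NF >= 3 {
      mem = $2
      sub(/ MiB$/, "", mem)   # drop the unit that nvidia-smi appends
      printf "%s (%d MiB, driver %s)\n", $1, mem, $3
      n++
    }
    END { exit (n == 0) }'
}

# Usage on the VM:
#   nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader | summarize_gpus
```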

== Preparation of a VM running AlmaLinux 8 ==

Once the VM is available, update the OS to the latest available software, including the kernel. Then reboot the VM so that the latest kernel is running.

The [https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support DKMS package] is provided by the EPEL repository, so EPEL must be enabled.

<pre>
[root@almalinux8]# dnf -y update && dnf -y install epel-release && reboot
</pre>
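To confirm after the reboot that the VM is really running the newest installed kernel, you can compare <code>uname -r</code> with the installed kernel packages. This is a minimal sketch with hypothetical function names; it assumes GNU <code>sort -V</code>, whose version ordering is adequate for typical <code>VERSION-RELEASE.ARCH</code> kernel strings. The <code>needs-restarting -r</code> command from the dnf utilities offers a related check.

```shell
#!/usr/bin/env bash
# Print the highest version among the arguments, using GNU "sort -V".
newest_kernel() {
  printf '%s\n' "$@" | sort -V | tail -n 1
}

# Succeed if RUNNING (first argument) is the newest of the installed
# kernel versions given as the remaining arguments.
running_latest_kernel() {
  local running="$1"
  shift
  [ "$running" = "$(newest_kernel "$@")" ]
}

# Usage on AlmaLinux (rpm prints one VERSION-RELEASE.ARCH per installed kernel):
#   running_latest_kernel "$(uname -r)" \
#     $(rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n') || echo "reboot needed"
```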

After the reboot of the VM, the Arbutus vGPU Cloud repository needs to be installed.

<pre>
[root@almalinux8]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma8/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el8.noarch.rpm
</pre>

The next step is to install the vGPU packages, which will install the required driver and user-space tools.

<pre>
[root@almalinux8]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64
</pre>

After a successful installation, you can use <code>nvidia-smi</code> to verify that the vGPU is working properly.

<pre>
[root@almalinux8]# nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 |
| N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
</pre>

== Preparation of a VM running Debian 11 ==

Make sure that the latest packages are installed and that the system has been booted with the latest stable kernel, because <b>DKMS</b> will request the latest kernel available from the Debian repositories.

<pre>
root@debian11:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>
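DKMS compiles the driver against the headers of the running kernel, which the matching <code>linux-headers</code> package makes available through <code>/lib/modules/$(uname -r)/build</code>. If the module later fails to build, a quick check like the following hypothetical helper can tell whether those headers are missing. The optional second argument is a path prefix used only for testing.

```shell
#!/usr/bin/env bash
# Succeed if headers for KERNEL are installed, i.e. /lib/modules/KERNEL/build
# resolves to a directory. ROOT (optional) is a path prefix for testing.
headers_present() {
  local kernel="$1" root="${2:-}"
  # -d follows the "build" symlink, so a dangling link counts as missing.
  [ -d "$root/lib/modules/$kernel/build" ]
}

# Usage on the VM, installing the matching headers if they are missing:
#   headers_present "$(uname -r)" || apt-get -y install "linux-headers-$(uname -r)"
```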

After a successful reboot, the system should be running the latest available kernel. The repository can then be set up by installing the <code>arbutus-cloud-repo</code> package, which also contains the GPG key used to sign all packages.

<pre>
root@debian11:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb11/pool/main/arbutus-cloud-repo_0.1_all.deb
root@debian11:~# apt-get install -y ./arbutus-cloud-repo_0.1_all.deb
</pre>

Update the local apt cache and install the vGPU packages:

<pre>
root@debian11:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>

After a successful installation, you can use <code>nvidia-smi</code> to verify that the vGPU is working properly.

<pre>
root@debian11:~# nvidia-smi
Tue Apr 23 18:55:18 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 |
| N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
</pre>

== Preparation of a VM running Debian 12 ==

Make sure that the latest packages are installed and that the system has been booted with the latest stable kernel, because <b>DKMS</b> will request the latest kernel available from the Debian repositories.

<pre>
root@debian12:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>

After a successful reboot, the system should be running the latest available kernel. The repository can then be set up by installing the <code>arbutus-cloud-repo</code> package, which also contains the GPG key used to sign all packages.

<pre>
root@debian12:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/deb12/pool/main/arbutus-cloud-repo_0.1+deb12_all.deb
root@debian12:~# apt-get install -y ./arbutus-cloud-repo_0.1+deb12_all.deb
</pre>

Update the local apt cache and install the vGPU packages:

<pre>
root@debian12:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>
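To confirm that all three vGPU packages ended up installed, you can ask <code>dpkg-query</code> for their status. The helper below is a minimal sketch with a hypothetical name: it reads <code>dpkg-query</code> output and prints every requested package that is not in the <code>install ok installed</code> state.

```shell
#!/usr/bin/env bash
# Read "PACKAGE STATUS" lines, as printed by
#   dpkg-query -W -f='${Package} ${Status}\n' PACKAGE...
# from stdin, and print every PACKAGE argument that is not fully installed.
# Returns 1 if at least one package is missing.
missing_packages() {
  local status pkg missing=0
  status="$(cat)"
  for pkg in "$@"; do
    # -x: whole-line match, -F: fixed string, -q: no output.
    if ! printf '%s\n' "$status" | grep -qxF -- "$pkg install ok installed"; then
      echo "$pkg"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the VM:
#   pkgs="nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd"
#   dpkg-query -W -f='${Package} ${Status}\n' $pkgs 2>/dev/null | missing_packages $pkgs
```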

After a successful installation, you can use <code>nvidia-smi</code> to verify that the vGPU is working properly.

<pre>
root@debian12:~# nvidia-smi
Tue Apr 23 18:55:18 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 |
| N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
</pre>

== Preparation of a VM running Ubuntu 22 ==

Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.

<pre>
root@ubuntu22:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>
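On Ubuntu, packages that need a reboot (such as a new kernel) create the file <code>/var/run/reboot-required</code> and usually list themselves in <code>/var/run/reboot-required.pkgs</code>. If you prefer to reboot only when necessary, here is a minimal sketch of that check; the function name is hypothetical and the path argument exists only for testing.

```shell
#!/usr/bin/env bash
# Succeed if a reboot is pending. Prints the packages that requested it
# when the companion ".pkgs" file exists.
reboot_required() {
  local flag="${1:-/var/run/reboot-required}"
  if [ -e "$flag" ]; then
    if [ -r "$flag.pkgs" ]; then
      sort -u "$flag.pkgs"
    fi
    return 0
  fi
  return 1
}

# Usage, as an alternative to the unconditional reboot above:
#   apt-get update && apt-get -y dist-upgrade
#   reboot_required && reboot
```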

After a successful reboot, the system should be running the latest available kernel. The repository can then be set up by installing the <code>arbutus-cloud-repo</code> package, which also contains the GPG key used to sign all packages.

<pre>
root@ubuntu22:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt22/pool/main/arbutus-cloud-repo_0.1_all.deb
root@ubuntu22:~# apt-get install ./arbutus-cloud-repo_0.1_all.deb
</pre>

Update the local apt cache and install the vGPU packages:

<pre>
root@ubuntu22:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>

If your installation was successful, the vGPU will be accessible and licensed.

<pre>
root@ubuntu22:~# nvidia-smi
Wed Apr 24 14:37:52 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 |
| N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
</pre>
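A working installation should also be licensed. The licensing state appears in the detailed report of <code>nvidia-smi -q</code>, and the sketch below extracts it. The line label <code>License Status</code> and values beginning with <code>Licensed</code> are assumptions about the vGPU guest driver's report format, so check them against the output on your VM before relying on this function, whose name is hypothetical.

```shell
#!/usr/bin/env bash
# Read "nvidia-smi -q" output from stdin, print the value of the first
# "License Status" line ("unknown" if there is none), and succeed only if
# the value starts with "Licensed".
# The label and value format are assumptions about the vGPU guest driver.
vgpu_licensed() {
  local status
  # Strip everything up to the first colon and the spaces after it.
  status="$(awk '/License Status/ { sub(/^[^:]*: */, ""); print; exit }')"
  echo "${status:-unknown}"
  case "$status" in
    Licensed*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on the VM:
#   nvidia-smi -q | vgpu_licensed
```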

== Preparation of a VM running Ubuntu 20 ==

Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.

<pre>
root@ubuntu20:~# apt-get update && apt-get -y dist-upgrade && reboot
</pre>

After a successful reboot, the system should be running the latest available kernel. The repository can then be set up by installing the <code>arbutus-cloud-repo</code> package, which also contains the GPG key used to sign all packages.

<pre>
root@ubuntu20:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubnt20/pool/main/arbutus-cloud-repo_0.1ubuntu20_all.deb
root@ubuntu20:~# apt-get install ./arbutus-cloud-repo_0.1ubuntu20_all.deb
</pre>

Update the local apt cache and install the vGPU packages:

<pre>
root@ubuntu20:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
</pre>

If your installation was successful, the vGPU will be accessible and licensed.

<pre>
root@ubuntu20:~# nvidia-smi
Wed Apr 24 14:37:52 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 GRID V100D-8C On | 00000000:00:06.0 Off | 0 |
| N/A N/A P0 N/A / N/A | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
</pre>

[[Category:Cloud]]

This guide describes how to allocate GPU resources to a virtual machine (VM), install the necessary drivers, and check whether the GPU can be used.

== Supported flavors ==

To use a GPU within a VM, the instance needs to be deployed on one of the flavors listed below. The GPU will be available to the operating system via the PCI bus.

* g2-c24-112gb-500
* g1-c14-56gb-500
* g1-c14-56gb

== Preparing a Debian 10 instance ==

To use the GPU via the PCI bus, the proprietary NVIDIA drivers are required. Due to Debian's policy, these drivers are available only from the non-free pool.

===== Enable the non-free pool =====

Log in using SSH and add the lines below to ''/etc/apt/sources.list'', if they are not already there.

<pre>
deb http://deb.debian.org/debian buster main contrib non-free
deb http://security.debian.org/ buster/updates main contrib non-free
deb http://deb.debian.org/debian buster-updates main contrib non-free
</pre>
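Adding the lines only if they are not already there can be scripted. This minimal sketch uses a hypothetical function name; its file argument defaults to <code>/etc/apt/sources.list</code> and can be overridden for testing. It appends each of the three lines only when an identical line is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the three non-free lines from this guide to an
# apt sources file, skipping any line that is already present verbatim.
add_nonfree_sources() {
  local list="${1:-/etc/apt/sources.list}" line
  for line in \
    "deb http://deb.debian.org/debian buster main contrib non-free" \
    "deb http://security.debian.org/ buster/updates main contrib non-free" \
    "deb http://deb.debian.org/debian buster-updates main contrib non-free"
  do
    # -x: whole-line match, -F: fixed string (no regex), -q: no output.
    grep -qxF -- "$line" "$list" 2>/dev/null || printf '%s\n' "$line" >> "$list"
  done
}

# Usage on the VM, as root:
#   add_nonfree_sources && apt-get update
```

Lines that differ only in spacing or in the order of components are not recognized as duplicates, so review the file afterwards.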

===== Install the NVIDIA driver =====

The following command:
* updates the <code>apt</code> cache, so that <code>apt</code> is aware of the new software pool sections,
* updates the OS to the latest software versions, and
* installs kernel headers, an NVIDIA driver, and <code>pciutils</code>, which is required to list the devices connected to the PCI bus.

<pre>
root@gpu2:~# apt-get update && apt-get -y dist-upgrade && apt-get -y install pciutils linux-headers-`uname -r` linux-headers-amd64 nvidia-driver
</pre>

If this command finishes successfully, the NVIDIA driver has been compiled and loaded.

* Check whether the GPU is exposed on the PCI bus:
<pre>
root@gpu2:~# lspci -vk
[...]
00:05.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
Physical Slot: 5
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
Memory at 1000000000 (64-bit, prefetchable) [size=16G]
Memory at 1400000000 (64-bit, prefetchable) [size=32M]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Kernel driver in use: nvidia
Kernel modules: nvidia
[...]
</pre>
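The <code>lspci -vk</code> listing is long. If you only need to know which driver is bound to the NVIDIA device, you can filter the output of <code>lspci -k</code>. The function below is an illustrative sketch with a hypothetical name: it relies on <code>lspci</code> starting each device line in the first column and indenting the detail lines, and prints the PCI slot and the <code>Kernel driver in use</code> value of every device whose description mentions NVIDIA.

```shell
#!/usr/bin/env bash
# Read "lspci -k" (or "lspci -vk") output from stdin and print
# "SLOT DRIVER" for every NVIDIA device that has a driver bound.
# Returns non-zero if no such device was found.
nvidia_pci_drivers() {
  awk '
    # Device lines start in column 1; detail lines are indented.
    /^[^ \t]/ { slot = ""; if (/NVIDIA/) slot = $1; next }
    slot != "" && /Kernel driver in use:/ {
      sub(/.*Kernel driver in use: */, "")
      print slot, $0
      found = 1
    }
    END { exit (!found) }'
}

# Usage on the VM:
#   lspci -k | nvidia_pci_drivers
```

For the device listed above, this would print <code>00:05.0 nvidia</code>.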

* Check that the <code>nvidia</code> kernel module is loaded:
<pre>
root@gpu2:~# lsmod | grep nvidia
nvidia 17936384 0
nvidia_drm 16384 0
</pre>
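<code>grep nvidia</code> also matches related modules such as <code>nvidia_drm</code>. To test for one module exactly, for example that <code>nvidia</code> is loaded and <code>nouveau</code> is not, you can match the first field of <code>/proc/modules</code>, which is where <code>lsmod</code> reads its data. A minimal sketch with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Succeed if MODULE is currently loaded. The first field of each line of
# /proc/modules is a module name; FILE (optional) replaces /proc/modules.
module_loaded() {
  local module="$1" modules_file="${2:-/proc/modules}"
  awk -v m="$module" '$1 == m { found = 1 } END { exit (!found) }' "$modules_file"
}

# Usage on the VM:
#   module_loaded nvidia && ! module_loaded nouveau && echo "nvidia loaded, nouveau not loaded"
```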

* Start <code>nvidia-persistenced</code>, which creates the necessary device files and makes the GPU accessible in user space:
<pre>
root@gpu2:~# systemctl restart nvidia-persistenced
root@gpu2:~# ls -al /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 Mar 6 18:55 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Mar 6 18:55 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Mar 6 18:55 /dev/nvidia-modeset
</pre>

The GPU is now available in user space and can be used.
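To script the device-file check, test for the character devices directly. This sketch uses a hypothetical function name and a directory argument that exists only for testing; it checks <code>/dev/nvidia0</code> and <code>/dev/nvidiactl</code> from the listing above and prints any that are missing. Instances with several GPUs would have additional numbered nodes, which this minimal version ignores.

```shell
#!/usr/bin/env bash
# Print each expected NVIDIA device node that is missing or is not a
# character device, and return 1 if any is missing. DIR defaults to /dev.
missing_nvidia_nodes() {
  local dir="${1:-/dev}" node missing=0
  for node in nvidia0 nvidiactl; do
    # -c: the path exists and is a character device (symlinks are followed).
    if [ ! -c "$dir/$node" ]; then
      echo "$dir/$node"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the VM, after starting nvidia-persistenced:
#   missing_nvidia_nodes && echo "device files present"
```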

== Preparing a CentOS 7 instance ==

NVIDIA provides repositories for various distributions, so the required software can be installed and maintained through these repositories.

<code>dkms</code> is required to compile the module sources from the NVIDIA repository. It automatically rebuilds the modules on kernel updates, which ensures that the GPU keeps working after OS updates. <code>dkms</code> is provided by the EPEL repository. The kernel headers and the kernel sources must be installed before the NVIDIA driver can be set up.

===== Enable the EPEL repository and install needed software =====

First update the system and reboot; after logging in again, enable EPEL and install the build dependencies:
<pre>
[root@gpu-centos centos]# yum -y update && reboot
[root@gpu-centos centos]# yum -y install epel-release && yum -y install dkms kernel-devel-$(uname -r) kernel-headers-$(uname -r)
</pre>

===== Add the NVIDIA repository and install the driver package =====

Add the NVIDIA <code>yum</code> repository and install the driver package:
<pre>
[root@gpu-centos centos]# yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
[root@gpu-centos centos]# yum install -y cuda-drivers
</pre>

NVIDIA signs its packages with its own GPG key, which <code>yum</code> offers to import automatically. Reply "y" for "yes" when prompted:
<pre>
Retrieving key from http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub
Importing GPG key 0x7FA2AF80:
Userid : "cudatools <cudatools@nvidia.com>"
Fingerprint: ae09 fe4b bd22 3a84 b2cc fce3 f60f 4b3d 7fa2 af80
From : http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub
Is this ok [y/N]: y
</pre>
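Before answering "y", you can compare the fingerprint that <code>yum</code> prints with one obtained independently from NVIDIA. Fingerprints are written with or without spaces and in either case, so this minimal sketch (hypothetical function names) normalizes both before comparing them.

```shell
#!/usr/bin/env bash
# Print a GPG fingerprint without spaces and in lower case.
normalize_fpr() {
  printf '%s' "$1" | tr -d ' ' | tr 'A-F' 'a-f'
}

# Succeed if the two fingerprints are identical after normalization.
fingerprint_matches() {
  [ "$(normalize_fpr "$1")" = "$(normalize_fpr "$2")" ]
}

# Usage: EXPECTED is a hypothetical variable holding the fingerprint that
# you obtained from NVIDIA through a trusted channel.
#   fingerprint_matches "ae09 fe4b bd22 3a84 b2cc fce3 f60f 4b3d 7fa2 af80" "$EXPECTED"
```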

After installation, reboot the VM to properly load the module and create the NVIDIA device files.
<pre>
[root@gpu-centos ~]# ls -al /dev/nvidia*
crw-rw-rw-. 1 root root 195, 0 Mar 10 20:35 /dev/nvidia0
crw-rw-rw-. 1 root root 195, 255 Mar 10 20:35 /dev/nvidiactl
crw-rw-rw-. 1 root root 195, 254 Mar 10 20:35 /dev/nvidia-modeset
crw-rw-rw-. 1 root root 241, 0 Mar 10 20:35 /dev/nvidia-uvm
crw-rw-rw-. 1 root root 241, 1 Mar 10 20:35 /dev/nvidia-uvm-tools
</pre>

The GPU is now accessible to any user-space tool.