Using cloud GPUs

Latest revision as of 18:40, 29 October 2024


This guide describes how to allocate GPU resources to a virtual machine (VM), install the necessary drivers, and check whether the GPU can be used.

Supported flavors

To use a GPU within a VM, the instance needs to be deployed on one of the flavors listed below. The GPU will be available to the operating system via the PCI bus.

g2-c24-112gb-500
g1-c14-56gb-500
g1-c14-56gb

Preparing a Debian 10 instance

To use the GPU via the PCI bus, the proprietary NVIDIA drivers are required. Due to Debian's policy, the drivers are available from the non-free pool only.

Enable the non-free pool

Log in using ssh and add the lines below to /etc/apt/sources.list, if they are not already there.

deb http://deb.debian.org/debian buster main contrib non-free
deb http://security.debian.org/ buster/updates main contrib non-free
deb http://deb.debian.org/debian buster-updates main contrib non-free
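
The "if they are not already there" condition can be scripted. The following is a minimal sketch, not part of the original procedure: the helper names `ensure_line` and `enable_non_free` are hypothetical, and the check only looks for an exact, whole-line match, so an existing entry for the same repository without `contrib non-free` is left untouched and would need to be edited by hand.

```shell
#!/bin/sh
# Append LINE to FILE only if an identical line is not already present.
# Usage: ensure_line FILE LINE
ensure_line() {
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Add the three buster entries from this guide to the given sources file.
# Usage (as root): enable_non_free /etc/apt/sources.list
enable_non_free() {
    ensure_line "$1" 'deb http://deb.debian.org/debian buster main contrib non-free'
    ensure_line "$1" 'deb http://security.debian.org/ buster/updates main contrib non-free'
    ensure_line "$1" 'deb http://deb.debian.org/debian buster-updates main contrib non-free'
}
```

Running `enable_non_free /etc/apt/sources.list` a second time should leave the file unchanged, which makes the step safe to repeat.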

Install the NVIDIA driver

The following command:

updates the apt cache, so that apt will be aware of the new software pool sections,
updates the OS to the latest software versions, and
installs kernel headers, an NVIDIA driver, and pciutils, which will be required to list the devices connected to the PCI bus.

root@gpu2:~# apt-get update && apt-get -y dist-upgrade && apt-get -y install pciutils linux-headers-`uname -r` linux-headers-amd64 nvidia-driver

If this command finishes successfully, the NVIDIA driver will have been compiled and loaded.

Check if the GPU is exposed on the PCI bus

root@gpu2:~# lspci -vk
[...]
00:05.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
	Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
	Physical Slot: 5
	Flags: bus master, fast devsel, latency 0, IRQ 11
	Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 1000000000 (64-bit, prefetchable) [size=16G]
	Memory at 1400000000 (64-bit, prefetchable) [size=32M]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Kernel driver in use: nvidia
	Kernel modules: nvidia
[...]
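
Instead of reading the `lspci -vk` output by eye, the relevant line can be checked in a script. This is a sketch under the assumption that the output has the layout shown above (an unindented header line per device, followed by indented detail lines); the function name `nvidia_driver_bound` is hypothetical.

```shell
#!/bin/sh
# Read `lspci -vk` output on stdin and succeed (exit status 0) only if some
# NVIDIA device reports "Kernel driver in use: nvidia".
nvidia_driver_bound() {
    awk '
        # An unindented line starts a new device block; remember whether it is NVIDIA.
        /^[^ \t]/ { in_nvidia = ($0 ~ /NVIDIA/) }
        # Detail lines belong to the most recent device header.
        in_nvidia && /Kernel driver in use: nvidia/ { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage:
#   lspci -vk | nvidia_driver_bound && echo "NVIDIA driver is bound to the GPU"
```

A device that lists `Kernel modules: nvidia` without `Kernel driver in use: nvidia` is deliberately treated as a failure, since the module is then available but not attached to the GPU.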

Check that the nvidia kernel module is loaded

root@gpu2:~# lsmod | grep nvidia
nvidia              17936384  0
nvidia_drm             16384  0
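
`grep nvidia` matches any module whose name contains that string. For scripted checks, an exact match on the first column of the `lsmod` output may be preferable. The following is a small sketch; the function name `module_loaded` is hypothetical.

```shell
#!/bin/sh
# Read `lsmod` output on stdin and succeed only if a module with exactly the
# given name appears in the first column.
# Usage: lsmod | module_loaded NAME
module_loaded() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit(found ? 0 : 1) }'
}

# Usage:
#   lsmod | module_loaded nvidia || echo "nvidia module is not loaded" >&2
```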

Start nvidia-persistenced, which will create the necessary device files and make the GPU accessible in user space.

root@gpu2:~# systemctl restart nvidia-persistenced
root@gpu2:~# ls -al /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Mar  6 18:55 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Mar  6 18:55 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Mar  6 18:55 /dev/nvidia-modeset

The GPU is now available in user space and can be used.
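
The presence of the device files can also be tested from a script. This sketch assumes that a usable setup has the control node `nvidiactl` plus at least one numbered GPU node such as `nvidia0`, as in the listing above; the function name `gpu_devices_present` is hypothetical. It uses `-e` (exists) rather than `-c` (character device) so that it can be tried against an ordinary test directory.

```shell
#!/bin/sh
# Succeed if DIR (default: /dev) contains nvidiactl and at least one
# numbered GPU node (nvidia0, nvidia1, ...).
# The body runs in a subshell, so `exit` only ends the function.
gpu_devices_present() (
    dir=${1:-/dev}
    [ -e "$dir/nvidiactl" ] || exit 1
    for node in "$dir"/nvidia[0-9]*; do
        # If nothing matches, the pattern stays literal and -e is false.
        [ -e "$node" ] && exit 0
    done
    exit 1
)

# Usage:
#   gpu_devices_present || echo "NVIDIA device files are missing" >&2
```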

Preparing a CentOS 7 instance

NVIDIA provides repositories for various distributions, so the required software can be installed and maintained via these repositories.

To compile the module sources from the NVIDIA repository, it is necessary to install dkms. dkms automatically rebuilds the modules on kernel updates, which ensures that the GPU still works after any update of the OS. dkms is provided in the EPEL repository. Kernel headers and the kernel source need to be installed before the NVIDIA driver can be set up.

Enable the EPEL repository and install needed software

[root@gpu-centos centos]# yum -y update && reboot
[root@gpu-centos centos]# yum -y install epel-release && yum -y install dkms kernel-devel-$(uname -r) kernel-headers-$(uname -r)
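
The reboot matters because `$(uname -r)` expands to the release of the running kernel: if the update installed a newer kernel and the VM has not been rebooted, the development packages would be requested for the old kernel. After logging in again, the match can be verified with a sketch like the following. It assumes that `rpm -q kernel-devel` prints one `kernel-devel-<release>` line per installed version, in the same form that `uname -r` reports; the function name `devel_matches_kernel` is hypothetical.

```shell
#!/bin/sh
# Read the output of `rpm -q kernel-devel` on stdin and succeed only if one
# of the installed kernel-devel packages matches the given kernel release.
# Usage: rpm -q kernel-devel | devel_matches_kernel "$(uname -r)"
devel_matches_kernel() {
    # -x: whole-line match, -F: fixed string (dots are not wildcards)
    grep -qxF -- "kernel-devel-$1"
}

# Usage:
#   rpm -q kernel-devel | devel_matches_kernel "$(uname -r)" \
#       || echo "kernel-devel does not match the running kernel" >&2
```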

Add the NVIDIA repository and install the driver package

Add the yum repository and install the driver:

[root@gpu-centos centos]# yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
[root@gpu-centos centos]# yum install -y cuda-drivers

NVIDIA uses its own GPG key to sign its packages. yum will ask to autoimport it. Reply "y" for "yes" when prompted.

Retrieving key from http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub
Importing GPG key 0x7FA2AF80:
 Userid     : "cudatools <cudatools@nvidia.com>"
 Fingerprint: ae09 fe4b bd22 3a84 b2cc fce3 f60f 4b3d 7fa2 af80
 From       : http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub
Is this ok [y/N]: y
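
Before answering "y", it is reasonable to compare the fingerprint shown by yum with one obtained from a source you trust. Fingerprints are often printed with different spacing or letter case, so a comparison has to normalize both sides first. This is a minimal sketch; the function names `normalize_fpr` and `same_fingerprint` are hypothetical, and the reference fingerprint has to come from NVIDIA's own documentation rather than from this page.

```shell
#!/bin/sh
# Strip spaces and colons and convert hex letters to upper case.
normalize_fpr() {
    printf '%s' "$1" | tr -d ' :' | tr 'a-f' 'A-F'
}

# Succeed only if both fingerprints are identical after normalization.
# Usage: same_fingerprint "FPR_SHOWN_BY_YUM" "FPR_FROM_TRUSTED_SOURCE"
same_fingerprint() {
    [ "$(normalize_fpr "$1")" = "$(normalize_fpr "$2")" ]
}
```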

After installation, reboot the VM to properly load the module and create the NVIDIA device files.

[root@gpu-centos ~]# ls -al /dev/nvidia*
crw-rw-rw-. 1 root root 195,   0 Mar 10 20:35 /dev/nvidia0
crw-rw-rw-. 1 root root 195, 255 Mar 10 20:35 /dev/nvidiactl
crw-rw-rw-. 1 root root 195, 254 Mar 10 20:35 /dev/nvidia-modeset
crw-rw-rw-. 1 root root 241,   0 Mar 10 20:35 /dev/nvidia-uvm
crw-rw-rw-. 1 root root 241,   1 Mar 10 20:35 /dev/nvidia-uvm-tools

The GPU is now accessible via any user space tool.
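
As a final check, the driver's own command-line tool can list the GPUs it sees. This sketch assumes that `nvidia-smi` was installed together with the driver packages, which may not be the case on every image; the wrapper name `list_gpus` is hypothetical, while `-L` is the `nvidia-smi` option that lists the detected GPUs.

```shell
#!/bin/sh
# List the GPUs visible to the driver, or fail with status 127 if the
# tool is not installed. The optional argument overrides the command name.
# Usage: list_gpus [COMMAND]
list_gpus() {
    smi=${1:-nvidia-smi}
    command -v "$smi" >/dev/null 2>&1 || {
        echo "$smi not found; is the NVIDIA driver installed?" >&2
        return 127
    }
    "$smi" -L
}

# Usage:
#   list_gpus || echo "no usable GPU found" >&2
```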

