Using cloud GPUs: Difference between revisions
m (Rdickson moved page Using cloud gpu to Using cloud GPUs: proper capitalization)
Revision as of 14:23, 19 March 2020
How to use GPUs in cloud VMs
This how-to describes the steps needed to allocate GPU resources to a virtual machine (VM), install the necessary drivers, and check that the GPU is properly allocated and can be used.
To use a GPU within a VM, the instance must be deployed with one of the flavors listed below, which make the GPU available to the operating system via the PCI bus.
- g2-c24-112gb-500
- g1-c14-56gb-500
- g1-c14-56gb
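If the instance is created from the command line, the flavor is chosen at creation time. The following is a minimal sketch that assumes the cloud is OpenStack-based and that the openstack command-line client is installed and configured with your project credentials. The image name, key pair name and server name are hypothetical placeholders. Setting DRY_RUN makes the script print the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: create a VM with one of the GPU flavors listed above.
# Image, key pair and server names are placeholders supplied by the caller.
GPU_FLAVORS="g2-c24-112gb-500 g1-c14-56gb-500 g1-c14-56gb"

# Succeed only if $1 exactly matches one of the GPU flavors.
is_gpu_flavor() {
  case " $GPU_FLAVORS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: launch_gpu_vm FLAVOR IMAGE KEYPAIR NAME
# If DRY_RUN is set, the openstack command is printed, not executed.
launch_gpu_vm() {
  if ! is_gpu_flavor "$1"; then
    echo "error: '$1' is not a GPU flavor" >&2
    return 1
  fi
  ${DRY_RUN:+echo} openstack server create \
    --flavor "$1" --image "$2" --key-name "$3" "$4"
}
```

For example, DRY_RUN=1 launch_gpu_vm g1-c14-56gb debian-10 my-key gpu2 prints the command. Run it without DRY_RUN to create the server.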
Preparing a Debian 10 Instance
To use the GPU via the PCI bus, the proprietary Nvidia drivers are required. Because of Debian's free software policy, these drivers are only available from the non-free pool.
Enabling the non-free pool
Log in via SSH and add the following sources to /etc/apt/sources.list if they are not already present.
deb http://deb.debian.org/debian buster main contrib non-free
deb http://security.debian.org/ buster/updates main contrib non-free
deb http://deb.debian.org/debian buster-updates main contrib non-free
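The lines above can be added by hand. As a minimal sketch, the same step can also be scripted so that it is safe to re-run. The function name is hypothetical. It takes the sources file as an optional argument, defaulting to /etc/apt/sources.list, and appends only the lines that are not already present verbatim.

```shell
#!/bin/sh
# Hypothetical helper: append the buster main/contrib/non-free entries
# to an apt sources file, skipping lines that are already present.
# Usage: add_nonfree_sources [FILE]   (FILE defaults to /etc/apt/sources.list)
add_nonfree_sources() {
  local list line
  list=${1:-/etc/apt/sources.list}
  # Avoid joining our first entry onto a last line that lacks a newline.
  if [ -s "$list" ] && [ -n "$(tail -c 1 "$list")" ]; then
    echo >> "$list"
  fi
  for line in \
    "deb http://deb.debian.org/debian buster main contrib non-free" \
    "deb http://security.debian.org/ buster/updates main contrib non-free" \
    "deb http://deb.debian.org/debian buster-updates main contrib non-free"
  do
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF "$line" "$list" 2>/dev/null; then
      printf '%s\n' "$line" >> "$list"
    fi
  done
}
```

Run the function as root. It does not modify existing lines that differ, such as a plain "buster main" entry. Review the file afterwards, because apt may warn about sources that are configured more than once.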
Installing the Nvidia Driver
The following command updates the apt cache so that apt is aware of the newly enabled pool sections. It then upgrades the OS to the latest package versions and installs the kernel headers, the nvidia-driver package and pciutils. pciutils provides lspci, which is used below to list the devices connected to the PCI bus.
root@gpu2:~# apt-get update && apt-get -y dist-upgrade && apt-get -y install pciutils linux-headers-`uname -r` linux-headers-amd64 nvidia-driver
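If apt-get reports that nvidia-driver has no installation candidate, either the non-free pool is not enabled or the cache was not updated. The following is a hedged sketch of a pre-check that reads the Candidate: line from apt-cache policy output. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `apt-cache policy PACKAGE` output on stdin and
# succeed only if apt reports an installation candidate for it.
has_candidate() {
  local cand
  # The relevant line looks like "  Candidate: <version>" or "  Candidate: (none)".
  cand=$(awk '$1 == "Candidate:" { print $2; exit }')
  [ -n "$cand" ] && [ "$cand" != "(none)" ]
}
```

For example: apt-cache policy nvidia-driver | has_candidate && echo "nvidia-driver is installable".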
After the installation has finished and the nvidia kernel module has been automatically compiled and loaded, the following checks verify that everything is ready to launch nvidia-persistenced. This daemon creates the device files and makes the GPU accessible from user space.
- Check if the GPU is exposed on the PCI bus
root@gpu2:~# lspci -vk
[...]
00:05.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
	Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
	Physical Slot: 5
	Flags: bus master, fast devsel, latency 0, IRQ 11
	Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 1000000000 (64-bit, prefetchable) [size=16G]
	Memory at 1400000000 (64-bit, prefetchable) [size=32M]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Kernel driver in use: nvidia
	Kernel modules: nvidia
[...]
- Check that the nvidia kernel module is loaded
root@gpu2:~# lsmod | grep nvidia
nvidia              17936384  0
nvidia_drm             16384  0
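The two checks above can be scripted, for example to run after every reboot. The following is a minimal sketch that assumes the output formats shown above. The function names are hypothetical. Both functions read the command output on stdin, so they also work on saved logs.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an NVIDIA device in `lspci -vk` output
# (read on stdin) reports "Kernel driver in use: nvidia".
nvidia_driver_bound() {
  awk '
    /^[0-9a-f]/ { dev = ($0 ~ /NVIDIA/) }   # a new device block starts
    dev && /Kernel driver in use: nvidia$/ { found = 1 }
    END { exit !found }
  '
}

# Hypothetical helper: succeed if `lsmod` output (read on stdin) lists
# the nvidia module itself, not only nvidia_drm or similar.
nvidia_module_loaded() {
  awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}
```

For example: lspci -vk | nvidia_driver_bound && lsmod | nvidia_module_loaded && echo "driver OK".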
Now the user space daemon nvidia-persistenced can be started. It creates the necessary character device files.
root@gpu2:~# service nvidia-persistenced restart
root@gpu2:~# ls -al /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 Mar 6 18:55 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Mar 6 18:55 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Mar 6 18:55 /dev/nvidia-modeset
The GPU is now available within the user space and can be used.
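The following is a small sketch that scripts the device-file check. It tests for the first GPU node and the control node from the listing above. Other nodes, such as nvidia-modeset, are not checked. The function name and its optional directory argument (default /dev) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that the NVIDIA character devices exist.
# Usage: check_nvidia_devices [DIR]   (DIR defaults to /dev)
check_nvidia_devices() {
  local dir rc node
  dir=${1:-/dev}
  rc=0
  for node in nvidia0 nvidiactl; do
    # -c: path exists and is a character device (symlinks are followed)
    if [ -c "$dir/$node" ]; then
      echo "ok: $dir/$node"
    else
      echo "missing: $dir/$node" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Running check_nvidia_devices with no argument checks /dev. A non-zero exit status means at least one node is missing.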
Preparing a CentOS 7 Instance
Nvidia provides package repositories for various distributions, so the required software can be installed and kept up to date through them.
The driver from the Nvidia repository is installed as module source code that has to be compiled for the running kernel. Installing dkms makes sure the modules are rebuilt automatically whenever the kernel is updated, so the GPU keeps working after OS updates. dkms is provided by the EPEL repository. The kernel headers and kernel sources also need to be installed before the Nvidia driver can be set up. The instance is updated and rebooted first, so that $(uname -r) in the second command below expands to the newest kernel version and the matching kernel-devel and kernel-headers packages are installed.
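After the driver installation described below, and again after each kernel update, dkms status shows whether the module was rebuilt for the running kernel. The following is a minimal sketch that parses this output. It assumes the driver registers its module with dkms under the name nvidia. The output format differs between dkms versions: older releases print nvidia, VERSION, KERNEL, while some newer ones print nvidia/VERSION, KERNEL. Both forms are accepted. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `dkms status` output on stdin and succeed if an
# nvidia module is reported as installed for kernel version $1.
# Accepts "nvidia, VERSION, KERNEL, ARCH: installed" (older dkms)
# and "nvidia/VERSION, KERNEL, ARCH: installed" (newer dkms).
dkms_built_for() {
  awk -v kver="$1" '
    $0 ~ "^nvidia[,/]" && index($0, ", " kver ",") && /: installed/ { found = 1 }
    END { exit !found }
  '
}
```

For example: dkms status | dkms_built_for "$(uname -r)" || echo "nvidia module not built for the running kernel".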
Enabling the EPEL repository and installing the needed software
[root@gpu-centos centos]# yum -y update && reboot
yum -y install epel-release && yum -y install dkms kernel-devel-$(uname -r) kernel-headers-$(uname -r)
Adding the NVIDIA repository and installing the driver package
[root@gpu-centos centos]# yum-config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
yum install -y cuda-drivers
Nvidia signs its packages with its own GPG key, and yum will ask whether to import it automatically.
Retrieving key from http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub
Importing GPG key 0x7FA2AF80:
 Userid     : "cudatools <cudatools@nvidia.com>"
 Fingerprint: ae09 fe4b bd22 3a84 b2cc fce3 f60f 4b3d 7fa2 af80
 From       : http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub
Is this ok [y/N]: y
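Before answering y, you can compare the fingerprint printed by yum against a copy obtained through a separate channel, such as this page. The following is a minimal sketch of a comparison that ignores whitespace and letter case. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: compare two GPG fingerprints, ignoring whitespace
# and letter case, e.g. the value yum prints against a trusted copy.
normalize_fpr() {
  printf '%s' "$1" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]'
}

fpr_matches() {
  [ "$(normalize_fpr "$1")" = "$(normalize_fpr "$2")" ]
}
```

For example: fpr_matches "<fingerprint printed by yum>" "ae09 fe4b bd22 3a84 b2cc fce3 f60f 4b3d 7fa2 af80" && echo match.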
After the installation, a reboot is required to load the module and create the Nvidia device files.
[root@gpu-centos ~]# ls -al /dev/nvidia*
crw-rw-rw-. 1 root root 195, 0 Mar 10 20:35 /dev/nvidia0
crw-rw-rw-. 1 root root 195, 255 Mar 10 20:35 /dev/nvidiactl
crw-rw-rw-. 1 root root 195, 254 Mar 10 20:35 /dev/nvidia-modeset
crw-rw-rw-. 1 root root 241, 0 Mar 10 20:35 /dev/nvidia-uvm
crw-rw-rw-. 1 root root 241, 1 Mar 10 20:35 /dev/nvidia-uvm-tools
The GPU is now accessible via any user space tool.
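One such tool is nvidia-smi. The following sketch assumes nvidia-smi is available on the instance. It formats the CSV printed by nvidia-smi --query-gpu=name,driver_version --format=csv,noheader, which gives one GPU per line. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read the CSV produced by
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
# on stdin ("<name>, <driver version>" per GPU) and print one line per GPU.
print_gpus() {
  awk -F', ' 'NF >= 2 { printf "GPU %d: %s (driver %s)\n", n++, $1, $2 }'
}
```

For example: nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | print_gpus.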