Using cloud GPUs
Revision as of 15:26, 9 March 2020
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
How to use GPU in cloud VMs
This Howto describes the steps needed to allocate GPU resources to a virtual machine (VM) and install the necessary drivers, along with simple checks to confirm that the GPU is properly allocated and can now be used.
To use a GPU within a VM, the instance needs to be deployed with one of the flavors listed below, which make the GPU available to the operating system via the PCI bus.
- g2-c24-112gb-500
- g1-c14-56gb-500
- g1-c14-56gb
Preparing a Debian 10 Instance
To use the GPU via the PCI bus, the proprietary Nvidia drivers are required. Due to Debian's policy, the drivers are available from the non-free pool only.
Enabling the non-free pool
Log in via ssh and add the sources below to /etc/apt/sources.list, if they are not already present.
deb http://deb.debian.org/debian buster main contrib non-free
deb http://security.debian.org/ buster/updates main contrib non-free
deb http://deb.debian.org/debian buster-updates main contrib non-free
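Adding the sources by hand is easy to get wrong when the step is repeated. The following is a minimal sketch of one way to append the three lines only when they are missing. The function names `add_source_line` and `enable_non_free` are hypothetical helpers introduced for illustration, not part of any Debian tooling, and the block only defines them without modifying any file.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE unless an identical line exists.
add_source_line() {
    file="$1"
    line="$2"
    # grep -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Hypothetical helper: add the three buster sources from this Howto.
# The target file defaults to /etc/apt/sources.list; pass another path
# to try the function on a copy first.
enable_non_free() {
    target="${1:-/etc/apt/sources.list}"
    add_source_line "$target" "deb http://deb.debian.org/debian buster main contrib non-free"
    add_source_line "$target" "deb http://security.debian.org/ buster/updates main contrib non-free"
    add_source_line "$target" "deb http://deb.debian.org/debian buster-updates main contrib non-free"
}
```

Running `enable_non_free` as root applies the change to /etc/apt/sources.list. Running it a second time leaves the file unchanged, because each line is only appended when no identical line is found.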
Installing the Nvidia Driver
The following command updates the apt cache so that apt is aware of the new software pool sections, upgrades the OS to the latest software versions, and installs the kernel headers, the nvidia-driver package and pciutils, which is required to list the devices connected to the PCI bus.
root@gpu2:~# apt-get update && apt-get -y dist-upgrade && apt-get -y install pciutils linux-headers-`uname -r` nvidia-driver
After the installation has finished and the nvidia kernel module has been automatically compiled and loaded, the following steps can be used to verify that everything is prepared to launch nvidia-persistenced, which will create the device files and make the GPU accessible to user space.
- Check if the GPU is exposed on the PCI bus
root@gpu2:~# lspci -vk
[...]
00:05.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
        Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
        Physical Slot: 5
        Flags: bus master, fast devsel, latency 0, IRQ 11
        Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 1000000000 (64-bit, prefetchable) [size=16G]
        Memory at 1400000000 (64-bit, prefetchable) [size=32M]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Kernel driver in use: nvidia
        Kernel modules: nvidia
[...]
- Check that the nvidia kernel module is loaded
root@gpu2:~# lsmod | grep nvidia
nvidia 17936384 0
nvidia_drm 16384 0
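The two checks above can also be scripted so that they return a clear pass or fail instead of requiring the output to be read by eye. This is a sketch only: `gpu_bound_to_nvidia`, `nvidia_module_loaded` and `check_gpu` are hypothetical function names, and the match strings are taken from the sample output shown in this Howto.

```shell
#!/bin/sh
# Succeeds if the `lspci -vk` output on stdin shows a device that is
# bound to the nvidia kernel driver.
gpu_bound_to_nvidia() {
    grep -q 'Kernel driver in use: nvidia'
}

# Succeeds if the `lsmod` output on stdin lists a module named exactly
# "nvidia" in its first column (nvidia_drm alone does not count).
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Runs both checks against the live system and reports the first failure.
check_gpu() {
    if ! lspci -vk | gpu_bound_to_nvidia; then
        echo "no PCI device is bound to the nvidia driver" >&2
        return 1
    fi
    if ! lsmod | nvidia_module_loaded; then
        echo "nvidia kernel module is not loaded" >&2
        return 1
    fi
    echo "GPU is visible on the PCI bus and the nvidia module is loaded"
}
```

Calling `check_gpu` on the instance runs both checks in order. The two filter functions read from stdin, so they can be tried against saved command output as well.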
Now the userspace process can be started, which will create the necessary character device files.
root@gpu2:~# service nvidia-persistenced restart
root@gpu2:~# ls -al /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 Mar 6 18:55 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Mar 6 18:55 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Mar 6 18:55 /dev/nvidia-modeset
The GPU is now available within the user space and can be used.
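To confirm the device files from a script rather than by reading the `ls` listing, a small test for character devices may help. This is a minimal sketch; `all_char_devices` is a hypothetical helper name, and the paths to pass are the ones shown in the listing above.

```shell
#!/bin/sh
# Succeeds only if every path given as an argument exists and is a
# character device; reports the first path that is not.
all_char_devices() {
    for path in "$@"; do
        if [ ! -c "$path" ]; then
            echo "missing or not a character device: $path" >&2
            return 1
        fi
    done
    return 0
}
```

On the instance, `all_char_devices /dev/nvidia0 /dev/nvidiactl` should succeed once nvidia-persistenced has been started.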
Preparing a CentOS 7 Instance
CentOS has no official repository for the Nvidia driver, so the kernel module needs to be compiled from the vendor's sources.
- Update the OS on the instance to the latest version and reboot the system
root@gpu2:~# yum -y update && reboot
- Install the required tools and libraries.
root@gpu2:~# yum -y install epel-release gcc kernel-devel kernel-headers pciutils
root@gpu2:~# yum -y install dkms