Using cloud vGPUs

From Alliance Doc
Revision as of 16:11, 1 June 2020 by Bott


This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.




This guide describes how to allocate vGPU resources to a virtual machine (VM), install the necessary drivers, and check whether the vGPU can be used.

Supported flavors

To use a GPU within a VM, the instance needs to be deployed on one of the flavors listed below. The GPU will be available to the operating system via the PCI bus.

  • vgpu1-c18-56gb
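After the VM has been deployed on a vGPU flavor, one way to confirm from inside the guest that the GPU is exposed on the PCI bus is to look for an NVIDIA device with lspci. This is a minimal bash sketch, assuming the pciutils package (which provides lspci) is installed in the guest; the function name has_nvidia_pci_device is hypothetical. The vendor ID 10de is the low half of the Device Id 0x1DB610DE that nvidia-smi reports further down this page.

```shell
# Succeeds if at least one NVIDIA device (PCI vendor ID 10de) is visible
# on the guest's PCI bus. Requires lspci from the pciutils package.
has_nvidia_pci_device() {
    # "-d 10de:" limits the listing to devices from vendor 0x10de (NVIDIA);
    # an empty listing means no NVIDIA device was found
    [ -n "$(lspci -d 10de: 2>/dev/null)" ]
}
```

Usage: `has_nvidia_pci_device && echo "NVIDIA vGPU found on the PCI bus"`.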

Preparation of a VM running CentOS 7

Once the VM is available, update the OS to the latest available software, including the kernel, and then reboot the VM so that the latest kernel is running.

[root@test centos]# yum -y update && reboot
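Since the driver build depends on the running kernel, it can help to confirm after the reboot that the VM really booted into the newest installed kernel. This is a minimal bash sketch, assuming the CentOS 7 kernel package is named kernel; the function name running_kernel_is_latest is hypothetical.

```shell
# Succeeds if the running kernel is the most recently installed "kernel"
# package, i.e. no further reboot is needed after "yum -y update".
running_kernel_is_latest() {
    local running latest
    running="$(uname -r)"
    # "rpm -q --last" sorts packages by install time, newest first, e.g.
    # "kernel-3.10.0-1127.el7.x86_64   <install date>"; keep the first name
    # and drop the "kernel-" prefix so it matches the format of "uname -r"
    latest="$(rpm -q --last kernel | head -n 1 | awk '{ print $1 }' | sed 's/^kernel-//')"
    echo "running: $running"
    echo "latest installed: $latest"
    [ "$running" = "$latest" ]
}
```

If the function fails, the VM is still running an older kernel and should be rebooted before the drivers are installed.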

Because the proprietary NVIDIA drivers have to be compiled against the running kernel, the dkms package from the EPEL repository is required. Install the EPEL repository definition:

[root@test centos]# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
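To check that EPEL is now enabled and that yum can find the dkms package, something like the following bash sketch can be used. The function names epel_enabled and dkms_available are hypothetical, and the check assumes the EPEL repository id starts with "epel", as it does for the epel-release package.

```shell
# Succeeds if an enabled yum repository whose id starts with "epel" exists.
epel_enabled() {
    # Repo ids may carry a "!" or "*" marker in the listing, so allow one
    yum -q repolist enabled 2>/dev/null | grep -qiE '^[!*]?epel'
}

# Succeeds if yum can find a dkms package (installed or available).
dkms_available() {
    yum -q list dkms >/dev/null 2>&1
}
```

Usage: `epel_enabled && dkms_available && echo "EPEL ready"`.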

Next, install the Arbutus Cloud repository definition. This package also installs the public key that the packages are signed with, so that their authenticity can be verified. The drivers and userspace tools in this repository are carefully tested against the infrastructure before they are made available.

[root@test centos]# yum -y install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/centos/7/x86_64/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el7.noarch.rpm
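Signature verification only happens if the repository definition enables it, so you may want to confirm that gpgcheck is switched on. This is a minimal bash sketch: the name of the .repo file is not given on this page, so it is discovered with `rpm -ql` rather than hard-coded, and the function name arbutus_repo_gpgcheck_enabled is hypothetical.

```shell
# Succeeds if the arbutus-cloud-vgpu-repo package is installed, ships at
# least one .repo file, and every such file contains a "gpgcheck=1" line.
arbutus_repo_gpgcheck_enabled() {
    local repo_file found=0
    # rpm -ql lists the files owned by an installed package
    for repo_file in $(rpm -ql arbutus-cloud-vgpu-repo 2>/dev/null | grep '\.repo$'); do
        found=1
        # Fail if this repo file contains no gpgcheck=1 line
        grep -Eq '^[[:space:]]*gpgcheck[[:space:]]*=[[:space:]]*1' "$repo_file" || return 1
    done
    # Also fail if the package is missing or ships no .repo file
    [ "$found" -eq 1 ]
}
```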

The last step is to install the NVIDIA vGPU packages. Installing the kernel module package 'nvidia-vgpu-kmod' takes a few minutes, because the required kernel modules are compiled in the background.

[root@test centos]# yum -y install nvidia-vgpu-kmod nvidia-vgpu-gridd nvidia-vgpu-tools
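Since the kernel modules are compiled against the running kernel, you can check whether that build has completed and whether the nvidia module is loaded. This is a minimal bash sketch; the DKMS module name used by nvidia-vgpu-kmod is not stated on this page, so the check accepts any DKMS module built for the running kernel, and the function names are hypothetical.

```shell
# Succeeds if DKMS reports a module built and installed for the running kernel.
dkms_installed_for_running_kernel() {
    local kernel
    kernel="$(uname -r)"
    # Keep only status lines for the running kernel that are marked installed
    dkms status 2>/dev/null | grep -F "$kernel" | grep -q ': installed'
}

# Succeeds if the "nvidia" kernel module is currently loaded.
nvidia_module_loaded() {
    # lsmod prints the module name in the first column; an exact match
    # avoids counting related modules such as nvidia_uvm
    lsmod | awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}
```

Usage: `dkms_installed_for_running_kernel && nvidia_module_loaded && nvidia-smi`.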

After a successful installation, the vGPU is accessible and licensed:

[root@test centos]# nvidia-smi         
Mon Jun  1 16:03:27 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.56       Driver Version: 440.56       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID V100D-8C       On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |    560MiB /  8192MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                  
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
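For use in scripts, nvidia-smi can also print selected fields in CSV form instead of the table above. This is a minimal bash sketch; the choice of fields and the function name report_vgpus are illustrative, not part of the original instructions.

```shell
# Print one line per visible GPU using nvidia-smi's CSV query mode.
# Fails if nvidia-smi errors out or reports no GPUs.
report_vgpus() {
    local out
    out="$(nvidia-smi --query-gpu=index,name,driver_version,memory.total \
                      --format=csv,noheader,nounits 2>/dev/null)" || return 1
    [ -n "$out" ] || return 1
    # Each line looks like "0, GRID V100D-8C, 440.56, 8192" (memory in MiB)
    printf '%s\n' "$out" | while IFS=',' read -r idx name driver mem; do
        # Strip the single space nvidia-smi puts after each comma
        printf 'GPU %s: %s, driver %s, %s MiB\n' \
            "$idx" "${name# }" "${driver# }" "${mem# }"
    done
}
```

On the VM shown above, this should print a single line for the GRID V100D-8C profile with its 8192 MiB framebuffer.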

To check the license status and other information about the vGPU, run:

[root@test centos]# nvidia-smi -q | less
==============NVSMI LOG==============

Timestamp                           : Mon Jun  1 16:06:59 2020
Driver Version                      : 440.56
CUDA Version                        : 10.2

Attached GPUs                       : 1
GPU 00000000:00:05.0
    Product Name                    : GRID V100D-8C
    Product Brand                   : Grid
    Display Mode                    : Enabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-315b585a-a41e-11ea-a63b-4ed0221b4f99
    Minor Number                    : 0
    VBIOS Version                   : 00.00.00.00.00
    MultiGPU Board                  : No
    Board ID                        : 0x5
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : N/A
        OEM Object                  : N/A
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : VGPU
        Host VGPU Mode              : N/A
    GRID Licensed Product
        Product Name                : NVIDIA vComputeServer
        License Status              : Licensed
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x00
        Device                      : 0x05
        Domain                      : 0x0000
        Device Id                   : 0x1DB610DE
        Bus Id                      : 00000000:00:05.0
        Sub System Id               : 0x139610DE
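Rather than paging through the full report, the license status can be extracted and tested directly, for example in a startup script. This is a minimal bash sketch based on the "License Status" line in the output above; the function names are hypothetical, and the check accepts any value beginning with "Licensed" in case a driver version appends details after the word.

```shell
# Print the "License Status" value for each vGPU reported by "nvidia-smi -q".
vgpu_license_status() {
    # Split on ": " so that $2 is the value, then trim trailing whitespace
    nvidia-smi -q | awk -F': ' '/License Status/ { sub(/[[:space:]]+$/, "", $2); print $2 }'
}

# Succeeds only if at least one status was found and every status
# starts with "Licensed".
vgpu_is_licensed() {
    local status
    status="$(vgpu_license_status)"
    [ -n "$status" ] && ! printf '%s\n' "$status" | grep -qv '^Licensed'
}
```

Usage: `vgpu_is_licensed && echo "vGPU licensed" || echo "vGPU not licensed"`.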

Preparation of a VM running CentOS 8

Preparation of a VM running Debian 10

Preparation of a VM running Ubuntu 20