Using cloud vGPUs: Difference between revisions

* g1-8gb-c4-22gb
* g1-16gb-c8-40gb
<!-- ============ UPDATE PREPARATION DO NOT PUBLISH YET! ========================== -->
<!--
== Preparation of a VM running AlmaLinux 9 == <!--T:76-->
<!--T:77-->
Once the VM is available, update all installed packages, including the kernel, to the latest available versions.
Then, reboot the VM so that it runs the latest kernel.
To have access to the [https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support DKMS package], the [https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm EPEL repository] is required.
By default, AlmaLinux 9 ships a faulty nouveau driver that crashes the kernel as soon as the NVIDIA driver is loaded, so the VM needs a few extra steps to prevent the nouveau driver from loading when the system boots.
<pre>
[root@almalinux9]# echo -e "blacklist nouveau\noptions nouveau modeset=0" >/etc/modprobe.d/blacklist-nouveau.conf
[root@almalinux9]# dracut -fv --omit-drivers nouveau
[root@almalinux9]# dnf -y update && dnf -y install epel-release && reboot
</pre>
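After the reboot, it can be worth confirming that the blacklist took effect before installing the vGPU packages. The following is a minimal sketch, not part of the official procedure: the helper name <code>nouveau_loaded</code> is hypothetical, and it assumes the loaded modules are listed in <code>/proc/modules</code> (the file that <code>lsmod</code> reads).

```shell
# Hypothetical helper: returns 0 if the nouveau module appears in the
# given module list (defaults to /proc/modules, the file lsmod reads).
nouveau_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -q '^nouveau ' "$modules_file" 2>/dev/null
}

if nouveau_loaded; then
    echo "nouveau is still loaded; re-check the blacklist file and the initramfs"
else
    echo "nouveau is not loaded"
fi
```

If the check reports that nouveau is still loaded, the blacklist file and the <code>dracut</code> step above are the first things to re-check.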
<!--T:78-->
After the reboot of the VM, the Arbutus vGPU Cloud repository needs to be installed.
<!--T:79-->
<pre>
[root@almalinux9]# dnf install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma9/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el9.noarch.rpm
</pre>
<!--T:80-->
The next step is to install the vGPU packages, which will install the required driver and user-space tools.
<!--T:81-->
<pre>
[root@almalinux9]# dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64
</pre>
<!--T:82-->
After a successful installation, <b>nvidia-smi</b> can be used to verify that the driver is working properly.
<!--T:83-->
<pre>
[root@almalinux9]# nvidia-smi
Tue Apr 23 16:37:31 2024     
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4    |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf          Pwr:Usage/Cap |          Memory-Usage | GPU-Util  Compute M. |
|                                        |                        |              MIG M. |
|=========================================+========================+======================|
|  0  GRID V100D-8C                  On  |  00000000:00:06.0 Off |                    0 |
| N/A  N/A    P0            N/A /  N/A  |      0MiB /  8192MiB |      0%      Default |
|                                        |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                       
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU  GI  CI        PID  Type  Process name                              GPU Memory |
|        ID  ID                                                              Usage      |
|=========================================================================================|
|  No running processes found                                                            |
+-----------------------------------------------------------------------------------------+
</pre>
-->


== Preparation of a VM running CentOS7 == <!--T:5-->