RAPIDS

Overview

RAPIDS is a suite of open source software libraries from NVIDIA mainly for executing data science and analytics pipelines in Python on GPUs. It relies on NVIDIA CUDA primitives for low-level compute optimization and provides users with friendly Python APIs, similar to those in Pandas, Scikit-learn, etc.
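
For instance, a cuDF group-by looks much like its Pandas equivalent. The snippet below is a minimal sketch, not taken from this page, and assumes a working RAPIDS environment with a visible GPU:

import cudf

# Build a small DataFrame in GPU memory; the constructor mirrors pandas.DataFrame.
gdf = cudf.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})

# Group-by aggregation runs on the GPU; the API mirrors Pandas.
print(gdf.groupby("key")["value"].sum())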

While RAPIDS can be installed using Anaconda, we do not recommend the use of Anaconda on Compute Canada clusters. We propose instead that you obtain a Docker image from NVIDIA, which can then be converted into a Singularity image for use on our clusters.

This page provides instructions for working with RAPIDS on Compute Canada clusters through a Singularity container.

Building a Singularity image

To build a Singularity image for RAPIDS, the first thing to do is to find and select a Docker image provided by NVIDIA.

Finding a Docker image

There are three types of RAPIDS Docker images: base, runtime, and devel. For each type, multiple images are provided for different combinations of RAPIDS and CUDA versions, either on Ubuntu or on CentOS. You can find the Docker pull command for a selected image under the Tags tab on each site.

  • NVIDIA GPU Cloud (NGC)
    • base images contain a RAPIDS environment ready for use. Use this type of image if you want to submit a job to the Slurm scheduler.
    • runtime images extend the base image by adding a Jupyter notebook server and example notebooks. Use this type of image if you want to interactively work with RAPIDS through notebooks and examples.
  • Docker Hub
    • devel images contain the full RAPIDS source tree, the compiler toolchain, debugging tools, headers, and static libraries for RAPIDS development. Use this type of image if you want to implement customized operations with low-level access to CUDA-based processes.

Building a RAPIDS Singularity image

For example, if a Docker pull command for a selected image is given as

docker pull nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7

then, on a computer that supports Singularity, you can build a Singularity image (here rapids.sif) based on the pull tag with the following command:

[name@server ~]$ singularity build rapids.sif docker://nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7

It usually takes thirty to sixty minutes to complete the image-building process. Since the image is relatively large, you need enough memory and disk space on the server to build it.

Working on clusters with a Singularity image

Once you have a Singularity image for RAPIDS ready in your account, you can request an interactive session on a GPU node or submit a batch job to the Slurm queue if you have your RAPIDS code ready.

Exploring the contents in RAPIDS

To explore the contents without doing any computations, you can use the following commands to access the container shell of the Singularity image (here rapids.sif) on any node without requesting a GPU.


Load the Singularity module first with

[name@server ~]$ module load singularity


Then access the container shell with

[name@server ~]$ singularity shell rapids.sif

The shell prompt is then changed to

Singularity>

Inside the Singularity shell, initiate Conda and activate the RAPIDS environment with

Singularity> source /opt/conda/etc/profile.d/conda.sh
Singularity> conda activate rapids

The shell prompt in the RAPIDS environment is then changed to

(rapids) Singularity>

Then you can list available packages in the RAPIDS environment with

(rapids) Singularity> conda list

To deactivate the RAPIDS environment and exit from the container, run

(rapids) Singularity> conda deactivate
Singularity> exit

You are then back to the host shell.

Working interactively on a GPU node

If a Singularity image was built based on a runtime or a devel type of Docker image, it includes a Jupyter Notebook server and can be used to explore RAPIDS interactively on a compute node with a GPU.

To request an interactive session on a compute node with a single GPU, e.g. a T4 type of GPU on Graham, run

[name@gra-login ~]$ salloc --ntasks=1 --cpus-per-task=2 --mem=10G --gres=gpu:t4:1 --time=1:0:0 --account=def-someuser

Once the requested resource is granted, start the RAPIDS shell on the GPU node with

[name@gra#### ~]$ module load singularity
[name@gra#### ~]$ singularity shell --nv -B /home -B /project -B /scratch rapids.sif
  • the --nv option binds the GPU driver on the host to the container, so the GPU device can be accessed from inside the Singularity container;
  • the -B option binds any filesystem that you would like to access from inside the container.

After the shell prompt changes to Singularity>, you can check the GPU stats in the container to make sure the GPU device is accessible with

Singularity> nvidia-smi

Then to initiate Conda and activate the RAPIDS environment, run

Singularity> source /opt/conda/etc/profile.d/conda.sh
Singularity> conda activate rapids

After the shell prompt changes to (rapids) Singularity>, you can launch the Jupyter Notebook server in the RAPIDS environment with the following command, and the URL of the Notebook server will be displayed after it starts successfully.

(rapids) Singularity> jupyter-lab --ip $(hostname -f) --no-browser
[I 22:28:20.215 LabApp] JupyterLab extension loaded from /opt/conda/envs/rapids/lib/python3.7/site-packages/jupyterlab
[I 22:28:20.215 LabApp] JupyterLab application directory is /opt/conda/envs/rapids/share/jupyter/lab
[I 22:28:20.221 LabApp] Serving notebooks from local directory: /scratch/jhqin/RAPIDS_Demo
[I 22:28:20.221 LabApp] Jupyter Notebook 6.1.3 is running at:
[I 22:28:20.221 LabApp] http://gra1160.graham.sharcnet:8888/?token=5d4b75bf2ec3481fab1b625656a322afc96775440b7bb8c4
[I 22:28:20.221 LabApp]  or http://127.0.0.1:8888/?token=5d4b75bf2ec3481fab1b625656a322afc96775440b7bb8c4
[I 22:28:20.222 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 22:28:20.244 LabApp]

    To access the notebook, open this file in a browser
        file:///home/jhqin/.local/share/jupyter/runtime/nbserver-76967-open.html
    Or copy and paste one of these URLs
        http://gra1160.graham.sharcnet:8888/?token=5d4b75bf2ec3481fab1b625656a322afc96775440b7bb8c4
     or http://127.0.0.1:8888/?token=5d4b75bf2ec3481fab1b625656a322afc96775440b7bb8c4

The URL of the notebook server in the above example is

http://gra1160.graham.sharcnet:8888/?token=5d4b75bf2ec3481fab1b625656a322afc96775440b7bb8c4

As there is no direct Internet connection on a compute node on Graham, you would need to set up an SSH tunnel with port forwarding between your local computer and the GPU node. See detailed instructions for connecting to Jupyter Notebook.

Submitting a RAPIDS job to the Slurm scheduler

Once you have your RAPIDS code ready and want to submit a job execution request to the Slurm scheduler, you need to prepare two script files, i.e. a job submission script and a job execution script.

Here is an example of a job submission script (here submit.sh):

File : submit.sh

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --gres=gpu:t4:1
#SBATCH --cpus-per-task=2
#SBATCH --mem=10G
#SBATCH --time=dd:hh:mm
#SBATCH --account=def-someuser

module load singularity

singularity run --nv -B /home -B /scratch rapids.sif /path/to/run_script.sh


Here is an example of a job execution script (here run_script.sh) to be run in the container to start the execution of your RAPIDS Python code:

File : run_script.sh

#!/bin/bash
source /opt/conda/etc/profile.d/conda.sh
conda activate rapids
nvidia-smi 

python /path/to/my_rapids_code.py
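
In run_script.sh above, my_rapids_code.py stands for your own program; a minimal, hypothetical example (illustrative only, with placeholder file name and data) could look like:

File : my_rapids_code.py

# Hypothetical RAPIDS script for illustration; data.csv and column "x" are placeholders.
import cudf

# Load a CSV directly into GPU memory and compute a simple aggregation.
df = cudf.read_csv("data.csv")
print(df["x"].mean())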


Helpful links

  • RAPIDS Docs: a collection of all the documentation for RAPIDS, how to stay connected and report issues;
  • RAPIDS Notebooks: a collection of example notebooks on GitHub for getting started quickly;
  • RAPIDS on Medium: a collection of use cases and blogs for RAPIDS applications.