RAPIDS
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
Overview
RAPIDS is a suite of open-source software libraries from NVIDIA, mainly for executing data science and analytics pipelines on GPUs. It relies on NVIDIA CUDA primitives for low-level compute optimization and provides users with friendly Python APIs, similar to those of Pandas, Scikit-learn, etc.
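As a brief illustration of that similarity, here is a minimal sketch, not taken from the RAPIDS documentation, assuming a GPU is available and cuDF is installed in the environment:
import cudf

# Build a DataFrame on the GPU with the same call pattern as pandas.DataFrame
gdf = cudf.DataFrame({"key": ["a", "b", "a", "b"], "value": [1.0, 2.0, 3.0, 4.0]})

# Familiar Pandas-style operations, executed on the GPU
print(gdf.groupby("key").mean())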
RAPIDS is distributed as Conda packages, which require Anaconda for installation; however, Anaconda is not recommended on the Compute Canada clusters. Instead, a container solution using Singularity is recommended: RAPIDS is also available as Docker container images from NVIDIA, and a Singularity image for RAPIDS can be built from a Docker image.
This page provides instructions for working with RAPIDS on Compute Canada clusters using a Singularity container.
Build a Singularity image for RAPIDS
To build a Singularity image for RAPIDS, the first step is to find a Docker image for RAPIDS.
Where to look for a Docker image for RAPIDS
RAPIDS Docker images come in three types, i.e. base, runtime, and devel, and they are available from two major sites. For each type, multiple images are provided for different combinations of RAPIDS and CUDA versions, based on either Ubuntu or CentOS. You can find the Docker pull command for a selected image under the Tags tab on each site:
- NVIDIA GPU Cloud (NGC): this site provides two types of RAPIDS images, i.e. base type and runtime type.
- base - contains a RAPIDS environment ready to use. Use this type of image if you want to submit a job to the Slurm scheduler.
- runtime - extends the base image by adding a Jupyter notebook server and example notebooks. Use this type of image if you want to interactively work with RAPIDS through notebooks and examples.
- Docker Hub: this site provides RAPIDS images of the devel type.
- devel - contains the full RAPIDS source tree, the compiler toolchain, the debugging tools, and the headers and static libraries for RAPIDS development. Use this type of image if you want to implement customized operations with low-level access to CUDA-based processes.
Build a RAPIDS Singularity image
For example, if the Docker pull command for a selected image is given as:
docker pull nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7
On a computer where Singularity is installed, you can build a Singularity image, e.g. called rapids.sif, from the given pull tag with the following command:
[name@server ~]$ singularity build rapids.sif docker://nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7
It usually takes from thirty minutes to an hour to complete the image build. Since the image is relatively large, you need enough memory and disk space on the server to build it.
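If the default cache and temporary locations used by Singularity do not have enough space, one possible workaround is to redirect them to a larger file system before building; the paths below are placeholders for a location of your choice:
[name@server ~]$ export SINGULARITY_CACHEDIR=/path/to/large/filesystem/cache
[name@server ~]$ export SINGULARITY_TMPDIR=/path/to/large/filesystem/tmp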
Work on Clusters with a RAPIDS Singularity image
Once you have a Singularity image for RAPIDS on a Compute Canada cluster, you can request an interactive session on a GPU node, or submit a batch job to the Slurm queue if your RAPIDS code is ready.
Explore the contents in RAPIDS
To simply explore the contents without doing any computation, you can use the following commands to access the container shell of the Singularity image, e.g. called rapids.sif, on any node without requesting any GPUs.
Load the Singularity module first:
[name@server ~]$ module load singularity
Then access the container shell:
[name@server ~]$ singularity shell rapids.sif
The shell prompt then changes to:
Singularity>
Inside the Singularity shell, initialize Conda and activate the rapids environment:
Singularity> source /opt/conda/etc/profile.d/conda.sh
Singularity> conda activate rapids
The shell prompt in the rapids environment then changes to:
(rapids) Singularity>
Then you can list available packages in the rapids env:
(rapids) Singularity> conda list
To deactivate the rapids environment and exit the container:
(rapids) Singularity> conda deactivate
Singularity> exit
You are then back in the host shell.
Work interactively on a GPU node
If the Singularity image was built from a runtime or devel type of Docker image, it includes a Jupyter Notebook server and can be used to explore RAPIDS interactively on a compute node with a GPU.
To request an interactive session on a compute node with a single GPU, e.g. a T4 type of GPU on Graham:
[name@gra-login ~]$ salloc --ntasks=1 --cpus-per-task=2 --mem=10G --gres=gpu:t4:1 --time=1:0:0 --account=def-someuser
Once the request is granted, start RAPIDS shell on the GPU node:
[name@gra#### ~]$ module load singularity
[name@gra#### ~]$ singularity shell --nv -B /home -B /project -B /scratch rapids.sif
The --nv option binds the GPU driver on the host into the container, so the GPU device can be accessed from inside the Singularity container, and the -B option bind-mounts any file system that you would like to access in the container.
After the shell prompt changes to Singularity>, you can check the GPU stats in the container to make sure the GPU device is accessible:
Singularity> nvidia-smi
Then initialize Conda and activate the rapids environment:
Singularity> source /opt/conda/etc/profile.d/conda.sh
Singularity> conda activate rapids
After the shell prompt changes to (rapids) Singularity>, you can launch the Jupyter Notebook server in the rapids environment with the following command:
(rapids) Singularity> jupyter-lab --ip $(hostname -f) --no-browser
[I 22:28:20.215 LabApp] JupyterLab extension loaded from /opt/conda/envs/rapids/lib/python3.7/site-packages/jupyterlab
[I 22:28:20.215 LabApp] JupyterLab application directory is /opt/conda/envs/rapids/share/jupyter/lab
[I 22:28:20.221 LabApp] Serving notebooks from local directory: /scratch/jhqin/RAPIDS_Demo
[I 22:28:20.221 LabApp] Jupyter Notebook 6.1.3 is running at:
[I 22:28:20.221 LabApp] http://gra1160.graham.sharcnet:8888/?token=5d4b75bf2ec3481fab1b625656a322afc96775440b7bb8c4
[I 22:28:20.221 LabApp] or http://127.0.0.1:8888/?token=5d4b75bf2ec3481fab1b625656a322afc96775440b7bb8c4
[I 22:28:20.222 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 22:28:20.244 LabApp]
To access the notebook, open this file in a browser:
file:///home/jhqin/.local/share/jupyter/runtime/nbserver-76967-open.html
Or copy and paste one of these URLs:
http://gra1160.graham.sharcnet:8888/?token=5d4b75bf2ec3481fab1b625656a322afc96775440b7bb8c4
or http://127.0.0.1:8888/?token=5d4b75bf2ec3481fab1b625656a322afc96775440b7bb8c4
The URL of the notebook server in the above example is:
http://gra1160.graham.sharcnet:8888/?token=5d4b75bf2ec3481fab1b625656a322afc96775440b7bb8c4
As there is no direct Internet connection on a compute node on Graham, you need to set up an SSH tunnel with port forwarding between your local computer and the GPU node. See the detailed instructions for connecting to Jupyter Notebook.
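For example, a tunnel from your local computer through a Graham login node might look like the following, where the node hostname and port are taken from the example output above, and someuser and the local port 8888 are placeholders:
[name@my_computer ~]$ ssh -L 8888:gra1160.graham.sharcnet:8888 someuser@graham.computecanada.ca
You can then open http://localhost:8888 in your local browser and paste the token printed by jupyter-lab.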
Submit a RAPIDS job to Slurm scheduler
Once your RAPIDS code is ready and you would like to submit a job execution request to the Slurm scheduler, you need to prepare two scripts: a job submission script and a job execution script.
Here is an example of a job submission script, e.g. submit.sh:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --gres=gpu:t4:1
#SBATCH --cpus-per-task=2
#SBATCH --mem=10G
#SBATCH --time=dd-hh:mm
#SBATCH --account=def-someuser
module load singularity
singularity run --nv -B /home -B /scratch rapids.sif /path/to/run_script.sh
The --nv option binds the GPU driver on the host into the container, so the GPU device can be accessed from inside the Singularity container.
Here is an example of a job execution script, e.g. run_script.sh, which runs inside the container to start the execution of the Python code written with RAPIDS:
#!/bin/bash
source /opt/conda/etc/profile.d/conda.sh
conda activate rapids
nvidia-smi
python /path/to/my_rapids_code.py
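What the Python code contains is entirely up to you; as a hypothetical minimal sketch, my_rapids_code.py might look like the following (the data and operations are placeholders, not an official RAPIDS sample):
#!/usr/bin/env python
# my_rapids_code.py: placeholder RAPIDS workload
import numpy as np
import cudf

# Build a DataFrame directly on the GPU from NumPy arrays
gdf = cudf.DataFrame({"x": np.arange(1000), "y": np.random.rand(1000)})

# Pandas-like transformations executed on the GPU
gdf["z"] = gdf["x"] * gdf["y"]
print("sum of z:", gdf["z"].sum())
With both scripts prepared, submit the job from a login node:
[name@server ~]$ sbatch submit.sh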
Helpful Links
- RAPIDS Docs: a collection of all the documentation for RAPIDS, how to stay connected and report issues.
- RAPIDS Notebooks: a collection of example notebooks on GitHub for getting started quickly.
- RAPIDS AI on Medium: a collection of use cases and blogs for RAPIDS applications.