No edit summary |
(Marked this version for translation) |
||
Line 2: | Line 2: | ||
<translate> | <translate> | ||
=Overview= | =Overview= <!--T:1--> | ||
<!--T:2--> | |||
[https://rapids.ai/ RAPIDS] is a suite of open source software libraries from NVIDIA mainly for executing data science and analytics pipelines in Python on GPUs. It relies on NVIDIA CUDA primitives for low level compute optimization and provides users with friendly Python APIs, similar to those in Pandas, Scikit-learn, etc. | [https://rapids.ai/ RAPIDS] is a suite of open source software libraries from NVIDIA mainly for executing data science and analytics pipelines in Python on GPUs. It relies on NVIDIA CUDA primitives for low level compute optimization and provides users with friendly Python APIs, similar to those in Pandas, Scikit-learn, etc. | ||
<!--T:3--> | |||
RAPIDS is available as Conda packages, which require [[Anaconda/en|Anaconda]] for installation; however, using Anaconda on the Compute Canada clusters is not advised. Instead, a container solution based on [[Singularity|Singularity]] is recommended. As RAPIDS is also available as Docker container images from NVIDIA, a Singularity image for RAPIDS can be built from a Docker image. | RAPIDS is available as Conda packages, which require [[Anaconda/en|Anaconda]] for installation; however, using Anaconda on the Compute Canada clusters is not advised. Instead, a container solution based on [[Singularity|Singularity]] is recommended. As RAPIDS is also available as Docker container images from NVIDIA, a Singularity image for RAPIDS can be built from a Docker image. | ||
<!--T:4--> | |||
This page provides instructions for working with RAPIDS on Compute Canada clusters using a Singularity container. | This page provides instructions for working with RAPIDS on Compute Canada clusters using a Singularity container. | ||
=Build a Singularity image for RAPIDS= | =Build a Singularity image for RAPIDS= <!--T:5--> | ||
To build a Singularity image for RAPIDS, first find and select a Docker image for RAPIDS from NVIDIA. | To build a Singularity image for RAPIDS, first find and select a Docker image for RAPIDS from NVIDIA. | ||
==Find a Docker image for RAPIDS== | ==Find a Docker image for RAPIDS== <!--T:6--> | ||
There are three types of RAPIDS Docker images: ''base'', ''runtime'', and ''devel''. They are available at two major sites. For each type, multiple images are provided with different combinations of RAPIDS and CUDA versions, on either Ubuntu or CentOS. You can find the Docker pull command for a selected image via the '''Tags''' tab on each site: | There are three types of RAPIDS Docker images: ''base'', ''runtime'', and ''devel''. They are available at two major sites. For each type, multiple images are provided with different combinations of RAPIDS and CUDA versions, on either Ubuntu or CentOS. You can find the Docker pull command for a selected image via the '''Tags''' tab on each site: | ||
<!--T:7--> | |||
* [https://ngc.nvidia.com/catalog/containers/nvidia:rapidsai:rapidsai NVIDIA GPU Cloud (NGC)]: this site provides two types of RAPIDS images, i.e. ''base'' type and ''runtime'' type. | * [https://ngc.nvidia.com/catalog/containers/nvidia:rapidsai:rapidsai NVIDIA GPU Cloud (NGC)]: this site provides two types of RAPIDS images, i.e. ''base'' type and ''runtime'' type. | ||
** ''base'' - contains a RAPIDS environment ready to use. Use this type of image if you want to submit a job to the Slurm scheduler. | ** ''base'' - contains a RAPIDS environment ready to use. Use this type of image if you want to submit a job to the Slurm scheduler. | ||
Line 24: | Line 28: | ||
** ''devel'' - contains the full RAPIDS source tree, the compiler toolchain, the debugging tools, the headers and the static libraries for RAPIDS development. Use this type of image if you want to implement customized operations with low-level access to CUDA-based processes. | ** ''devel'' - contains the full RAPIDS source tree, the compiler toolchain, the debugging tools, the headers and the static libraries for RAPIDS development. Use this type of image if you want to implement customized operations with low-level access to CUDA-based processes. | ||
==Build a RAPIDS Singularity image== | ==Build a RAPIDS Singularity image== <!--T:8--> | ||
<!--T:9--> | |||
For example, if a docker pull command for a selected image is given as: | For example, if a docker pull command for a selected image is given as: | ||
<!--T:10--> | |||
<source lang="console">docker pull nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7</source> | <source lang="console">docker pull nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7</source> | ||
Line 34: | Line 40: | ||
<source lang="console">[name@server ~]$ singularity build rapids.sif docker://nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7</source> | <source lang="console">[name@server ~]$ singularity build rapids.sif docker://nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7</source> | ||
<!--T:11--> | |||
Building the image usually takes from half an hour to one hour. Since the image is relatively large, the server needs enough memory and disk space to build it. | Building the image usually takes from half an hour to one hour. Since the image is relatively large, the server needs enough memory and disk space to build it. | ||
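Because the build needs a fair amount of working space, it can help to check the free space and point Singularity's cache and temporary directories at a roomy location before building. The following is a minimal sketch, not a tested recipe for any particular cluster. The helper names <code>docker_pull_to_uri</code>, <code>has_free_space</code> and <code>build_rapids_image</code>, the 20 GB threshold and the directory paths are assumptions made for illustration; <code>SINGULARITY_CACHEDIR</code> and <code>SINGULARITY_TMPDIR</code> are the environment variables Singularity reads for its cache and temporary build directories.

<source lang="bash">
#!/bin/bash
# Hypothetical helper: turn a "docker pull <image>" command into the
# docker:// URI that "singularity build" expects.
docker_pull_to_uri() {
    local ref="${1#docker pull }"   # strip the leading "docker pull "
    printf 'docker://%s\n' "$ref"
}

# Hypothetical helper: succeed if directory $1 has at least $2 GB free.
has_free_space() {
    local dir="$1" min_gb="$2" avail_kb
    # df -P prints one line per file system; column 4 is the free space in KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Assumed paths and threshold; adjust them for your own account.
build_rapids_image() {
    export SINGULARITY_CACHEDIR="$HOME/scratch/singularity_cache"
    export SINGULARITY_TMPDIR="$HOME/scratch/singularity_tmp"
    mkdir -p "$SINGULARITY_CACHEDIR" "$SINGULARITY_TMPDIR"
    if has_free_space "$SINGULARITY_TMPDIR" 20; then
        singularity build rapids.sif "$(docker_pull_to_uri "$1")"
    else
        echo "Less than 20 GB free in $SINGULARITY_TMPDIR" >&2
        return 1
    fi
}
</source>

After loading these functions, a build could be started with <code>build_rapids_image "docker pull nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7"</code>.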
=Work on Clusters with a RAPIDS Singularity image= | =Work on Clusters with a RAPIDS Singularity image= <!--T:12--> | ||
Once you have a Singularity image for RAPIDS on a Compute Canada cluster, you can either work interactively by requesting an interactive session on a GPU node, or submit a batch job to the Slurm queue if your RAPIDS code is ready. | Once you have a Singularity image for RAPIDS on a Compute Canada cluster, you can either work interactively by requesting an interactive session on a GPU node, or submit a batch job to the Slurm queue if your RAPIDS code is ready. | ||
==Explore the contents in RAPIDS== | ==Explore the contents in RAPIDS== | ||
<!--T:13--> | |||
If you simply want to explore the contents without doing any computations, you can use the following commands to access the container shell of the Singularity image (called ''rapids.sif'' here) on any node, without requesting any GPUs. | If you simply want to explore the contents without doing any computations, you can use the following commands to access the container shell of the Singularity image (called ''rapids.sif'' here) on any node, without requesting any GPUs. | ||
<!--T:14--> | |||
Load the Singularity module first: | Load the Singularity module first: | ||
<source lang="console">[name@server ~]$ module load singularity</source> | <source lang="console">[name@server ~]$ module load singularity</source> | ||
<!--T:15--> | |||
Then access the container shell: | Then access the container shell: | ||
<source lang="console">[name@server ~]$ singularity shell rapids.sif</source> | <source lang="console">[name@server ~]$ singularity shell rapids.sif</source> | ||
<!--T:16--> | |||
The shell prompt is then changed to: | The shell prompt is then changed to: | ||
Line 55: | Line 66: | ||
</source> | </source> | ||
<!--T:17--> | |||
Inside the Singularity shell, initialize Conda and activate the RAPIDS environment: | Inside the Singularity shell, initialize Conda and activate the RAPIDS environment: | ||
Line 61: | Line 73: | ||
</source> | </source> | ||
<!--T:18--> | |||
The shell prompt in the rapids env is then changed to: | The shell prompt in the rapids env is then changed to: | ||
Line 66: | Line 79: | ||
</source> | </source> | ||
<!--T:19--> | |||
Then you can list available packages in the rapids env: | Then you can list available packages in the rapids env: | ||
Line 71: | Line 85: | ||
</source> | </source> | ||
<!--T:20--> | |||
To deactivate the rapids env and exit the container: | To deactivate the rapids env and exit the container: | ||
Line 77: | Line 92: | ||
</source> | </source> | ||
<!--T:21--> | |||
You are then back in the host shell. | You are then back in the host shell. | ||
==Work interactively on a GPU node== | ==Work interactively on a GPU node== <!--T:22--> | ||
<!--T:23--> | |||
If a Singularity image was built from a ''runtime'' or a ''devel'' type of Docker image, it includes a Jupyter Notebook server and can be used to explore RAPIDS interactively on a compute node with a GPU. | If a Singularity image was built from a ''runtime'' or a ''devel'' type of Docker image, it includes a Jupyter Notebook server and can be used to explore RAPIDS interactively on a compute node with a GPU. | ||
<!--T:24--> | |||
To request an interactive session on a compute node with a single GPU, e.g. a T4 type of GPU on Graham: | To request an interactive session on a compute node with a single GPU, e.g. a T4 type of GPU on Graham: | ||
<source lang="console">[name@gra-login ~]$ salloc --ntasks=1 --cpus-per-task=2 --mem=10G --gres=gpu:t4:1 --time=1:0:0 --account=def-someuser</source> | <source lang="console">[name@gra-login ~]$ salloc --ntasks=1 --cpus-per-task=2 --mem=10G --gres=gpu:t4:1 --time=1:0:0 --account=def-someuser</source> | ||
<!--T:25--> | |||
Once the requested resources are granted, start the RAPIDS shell on the GPU node: | Once the requested resources are granted, start the RAPIDS shell on the GPU node: | ||
<!--T:26--> | |||
<source lang="console">[name@gra#### ~]$ module load singularity | <source lang="console">[name@gra#### ~]$ module load singularity | ||
[name@gra#### ~]$ singularity shell --nv -B /home -B /project -B /scratch rapids.sif | [name@gra#### ~]$ singularity shell --nv -B /home -B /project -B /scratch rapids.sif | ||
Line 93: | Line 113: | ||
Here the '''--nv''' option bind mounts the host's GPU driver into the container, so the GPU device can be accessed from inside the Singularity container, and the '''-B''' option bind mounts any file system that you would like to access from the container. | Here the '''--nv''' option bind mounts the host's GPU driver into the container, so the GPU device can be accessed from inside the Singularity container, and the '''-B''' option bind mounts any file system that you would like to access from the container. | ||
<!--T:27--> | |||
After the shell prompt changes to '''Singularity>''', you can check the GPU stats in the container to make sure the GPU device is accessible: | After the shell prompt changes to '''Singularity>''', you can check the GPU stats in the container to make sure the GPU device is accessible: | ||
<source lang="console">Singularity> nvidia-smi</source> | <source lang="console">Singularity> nvidia-smi</source> | ||
<!--T:28--> | |||
Then initialize Conda and activate the rapids env: | Then initialize Conda and activate the rapids env: | ||
<source lang="console">Singularity> source /opt/conda/etc/profile.d/conda.sh | <source lang="console">Singularity> source /opt/conda/etc/profile.d/conda.sh | ||
Line 101: | Line 123: | ||
</source> | </source> | ||
<!--T:29--> | |||
After the shell prompt changes to '''(rapids) Singularity>''', you can launch the Jupyter Notebook server in the rapids env with the following command. The URL of the Notebook server is displayed once it starts successfully: | After the shell prompt changes to '''(rapids) Singularity>''', you can launch the Jupyter Notebook server in the rapids env with the following command. The URL of the Notebook server is displayed once it starts successfully: | ||
<source lang="console">(rapids) Singularity> jupyter-lab --ip $(hostname -f) --no-browser | <source lang="console">(rapids) Singularity> jupyter-lab --ip $(hostname -f) --no-browser | ||
Line 112: | Line 135: | ||
[C 22:28:20.244 LabApp] | [C 22:28:20.244 LabApp] | ||
To access the notebook, open this file in a browser: | <!--T:30--> | ||
To access the notebook, open this file in a browser: | |||
file:///home/jhqin/.local/share/jupyter/runtime/nbserver-76967-open.html | file:///home/jhqin/.local/share/jupyter/runtime/nbserver-76967-open.html | ||
Or copy and paste one of these URLs: | Or copy and paste one of these URLs: | ||
Line 122: | Line 146: | ||
As compute nodes on Graham have no direct Internet connection, you need to set up an SSH tunnel with port forwarding between your local computer and the GPU node. See [[Jupyter#Connecting_to_Jupyter_Notebook|detailed instructions for connecting to Jupyter Notebook]]. | As compute nodes on Graham have no direct Internet connection, you need to set up an SSH tunnel with port forwarding between your local computer and the GPU node. See [[Jupyter#Connecting_to_Jupyter_Notebook|detailed instructions for connecting to Jupyter Notebook]]. | ||
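As a hedged illustration of port forwarding to a Notebook server on a compute node, the sketch below derives an <code>ssh -L</code> command from the URL that the server prints. The function name <code>jupyter_tunnel_cmd</code>, the node name <code>gra1160.graham.sharcnet</code>, the port, the token and the login address are hypothetical placeholders. Substitute the values from your own session, and treat the linked instructions as the authoritative procedure.

<source lang="bash">
#!/bin/bash
# Hypothetical helper: print an SSH port-forwarding command for a Jupyter URL.
#   $1: URL printed by the Notebook server, assumed to contain host:port
#   $2: login address, e.g. user@login-host
jupyter_tunnel_cmd() {
    local url="$1" login="$2" hostport host port
    hostport="${url#*://}"       # drop the scheme (http://)
    hostport="${hostport%%/*}"   # drop the path and the token query
    host="${hostport%%:*}"       # text before the colon
    port="${hostport##*:}"       # text after the colon
    # Forward the same port number on the local computer to the GPU node.
    printf 'ssh -L %s:%s:%s %s\n' "$port" "$host" "$port" "$login"
}

# Example with placeholder values:
jupyter_tunnel_cmd 'http://gra1160.graham.sharcnet:8888/?token=abc' \
                   'someuser@graham.computecanada.ca'
</source>

The printed command is meant to be run on your local computer. While it stays connected, the Notebook server should be reachable in a local browser at <code>localhost</code> on the same port, using the token from the original URL.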
==Submit a RAPIDS job to Slurm scheduler== | ==Submit a RAPIDS job to Slurm scheduler== <!--T:31--> | ||
Once your RAPIDS code is ready and you want to submit a job to the Slurm scheduler, you need to prepare two script files: a job submission script and a job execution script. | Once your RAPIDS code is ready and you want to submit a job to the Slurm scheduler, you need to prepare two script files: a job submission script and a job execution script. | ||
<!--T:32--> | |||
Here is an example of a job submission script, e.g. ''submit.sh'': | Here is an example of a job submission script, e.g. ''submit.sh'': | ||
{{File | {{File | ||
Line 138: | Line 163: | ||
#SBATCH --account=def-someuser | #SBATCH --account=def-someuser | ||
<!--T:33--> | |||
module load singularity | module load singularity | ||
<!--T:34--> | |||
singularity run --nv -B /home -B /scratch rapids.sif /path/to/run_script.sh | singularity run --nv -B /home -B /scratch rapids.sif /path/to/run_script.sh | ||
}} | }} | ||
Line 153: | Line 180: | ||
nvidia-smi | nvidia-smi | ||
<!--T:35--> | |||
python /path/to/my_rapids_code.py | python /path/to/my_rapids_code.py | ||
}} | }} | ||
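With both scripts in place, the job is submitted with <code>sbatch</code>. The following is a minimal sketch, assuming the submission script is saved as ''submit.sh'' in the current directory; the helper names <code>job_id_from_parsable</code> and <code>submit_and_watch</code> are hypothetical. It relies on the <code>--parsable</code> option of <code>sbatch</code>, which prints the job ID, optionally followed by a semicolon and the cluster name, instead of the usual message.

<source lang="bash">
#!/bin/bash
# Hypothetical helper: keep only the job ID from "sbatch --parsable" output,
# which has the form "<jobid>" or "<jobid>;<cluster>".
job_id_from_parsable() {
    printf '%s\n' "${1%%;*}"
}

# Submit submit.sh (assumed to be in the current directory) and show its state.
submit_and_watch() {
    local out jobid
    out=$(sbatch --parsable submit.sh) || return 1
    jobid=$(job_id_from_parsable "$out")
    echo "Submitted job $jobid"
    squeue -j "$jobid"   # list the job in the queue
}
</source>

Calling <code>submit_and_watch</code> on a login node submits the job and lists it in the queue. Unless the submission script sets another output file, Slurm writes the job output to <code>slurm-<jobid>.out</code> in the submission directory.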
=Helpful Links= | =Helpful Links= <!--T:36--> | ||
<!--T:37--> | |||
* [https://docs.rapids.ai/ RAPIDS Docs]: a collection of all the documentation for RAPIDS, how to stay connected and report issues. | * [https://docs.rapids.ai/ RAPIDS Docs]: a collection of all the documentation for RAPIDS, how to stay connected and report issues. | ||
* [https://github.com/rapidsai/notebooks RAPIDS Notebooks]: a collection of example notebooks on GitHub for getting started quickly. | * [https://github.com/rapidsai/notebooks RAPIDS Notebooks]: a collection of example notebooks on GitHub for getting started quickly. |