RAPIDS



<!--T:2-->
[https://rapids.ai/ RAPIDS] is a suite of open source software libraries from NVIDIA mainly for executing data science and analytics pipelines in Python on GPUs. It relies on NVIDIA CUDA primitives for low-level compute optimization and provides friendly Python APIs, similar to those in Pandas, Scikit-learn, etc.


<!--T:3-->
While RAPIDS can be installed using [[Anaconda/en|Anaconda]], we do not recommend the use of Anaconda on Compute Canada clusters. We propose instead that you obtain a Docker image from NVIDIA, which can then be converted into a [[Singularity|Singularity]] image for use on our clusters.

The main components are:
* '''cuDF''', a Python GPU DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating data (see the short example after this list).
* '''cuML''', a suite of libraries that implement machine learning algorithms and mathematical primitive functions sharing compatible APIs with other RAPIDS projects.
* '''cuGraph''', a GPU-accelerated graph analytics library with NetworkX-like functionality, seamlessly integrated into the RAPIDS data science platform.
* '''Cyber Log Accelerators (CLX or ''clicks'')''', a collection of RAPIDS examples for security analysts, data scientists, and engineers to quickly get started applying RAPIDS and GPU acceleration to real-world cybersecurity use cases.
* '''cuxfilter''', a connector library that links different visualization libraries to a GPU DataFrame, allowing you to combine charts from several libraries in a single interactive dashboard.
* '''cuSpatial''', a GPU-accelerated C++/Python library for GIS workflows, including point-in-polygon, spatial joins, coordinate systems, shape primitives, distances, and trajectory analysis.
* '''cuSignal''', which leverages CuPy, Numba, and the RAPIDS ecosystem for GPU-accelerated signal processing. In some cases, cuSignal is a direct port of SciPy Signal to the GPU via CuPy, but it also contains Numba CUDA kernels for additional speedups of selected functions.
* '''cuCIM''', an extensible toolkit providing GPU-accelerated I/O, computer vision, and image processing primitives for N-dimensional images, with a focus on biomedical imaging.
* '''RAPIDS Memory Manager (RMM)''', a central place for all device memory allocations in cuDF (C++ and Python) and other RAPIDS libraries. It also serves as a replacement allocator for CUDA device memory (and CUDA managed memory) and as a pool allocator to make CUDA device memory allocation and deallocation faster and asynchronous.
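
For example, a minimal cuDF session looks like the following sketch. It assumes only that cuDF is importable, e.g. inside the RAPIDS container environment described below, on a machine with a GPU:

<source lang="python">import cudf

# Create a DataFrame in GPU memory and filter it with a Pandas-like API.
gdf = cudf.DataFrame({"key": ["a", "b", "a", "b"], "value": [1.0, 2.0, 3.0, 4.0]})
print(gdf[gdf.value > 1.5])

# Aggregations such as a groupby mean also run on the GPU.
print(gdf.groupby("key").mean())
</source>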


<!--T:4-->
This page provides instructions for working with RAPIDS on Compute Canada clusters via a Singularity container.


=Building a Singularity image= <!--T:5-->

To build a Singularity image for RAPIDS, the first thing to do is to find and select a Docker image provided by NVIDIA.

==Finding a Docker image== <!--T:6-->

There are three types of RAPIDS Docker images: ''base'', ''runtime'', and ''devel''. For each type, multiple images are provided for different combinations of RAPIDS and CUDA versions, either on Ubuntu or on CentOS. You can find the Docker <tt>pull</tt> command for a selected image under the '''Tags''' tab on each site.
** '''devel''' images contain the full RAPIDS source tree, the compiler toolchain, the debugging tools, the headers, and the static libraries for RAPIDS development. Use this type of image if you want to implement customized operations with low-level access to CUDA-based processes.


==Building a RAPIDS Singularity image== <!--T:8-->


<!--T:9-->
For example, the <tt>pull</tt> command of a selected image may look like this:

<source lang="console">docker pull nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7</source>

On a computer that supports Singularity, you can then build a Singularity image (here ''rapids.sif'') with the following command based on the <tt>pull</tt> tag:

<source lang="console">[name@server ~]$ singularity build rapids.sif docker://nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7</source>

It usually takes from thirty to sixty minutes to complete the image-building process. Since the image is relatively large, you need enough memory and disk space on the server to build it.
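
If the default cache location under your home directory is too small for the build, you can point Singularity's cache and temporary directories to another filesystem first. This is a minimal sketch; the paths below are only examples and should be replaced with a location where you have enough space:

<source lang="console">[name@server ~]$ export SINGULARITY_CACHEDIR=/scratch/$USER/singularity/cache
[name@server ~]$ export SINGULARITY_TMPDIR=/scratch/$USER/singularity/tmp
[name@server ~]$ mkdir -p $SINGULARITY_CACHEDIR $SINGULARITY_TMPDIR
[name@server ~]$ singularity build rapids.sif docker://nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7</source>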


=Working on clusters with a Singularity image= <!--T:12-->
Once you have a Singularity image for RAPIDS ready in your account, you can request an interactive session on a GPU node, or submit a batch job to the Slurm scheduler if you have your RAPIDS code ready.

==Exploring the contents in RAPIDS==

<!--T:13-->
To explore the contents without doing any computations, you can use the following commands to access the container shell of the Singularity image (here ''rapids.sif'') on any node, without requesting a GPU.


<!--T:14-->
Load the Singularity module first with
<source lang="console">[name@server ~]$ module load singularity</source>
 
<!--T:15-->
Then access the container shell with
<source lang="console">[name@server ~]$ singularity shell rapids.sif</source>
 
<!--T:16-->
The shell prompt is then changed to
<source lang="console">Singularity>
</source>
 
<!--T:17-->
Inside the Singularity shell, initiate Conda and activate the RAPIDS environment with
<source lang="console">Singularity> source /opt/conda/etc/profile.d/conda.sh
Singularity> conda activate rapids
</source>
 
<!--T:18-->
The shell prompt in the RAPIDS environment is then changed to
<source lang="console">(rapids) Singularity>
</source>
 
<!--T:19-->
Then you can list available packages in the RAPIDS environment with
<source lang="console">(rapids) Singularity> conda list
</source>
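For example, to check which version of a specific package such as cuDF is installed, you can filter the list:
<source lang="console">(rapids) Singularity> conda list | grep cudf
</source>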
 
<!--T:20-->
To deactivate the RAPIDS environment and exit from the container, run
<source lang="console">(rapids) Singularity> conda deactivate
Singularity> exit 
</source>
 
<!--T:21-->
You are then back to the host shell.
 
==Working interactively on a GPU node== <!--T:22-->
 
<!--T:23-->
If a Singularity image was built based on a ''runtime'' or a ''devel'' type of Docker image, it includes a Jupyter Notebook server and can be used to explore RAPIDS interactively on a compute node with a GPU.
 
<!--T:24-->
To request an interactive session on a compute node with a single GPU, e.g. a T4 type of GPU on Graham, run
<source lang="console">[name@gra-login ~]$ salloc --ntasks=1 --cpus-per-task=2 --mem=10G --gres=gpu:t4:1 --time=1:0:0 --account=def-someuser</source>


<!--T:25-->
Once the requested resource is granted, start the RAPIDS shell on the GPU node with

<!--T:26-->
<source lang="console">[name@gra#### ~]$ module load singularity
[name@gra#### ~]$ singularity shell --nv -B /home -B /project -B /scratch rapids.sif
</source>
where
* the <tt>--nv</tt> option binds the GPU driver on the host, so the GPU device can be accessed from inside the container;
* the <tt>-B</tt> option binds any filesystem that you would like to access from inside the container.


<!--T:27-->
After the shell prompt changes to <tt>Singularity></tt>, you can check the GPU stats in the container to make sure the GPU device is accessible with
<source lang="console">Singularity> nvidia-smi</source>


<!--T:28-->
Then to initiate Conda and activate the RAPIDS environment, run
<source lang="console">Singularity> source /opt/conda/etc/profile.d/conda.sh
Singularity> conda activate rapids
</source>


<!--T:29-->
After the shell prompt changes to <tt>(rapids) Singularity></tt>, you can launch the Jupyter Notebook server in the RAPIDS environment with the following command; the URL of the Notebook server will be displayed after it starts successfully.
<source lang="console">(rapids) Singularity> jupyter-lab --ip $(hostname -f) --no-browser</source>

<!--T:30-->
As there is no direct Internet connection on a compute node on Graham, you will need to set up an SSH tunnel with port forwarding between your local computer and the GPU node before you can connect to the Notebook server in a web browser on your local computer. See the [[Jupyter#Connecting_to_Jupyter_Notebook|detailed instructions for connecting to Jupyter Notebook]].
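For example, a tunnel started from your local computer could look like the sketch below; the compute node name (here ''gra1160'') and the port (8888, the Jupyter default) must match the URL printed by your own Notebook server, and the user name is a placeholder:

<source lang="console">[name@laptop ~]$ ssh -L 8888:gra1160:8888 someuser@graham.computecanada.ca</source>

You can then open the Notebook URL in a browser on your local computer after replacing the node name in the URL with <tt>localhost</tt>.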


==Submitting a RAPIDS job to the Slurm scheduler== <!--T:31-->
Once you have your RAPIDS code ready and want to submit a job execution request to the Slurm scheduler, you need to prepare two script files: a job submission script and a job execution script.


<!--T:32-->
Here is an example of a job submission script (here ''submit.sh''):
{{File
   |name=submit.sh
   |contents=
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --gres=gpu:t4:1
#SBATCH --cpus-per-task=2
#SBATCH --mem=10G
#SBATCH --time=dd:hh:mm
#SBATCH --account=def-someuser

module load singularity

singularity run --nv -B /home -B /scratch rapids.sif /path/to/run_script.sh
}}
 
Here is an example of a job execution script (here ''run_script.sh'') that you want to run in the container to start the execution of the Python code programmed with RAPIDS:
{{File
   |name=run_script.sh
   |contents=
#!/bin/bash
source /opt/conda/etc/profile.d/conda.sh
conda activate rapids
nvidia-smi

python /path/to/my_rapids_code.py
}}
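
With both scripts in place, submit the job from a login node with

<source lang="console">[name@gra-login ~]$ sbatch submit.sh</source>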


=Helpful links= <!--T:36-->


<!--T:37-->
* [https://docs.rapids.ai/ RAPIDS Docs]: a collection of all the documentation for RAPIDS, how to stay connected and report issues;
* [https://github.com/rapidsai/notebooks RAPIDS Notebooks]: a collection of example notebooks on GitHub for getting started quickly;