RAPIDS

<!--T:7-->
* [https://ngc.nvidia.com/catalog/containers/nvidia:rapidsai:rapidsai NVIDIA GPU Cloud (NGC)]
** '''base''' images contain a RAPIDS environment ready to use. Use this type of image if you want to submit a job to the Slurm scheduler.
** '''runtime''' images extend the base image by adding a Jupyter Notebook server and example notebooks. Use this type of image if you want to work interactively with RAPIDS through notebooks and examples.
* [https://hub.docker.com/r/rapidsai/rapidsai-dev Docker Hub]
** '''devel''' images contain the full RAPIDS source tree, the compiler toolchain, the debugging tools, and the headers and static libraries for RAPIDS development. Use this type of image if you want to implement customized operations with low-level access to CUDA-based processes.


==Building a RAPIDS Singularity image== <!--T:8-->


<!--T:9-->
For example, if a Docker <tt>pull</tt> command for a selected image is given as


<!--T:10-->
<source lang="console">docker pull nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7</source>
   
   
on a computer that supports Singularity, you can build a Singularity image (here ''rapids.sif'') with the following command based on the given pull tag:
   
   
<source lang="console">[name@server ~]$ singularity build rapids.sif docker://nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7</source>
<source lang="console">[name@server ~]$ singularity build rapids.sif docker://nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7</source>


<!--T:11-->
It usually takes from thirty to sixty minutes to complete the image building process. Since the image size is relatively large, you need to have enough memory and disk space on the server to build such an image.


=Working on clusters with a Singularity image= <!--T:12-->
Once you have a Singularity image for RAPIDS ready in your account, you can work interactively by requesting an interactive session on a GPU node, or submit a batch job to the Slurm queue if your RAPIDS code is ready to run.
==Exploring the contents of RAPIDS==


<!--T:13-->
To explore the contents without doing any computations, you can use the following commands to access the container shell of the Singularity image (here ''rapids.sif'') on any node without requesting a GPU.


   
   
<!--T:14-->
Load the Singularity module first with
<source lang="console">[name@server ~]$ module load singularity</source>


   
   
<!--T:15-->
Then access the container shell with
<source lang="console">[name@server ~]$ singularity shell rapids.sif</source>


<!--T:16-->
The shell prompt is then changed to
<source lang="console">Singularity>
</source>


<!--T:17-->
Inside the Singularity shell, initiate Conda and activate the RAPIDS environment with
<source lang="console">Singularity> source /opt/conda/etc/profile.d/conda.sh
Singularity> conda activate rapids
</source>


<!--T:18-->
The shell prompt in the rapids env is then changed to
<source lang="console">(rapids) Singularity>
</source>


<!--T:19-->
Then you can list the available packages in the rapids env with
<source lang="console">(rapids) Singularity> conda list
</source>


<!--T:20-->
To deactivate the rapids env and exit from the container, run
<source lang="console">(rapids) Singularity> conda deactivate
Singularity> exit
</source>


<!--T:23-->
If a Singularity image was built based on a ''runtime'' or a ''devel'' type of Docker image, it includes a Jupyter Notebook server and can be used to explore RAPIDS interactively on a compute node with a GPU.


<!--T:24-->
To request an interactive session on a compute node with a single GPU, e.g. a T4 type of GPU on Graham, run
<source lang="console">[name@gra-login ~]$ salloc --ntasks=1 --cpus-per-task=2 --mem=10G --gres=gpu:t4:1 --time=1:0:0 --account=def-someuser</source>


<!--T:25-->
Once the requested resource is granted, start the RAPIDS shell on the GPU node with


<!--T:26-->
<source lang="console">[name@gra#### ~]$ singularity shell --nv -B /home -B /project -B /scratch rapids.sif
</source>
Here,
* the <tt>--nv</tt> option binds the GPU driver on the host to the container, so the GPU device can be accessed from inside the Singularity container;
* the <tt>-B</tt> option binds any filesystem that you would like to access from inside the container.


<!--T:27-->
After the shell prompt changes to <tt>Singularity></tt>, you can check the GPU stats in the container to make sure the GPU device is accessible with
<source lang="console">Singularity> nvidia-smi</source>


<!--T:28-->
Then to initiate Conda and activate the rapids env, run
<source lang="console">Singularity> source /opt/conda/etc/profile.d/conda.sh
Singularity> conda activate rapids
</source>


<!--T:29-->
After the shell prompt changes to <tt>(rapids) Singularity></tt>, you can launch the Jupyter Notebook server in the rapids env with the following command; the URL of the Notebook server is displayed after it starts successfully:
<source lang="console">(rapids) Singularity> jupyter-lab --ip $(hostname -f) --no-browser
[I 22:28:20.215 LabApp] JupyterLab extension loaded from /opt/conda/envs/rapids/lib/python3.7/site-packages/jupyterlab
</source>
The URL for the notebook server in the above example is
<source lang="console">http://gra1160.graham.sharcnet:8888/?token=5d4b75bf2ec3481fab1b625656a322afc96775440b7bb8c4</source>
As there is no direct Internet connection on a compute node on Graham, you would need to set up an SSH tunnel with port forwarding between your local computer and the GPU node. See [[Jupyter#Connecting_to_Jupyter_Notebook|detailed instructions for connecting to Jupyter Notebook]].
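For example, a tunnel from your local computer to the node and port shown above could be set up with a command such as the one below; here the username and the cluster login address are placeholders to replace with your own:
<source lang="console">[name@local ~]$ ssh -L 8888:gra1160.graham.sharcnet:8888 someuser@graham.computecanada.ca</source>
You can then point your local browser to <tt>http://localhost:8888/?token=...</tt>, using the token from the notebook server URL.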


==Submitting a RAPIDS job to the Slurm scheduler== <!--T:31-->
Once you have your RAPIDS code ready and would like to submit a job execution request to the Slurm scheduler, you need to prepare two script files, i.e. a job submission script and a job execution script.
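Here is an example of a job submission script, e.g. ''submit.sh''. This is only a sketch: the file name, the requested resources, the account and the path to the execution script are placeholders to adapt to your own case, following the interactive example above:
{{File
   |name=submit.sh
   |lang="sh"
   |contents=
#!/bin/bash
# Sketch only: resources, account and paths below are placeholders to adjust.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=10G
#SBATCH --gres=gpu:t4:1
#SBATCH --time=1:0:0
#SBATCH --account=def-someuser

module load singularity

singularity exec --nv -B /home -B /project -B /scratch rapids.sif bash /path/to/run_script.sh
}}
You would then submit the job with <tt>sbatch submit.sh</tt>.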


   
   
Here is an example of a job execution script, ''run_script.sh'', which you would like to run in the container to start the execution of the Python code programmed with RAPIDS (the contents below are only a sketch; replace the Python file path with your own code):
{{File
   |name=run_script.sh
   |lang="sh"
   |contents=
#!/bin/bash
# Sketch only: initiate Conda, activate the rapids env, then run your RAPIDS Python code.
source /opt/conda/etc/profile.d/conda.sh
conda activate rapids
python /path/to/your_rapids_code.py
}}


=Helpful links= <!--T:36-->


<!--T:37-->
* [https://docs.rapids.ai/ RAPIDS Docs]: a collection of all the documentation for RAPIDS, how to stay connected and report issues;
* [https://github.com/rapidsai/notebooks RAPIDS Notebooks]: a collection of example notebooks on GitHub for getting started quickly;
* [https://medium.com/rapids-ai RAPIDS on Medium]: a collection of use cases and blogs for RAPIDS applications.


</translate>