m (Update the link to the page for connecting to the notebook server) |
m (replaced "Singularity" with "Apptainer") |
||
Line 35: | Line 35: | ||
* '''RAPIDS Memory Manager (RMM)''', a central place for all device memory allocations in cuDF (C++ and Python) and other RAPIDS libraries. In addition, it is a replacement allocator for CUDA Device Memory (and CUDA Managed Memory) and a pool allocator to make CUDA device memory allocation / deallocation faster and asynchronous. | * '''RAPIDS Memory Manager (RMM)''', a central place for all device memory allocations in cuDF (C++ and Python) and other RAPIDS libraries. In addition, it is a replacement allocator for CUDA Device Memory (and CUDA Managed Memory) and a pool allocator to make CUDA device memory allocation / deallocation faster and asynchronous. | ||
= | = Apptainer images= <!--T:4--> | ||
<!--T:5--> | <!--T:5--> | ||
To build | To build an Apptainer (formerly called [https://docs.alliancecan.ca/wiki/Singularity/en#Please_use_Apptainer_instead Singularity]) image for RAPIDS, first find and select a Docker image provided by NVIDIA. | ||
==Finding a Docker image== <!--T:6--> | ==Finding a Docker image== <!--T:6--> | ||
Line 51: | Line 51: | ||
** '''devel''' images contain the full RAPIDS source tree, the compiler toolchain, the debugging tools, the headers and the static libraries for RAPIDS development. Use this type of image if you want to implement customized operations with low-level access to CUDA-based processes. | ** '''devel''' images contain the full RAPIDS source tree, the compiler toolchain, the debugging tools, the headers and the static libraries for RAPIDS development. Use this type of image if you want to implement customized operations with low-level access to CUDA-based processes. | ||
==Building | ==Building an Apptainer image== <!--T:8--> | ||
<!--T:9--> | <!--T:9--> | ||
Line 59: | Line 59: | ||
<source lang="console">docker pull nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7</source> | <source lang="console">docker pull nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7</source> | ||
on a computer that supports | on a computer that supports Apptainer, you can build an Apptainer image (here ''rapids.sif'') with the following command based on the <tt>pull</tt> tag: | ||
<source lang="console">[name@server ~]$ | <source lang="console">[name@server ~]$ apptainer build rapids.sif docker://nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7</source> | ||
<!--T:11--> | <!--T:11--> | ||
It usually takes from thirty to sixty minutes to complete the image-building process. Since the image size is relatively large, you need to have enough memory and disk space on the server to build such an image. | It usually takes from thirty to sixty minutes to complete the image-building process. Since the image size is relatively large, you need to have enough memory and disk space on the server to build such an image. | ||
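Because the build needs a lot of disk space, it can help to point Apptainer's cache and temporary directories at a large filesystem before building. The following is a minimal sketch, not a tested recipe: the directory names are hypothetical, and it assumes <tt>$SCRATCH</tt> points at such a filesystem (it falls back to <tt>/tmp</tt> otherwise). It only prints the build command rather than running it.

```shell
#!/bin/bash
# Sketch only: directory names are hypothetical. $SCRATCH is assumed to point
# at a large filesystem; /tmp is used as a fallback when it is unset.
work_root="${SCRATCH:-/tmp}"
export APPTAINER_CACHEDIR="$work_root/apptainer_cache"   # downloaded Docker layers
export APPTAINER_TMPDIR="$work_root/apptainer_tmp"       # temporary build files
mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"

# Same image tag as in the build command above.
image_uri="docker://nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-centos7"
build_cmd="apptainer build rapids.sif $image_uri"

# Print the command instead of running it, so the sketch is safe to try anywhere.
echo "$build_cmd"

# Show how much free space the chosen filesystem has.
df -h "$work_root"
```

If the reported free space looks sufficient, run the printed command in the same shell so that the exported variables take effect.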
=Working on clusters with | =Working on clusters with an Apptainer image= <!--T:12--> | ||
Once you have | Once you have an Apptainer image for RAPIDS ready in your account, you can request an interactive session on a GPU node or submit a batch job to Slurm if you have your RAPIDS code ready. | ||
==Working interactively on a GPU node== <!--T:13--> | ==Working interactively on a GPU node== <!--T:13--> | ||
<!--T:14--> | <!--T:14--> | ||
If | If an Apptainer image was built based on a runtime or a devel type of Docker image, it includes a Jupyter Notebook server and can be used to explore RAPIDS interactively on a compute node with a GPU.<br> | ||
To request an interactive session on a compute node with a single GPU, e.g. a T4 type of GPU on Graham, run | To request an interactive session on a compute node with a single GPU, e.g. a T4 type of GPU on Graham, run | ||
<source lang="console">[name@gra-login ~]$ salloc --ntasks=1 --cpus-per-task=2 --mem=10G --gres=gpu:t4:1 --time=1:0:0 --account=def-someuser</source> | <source lang="console">[name@gra-login ~]$ salloc --ntasks=1 --cpus-per-task=2 --mem=10G --gres=gpu:t4:1 --time=1:0:0 --account=def-someuser</source> | ||
Line 80: | Line 80: | ||
<!--T:16--> | <!--T:16--> | ||
<source lang="console">[name@gra#### ~]$ module load | <source lang="console">[name@gra#### ~]$ module load apptainer | ||
[name@gra#### ~]$ | [name@gra#### ~]$ apptainer shell --nv -B /home -B /project -B /scratch rapids.sif | ||
</source> | </source> | ||
* the <tt>--nv</tt> option binds the GPU driver on the host to the container, so the GPU device can be accessed from inside the | * the <tt>--nv</tt> option binds the GPU driver on the host to the container, so the GPU device can be accessed from inside the Apptainer container; | ||
* the <tt>-B</tt> option binds any filesystem that you would like to access from inside the container. | * the <tt>-B</tt> option binds any filesystem that you would like to access from inside the container. | ||
<!--T:17--> | <!--T:17--> | ||
After the shell prompt changes to <tt> | After the shell prompt changes to <tt>Apptainer></tt>, you can check the GPU stats in the container to make sure the GPU device is accessible with | ||
<source lang="console"> | <source lang="console">Apptainer> nvidia-smi</source> | ||
<!--T:18--> | <!--T:18--> | ||
Then, to initialize Conda and activate the RAPIDS environment, run | Then, to initialize Conda and activate the RAPIDS environment, run | ||
<source lang="console"> | <source lang="console">Apptainer> source /opt/conda/etc/profile.d/conda.sh | ||
Apptainer> conda activate rapids | |||
</source> | </source> | ||
<!--T:19--> | <!--T:19--> | ||
After the shell prompt changes to <tt>(rapids) | After the shell prompt changes to <tt>(rapids) Apptainer></tt>, you can launch the Jupyter Notebook server in the RAPIDS environment with the following command, and the URL of the Notebook server will be displayed after it starts successfully. | ||
<source lang="console">(rapids) | <source lang="console">(rapids) Apptainer> jupyter-lab --ip $(hostname -f) --no-browser | ||
</source> | </source> | ||
Line 121: | Line 121: | ||
#SBATCH --time=dd-hh:mm | #SBATCH --time=dd-hh:mm | ||
#SBATCH --account=def-someuser | #SBATCH --account=def-someuser | ||
module load | module load apptainer | ||
apptainer exec --nv -B /home -B /scratch rapids.sif /path/to/run_script.sh | |||
}} | }} | ||
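The job script above calls <tt>/path/to/run_script.sh</tt> inside the container. As a rough sketch of what such a script might contain, the snippet below writes one to a temporary location, reusing the Conda activation steps from the interactive session. The file name <tt>my_rapids_code.py</tt> and the output location are hypothetical placeholders, not part of the original instructions.

```shell
#!/bin/bash
# Sketch only: my_rapids_code.py and the output location are hypothetical.
script_path="${TMPDIR:-/tmp}/run_script.sh"

# Write the script; the quoted 'EOF' keeps the body literal (no expansion here).
cat > "$script_path" <<'EOF'
#!/bin/bash
# Runs inside the container: set up Conda, then start the RAPIDS program.
source /opt/conda/etc/profile.d/conda.sh
conda activate rapids
nvidia-smi                          # confirm the GPU is visible to the job
python /path/to/my_rapids_code.py   # hypothetical path to your own code
EOF

# The script must be executable to be called directly by apptainer exec.
chmod +x "$script_path"
echo "wrote $script_path"
```

Move the generated file to the path used in the job script, which must be on a filesystem bound into the container with <tt>-B</tt>.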