Advanced Jupyter configuration: Difference between revisions

From Alliance Doc
Jump to navigation Jump to search
(Needed more jupyter commands for Lmod)
(Marked this version for translation)
 
(43 intermediate revisions by 4 users not shown)
Line 1: Line 1:
{{draft}}
<languages />
<translate>


= Introduction =
= Introduction = <!--T:1-->


* '''Project Jupyter''': "a non-profit, open-source project, born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages."<ref>https://jupyter.org/about.html</ref>
<!--T:90-->
* '''JupyterLab''': "a web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design allows for extensions that expand and enrich functionality."<ref>https://jupyter.org/</ref>
{{Warning
|title=Running notebooks
|content=Jupyter Lab and notebooks are meant for '''short''' interactive tasks such as testing, debugging or quickly visualize data (few minutes). Running longer analysis must be done in an [[Running_jobs#Use_sbatch_to_submit_jobs|non-interactive job (sbatch)]].
See also [[Advanced_Jupyter_configuration#Running_notebooks_as_Python_scripts|how to run notebooks as python scripts below]].
}}
 
<!--T:2-->
* <b>Project Jupyter</b>: "a non-profit, open-source project, born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages."<ref>https://jupyter.org/about.html</ref>
* <b>JupyterLab</b>: "a web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design allows for extensions that expand and enrich functionality."<ref>https://jupyter.org/</ref>


A JupyterLab server should only run on a compute node or on a cloud instance; cluster login nodes are not a good choice because they impose various limits which can stop applications if they consume too much CPU time or memory. In the case of using a compute node, users can reserve compute resources by [[Running_jobs|submitting a job]] that requests a specific number of CPUs (and optionally GPUs), an amount of memory and the run time. '''In this page, we give detailed instructions on how to configure and submit a JupyterLab job on any national cluster.'''
<!--T:3-->
A JupyterLab server should only run on a compute node or on a cloud instance; cluster login nodes are not a good choice because they impose various limits which can stop applications if they consume too much CPU time or memory. In the case of using a compute node, users can reserve compute resources by [[Running_jobs|submitting a job]] that requests a specific number of CPUs (and optionally GPUs), an amount of memory and the run time. <b>In this page, we give detailed instructions on how to configure and submit a JupyterLab job on any national cluster.</b>


But, what about ...
<!--T:4-->
* ''Jupyter Notebook''? JupyterLab is a more modern and flexible interface than the ''classic'' '''[[JupyterNotebook|Jupyter Notebook]]'''
If you are instead looking for a preconfigured Jupyter environment, please check the [[Jupyter]] page.
* A preconfigured JupyterLab service? '''[[JupyterHub#Compute_Canada_initiatives|Some regional partners]]''' provide a web portal named ''JupyterHub''. For instance, many of these portals offer a '''[[JupyterHub#JupyterLab|preconfigured JupyterLab]]''' service so that users do not have to create their own setup. To learn more, visit the '''[[JupyterHub]]''' wiki page


= Installing JupyterLab =
= Installing JupyterLab = <!--T:5-->


<!--T:6-->
These instructions install JupyterLab with the <code>pip</code> command in a
These instructions install JupyterLab with the <code>pip</code> command in a
[[Python#Creating_and_using_a_virtual_environment|Python virtual environment]]:
[[Python#Creating_and_using_a_virtual_environment|Python virtual environment]]:


<!--T:7-->
<ol>
<li>If you do not have an existing Python virtual environment, create one. Then, activate it:
<ol>
<ol>
<li>Load a Python module, either the default one (as shown below) or
<li>Load a Python module, either the default one (as shown below) or
a specific version (see available versions with <code>module avail python</code>):
a specific version (see available versions with <code>module avail python</code>):
{{Command2
</translate>{{Command2
|module load python
|module load python
}}
}}
</li>
<translate>
<!--T:96-->
<b>If you intend to use RStudio Server</b>, make sure to load <code>rstudio-server</code> first:
</translate>{{Command2
|module load rstudio-server python
}}
</li><translate>
<!--T:8-->
<li>Create a new Python virtual environment:
<li>Create a new Python virtual environment:
{{Command2
</translate>{{Command2
|virtualenv --no-download $HOME/jupyter_py3
|virtualenv --no-download $HOME/jupyter_py3
}}
}}
</li>
</li><translate>
<!--T:9-->
<li>Activate your newly created Python virtual environment:
<li>Activate your newly created Python virtual environment:
{{Command2
</translate>{{Command2
|source $HOME/jupyter_py3/bin/activate
|source $HOME/jupyter_py3/bin/activate
}}
}}
</li>
</li>
</ol>
</li><translate>
<!--T:10-->
<li>Install JupyterLab in your new virtual environment (note: it takes a few minutes):
<li>Install JupyterLab in your new virtual environment (note: it takes a few minutes):
{{Command2
</translate>{{Commands2
|prompt=(jupyter_py3) [name@server ~]$
|prompt=(jupyter_py3) [name@server ~]$
|pip install --no-index --upgrade pip
|pip install --no-index jupyterlab
|pip install --no-index jupyterlab
}}
}}
</li>
</li><translate>
<!--T:11-->
<li>In the virtual environment, create a wrapper script that launches JupyterLab:
<li>In the virtual environment, create a wrapper script that launches JupyterLab:
{{Command2
</translate>{{Command2
|prompt=(jupyter_py3) [name@server ~]$
|prompt=(jupyter_py3) [name@server ~]$
|echo -e '#!/bin/bash\nunset XDG_RUNTIME_DIR\njupyter-lab --ip $(hostname -f) --no-browser' > $VIRTUAL_ENV/bin/jupyterlab.sh
|echo -e '#!/bin/bash\nunset XDG_RUNTIME_DIR\njupyter lab --ip $(hostname -f) --no-browser' > $VIRTUAL_ENV/bin/jupyterlab.sh
}}
}}
</li>
</li><translate>
<!--T:12-->
<li>Finally, make the script executable:
<li>Finally, make the script executable:
{{Command2
</translate>{{Command2
|prompt=(jupyter_py3) [name@server ~]$
|prompt=(jupyter_py3) [name@server ~]$
|chmod u+x $VIRTUAL_ENV/bin/jupyterlab.sh
|chmod u+x $VIRTUAL_ENV/bin/jupyterlab.sh
}}
}}
</li>
</li>
</ol>
</ol><translate>


= Installing extensions =
= Installing extensions = <!--T:13-->


<!--T:14-->
Extensions allow you to add functionalities and modify the JupyterLab’s user interface.  
Extensions allow you to add functionalities and modify the JupyterLab’s user interface.  


=== Jupyter Lmod ===
=== Jupyter Lmod === <!--T:15-->


<!--T:16-->
[https://github.com/cmd-ntrf/jupyter-lmod Jupyter Lmod] is an extension that allows you to interact with environment modules before launching kernels. The extension uses the Lmod's Python interface to accomplish module-related tasks like loading, unloading, saving a collection, etc.
[https://github.com/cmd-ntrf/jupyter-lmod Jupyter Lmod] is an extension that allows you to interact with environment modules before launching kernels. The extension uses the Lmod's Python interface to accomplish module-related tasks like loading, unloading, saving a collection, etc.


Run the following commands to install and enable the Jupyter Lmod extension:
<!--T:17-->
The following commands will install and enable the Jupyter Lmod extension in your environment (note: the third command takes a few minutes to complete):


{{Commands2
</translate>{{Commands2
|prompt=(jupyter_py3) [name@server ~]$
|prompt=(jupyter_py3) [name@server ~]$
|module load nodejs
|module load nodejs
|pip install --no-index jupyterlmod
|pip install jupyterlmod
|jupyter labextension install jupyterlab-lmod
|jupyter labextension install jupyterlab-lmod
|jupyter labextension enable jupyterlab-lmod
}}<translate>
 
<!--T:18-->
Instructions on how to manage loaded <i>software</i> modules in the JupyterLab interface are provided in the [[JupyterHub#JupyterLab|JupyterHub page]].
 
=== RStudio Server === <!--T:85-->
 
<!--T:86-->
The RStudio Server allows you to develop R codes in an RStudio environment that appears in your web browser in a separate tab. Based on the above [[#Installing_JupyterLab|Installing JupyterLab]] procedure, there are a few differences:
 
<!--T:87-->
<ol>
<li>Load the <code>rstudio-server</code> module <b>before</b> the <code>python</code> module <b>and before creating a new virtual environment</b>:
</translate>{{Command2
|module load rstudio-server python
}}
</li><translate>
<!--T:88-->
<li>Once [[#Installing_JupyterLab|Jupyter Lab is installed in the new virtual environment]], install the Jupyter RSession proxy:
</translate>{{Commands2
|prompt=(jupyter_py3) [name@server ~]$
|pip install --no-index jupyter-rsession-proxy
}}
}}
</li>
</ol><translate>


= Using your installation =
<!--T:89-->
All other configuration and usage steps are the same. In JupyterLab, you should see an RStudio application in the <i>Launcher</i> tab.


== Activating the environment ==
= Using your installation = <!--T:19-->


== Activating the environment == <!--T:20-->
<!--T:21-->
Make sure the Python virtual environment in which you have installed JupyterLab is activated.
Make sure the Python virtual environment in which you have installed JupyterLab is activated.
For example, when you log into the cluster, you have to activate it again with:
For example, when you log onto the cluster, you have to activate it again with:
{{Command2
</translate>{{Command2
|source $HOME/jupyter_py3/bin/activate
|source $HOME/jupyter_py3/bin/activate
}}
}}<translate>


<!--T:22-->
To verify that your environment is ready, you can get a list of installed <code>jupyter*</code> packages with the following command:
To verify that your environment is ready, you can get a list of installed <code>jupyter*</code> packages with the following command:
{{Command2
</translate>{{Command2
|prompt=(jupyter_py3) [name@server ~]$
|prompt=(jupyter_py3) [name@server ~]$
|pip freeze {{!}} grep jupyter
|pip freeze {{!}} grep jupyter
Line 93: Line 151:
jupyterlab-pygments==0.1.2+computecanada
jupyterlab-pygments==0.1.2+computecanada
jupyterlab-server==2.3.0+computecanada
jupyterlab-server==2.3.0+computecanada
}}
}}<translate>


== Starting JupyterLab ==
== Starting JupyterLab == <!--T:23-->


<!--T:24-->
To start a JupyterLab server, submit an interactive job with <code>salloc</code>. Adjust the parameters based on your needs. See [[Running jobs]] for more information.
To start a JupyterLab server, submit an interactive job with <code>salloc</code>. Adjust the parameters based on your needs. See [[Running jobs]] for more information.


{{Command2
</translate>{{Command2
|prompt=(jupyter_py3) [name@server ~]$
|prompt=(jupyter_py3) [name@server ~]$
|salloc --time{{=}}1:0:0 --ntasks{{=}}1 --cpus-per-task{{=}}2 --mem-per-cpu{{=}}1024M --account{{=}}def-yourpi srun $VIRTUAL_ENV/bin/jupyterlab.sh
|salloc --time{{=}}1:0:0 --ntasks{{=}}1 --cpus-per-task{{=}}2 --mem-per-cpu{{=}}1024M --account{{=}}def-yourpi srun $VIRTUAL_ENV/bin/jupyterlab.sh
Line 114: Line 173:
         http://node_name.int.cluster.computecanada.ca:8888/lab?token=101c3688298e78ab554ef86d93a196deaf5bcd2728fad4eb
         http://node_name.int.cluster.computecanada.ca:8888/lab?token=101c3688298e78ab554ef86d93a196deaf5bcd2728fad4eb
     or http://127.0.0.1:8888/lab?token=101c3688298e78ab554ef86d93a196deaf5bcd2728fad4eb
     or http://127.0.0.1:8888/lab?token=101c3688298e78ab554ef86d93a196deaf5bcd2728fad4eb
}}
}}<translate>


== Connecting to JupyterLab ==
== Connecting to JupyterLab == <!--T:25-->


To access JupyterLab running on a compute node from your web browser, you will need to create an [[SSH tunnelling|SSH tunnel]] from your computer through the cluster since the compute nodes are not directly accessible from the Internet.
<!--T:26-->
To access JupyterLab running on a compute node from your web browser, you will need to create an [[SSH tunnelling|SSH tunnel]] from your computer through the cluster since the compute nodes are not directly accessible from the internet.


=== From Linux or macOS ===
=== From Linux or macOS === <!--T:27-->


<!--T:28-->
On a Linux or macOS system, we recommend using the [https://sshuttle.readthedocs.io sshuttle] Python package.
On a Linux or macOS system, we recommend using the [https://sshuttle.readthedocs.io sshuttle] Python package.


On your computer, open a new terminal window and create the SSH tunnel with the following <code>sshuttle</code> command where <code><username></code> must be substituted by your Compute Canada username, and <code><cluster></code> by the cluster on which you have launched JupyterLab:
<!--T:29-->
On your computer, open a new terminal window and create the SSH tunnel with the following <code>sshuttle</code> command where <code><username></code> must be replaced with your Alliance account username, and <code><cluster></code> by the cluster on which you have launched JupyterLab:


{{Command2
</translate>{{Command2
|prompt=[name@local ~]$
|prompt=[name@local ~]$
|sshuttle --dns -Nr <username>@<cluster>.computecanada.ca
|sshuttle --dns -Nr <username>@<cluster>.computecanada.ca
}}
}}<translate>


Then, copy and paste the first provided HTTP address into your Web browser. In the above <code>salloc</code> example, this would be:
<!--T:30-->
Then, copy and paste the first provided HTTP address into your web browser. In the above <code>salloc</code> example, this would be:


<pre>
</translate><pre>
http://node_name.int.cluster.computecanada.ca:8888/lab?token=101c3688298e78ab554ef86d93a196deaf5bcd2728fad4eb
http://node_name.int.cluster.computecanada.ca:8888/lab?token=101c3688298e78ab554ef86d93a196deaf5bcd2728fad4eb
</pre>
</pre><translate>


=== From Windows ===
=== From Windows === <!--T:31-->


An [[SSH tunnelling|SSH tunnel]] can be created from Windows using [[Connecting_with_MobaXTerm | MobaXTerm]] as follows. Note: this procedure also works from a terminal in any Unix system (like Linux, macOS, etc).
<!--T:32-->
An [[SSH tunnelling|SSH tunnel]] can be created from Windows using [[Connecting_with_MobaXTerm | MobaXTerm]] as follows. Note: this procedure also works from any terminal that supports the <code>ssh</code> command.


<!--T:33-->
<ol>
<ol>
<li>Once JupyterLab is launched on a compute node (see [[#Starting_JupyterLab|Starting JupyterLab]]), you can extract the <code>hostname:port</code> and the <code>token</code> from the first provided HTTP address. For example:
<li>Once JupyterLab is launched on a compute node (see [[#Starting_JupyterLab|Starting JupyterLab]]), you can extract the <code>hostname:port</code> and the <code>token</code> from the first provided HTTP address. For example:
<pre>
</translate><pre>
http://node_name.int.cluster.computecanada.ca:8888/lab?token=101c368829...2728fad4eb
http://node_name.int.cluster.computecanada.ca:8888/lab?token=101c368829...2728fad4eb
       └────────────────────┬────────────────────┘          └──────────┬──────────┘
       └────────────────────┬────────────────────┘          └──────────┬──────────┘
                       hostname:port                                  token
                       hostname:port                                  token
</pre>
</pre>
</li>
</li><translate>
<li>Open a new Terminal tab in MobaXTerm. In the following command, substitute <code><hostname:port></code> by its corresponding value (refer to the above figure), substitute <code><username></code> by your Compute Canada username, and substitute <code><cluster></code> by the cluster on which you have launched JupyterLab:
<!--T:34-->
{{Command2
<li>Open a new Terminal tab in MobaXTerm. In the following command, replace <code><hostname:port></code> with its corresponding value (refer to the above figure), replace <code><username></code> with your Alliance account username, and replace <code><cluster></code> with the cluster on which you have launched JupyterLab:
</translate>{{Command2
|prompt=[name@local ~]$
|prompt=[name@local ~]$
|ssh -L 8888:<hostname:port> <username>@<cluster>.computecanada.ca
|ssh -L 8888:<hostname:port> <username>@<cluster>.computecanada.ca
}}
}}
</li>
</li><translate>
<li> Open your Web browser and go to the following address where <code><token></code> must be substituted by the alphanumerical value extracted from the above figure:
<!--T:35-->
<pre>
<li>Open your web browser and go to the following address where <code><token></code> must be replaced with the alphanumerical value extracted from the above figure:
</translate><pre>
http://localhost:8888/?token=<token>
http://localhost:8888/?token=<token>
</pre>
</pre>
</li>
</li>
</ol><translate>
== Shutting down JupyterLab == <!--T:36-->
<!--T:37-->
You can shut down the JupyterLab server before the walltime limit by pressing <b>Ctrl-C twice</b> in the terminal that launched the interactive job.
<!--T:38-->
If you have used MobaXterm to create an SSH tunnel, press <b>Ctrl-D</b> to shut down the tunnel.
= Adding kernels = <!--T:39-->
<!--T:40-->
It is possible to add kernels for other programming languages, for a different Python version or for a persistent virtual environment that has all required packages and libraries for your project. Refer to [http://jupyter-client.readthedocs.io/en/latest/kernels.html Making kernels for Jupyter] to learn more.
<!--T:41-->
The installation of a new kernel is done in two steps:
# Installation of the packages that will allow the language interpreter to communicate with the Jupyter interface.
# Creation of a file that will indicate to JupyterLab how to initiate a communication channel with the language interpreter. This file is called a <i>kernel spec file</i>, and it will be saved in a subfolder of <code>~/.local/share/jupyter/kernels</code>.
<!--T:42-->
In the following sections, we provide a few examples of the kernel installation procedure.
== Julia Kernel == <!--T:43-->
<!--T:44-->
Prerequisites:
# The configuration of a Julia kernel depends on a Python virtual environment and a <code>kernels</code> folder. If you do not have these dependencies, make sure to follow the first few instructions listed in <b>[[#Python_kernel|Python kernel]]</b> (note: no Python kernel required).
# Since the installation of Julia packages requires an access to the internet, the configuration of a Julia kernel must be done in a <b>[[SSH|remote shell session on a login node]]</b>.
<!--T:45-->
Once you have a Python virtual environment available and activated, you may configure the Julia kernel:
<!--T:46-->
<ol>
<li>Load the <b>[[Julia]]</b> module:
</translate>{{Command2
|prompt=(jupyter_py3) [name@server ~]$
|module load julia
}}
</li><translate>
<!--T:47-->
<li>Install IJulia:
</translate>{{Command2
|prompt=(jupyter_py3) [name@server ~]$
|echo -e 'using Pkg\nPkg.add("IJulia")' {{!}} julia
}}
</li><translate>
<!--T:48-->
<li><b>Important</b>: start or restart a new JupyterLab session before using the Julia kernel.</li>
</ol>
</ol>


== Shutting down JupyterLab ==
<!--T:49-->
For more information, see the [https://github.com/JuliaLang/IJulia.jl IJulia documentation].


You can shut down the JupyterLab server before the walltime limit by pressing '''Ctrl-C twice''' in the terminal that launched the interactive job.
=== Installing more Julia packages === <!--T:50-->


If you have used MobaXterm to create an SSH tunnel, press '''Ctrl-D''' to shut down the tunnel.
<!--T:51-->
As in the above installation procedure, it is required to install Julia packages from a login node, but the Python virtual environment could be deactivated:


= Adding kernels =
<!--T:52-->
<ol>
<li>Make sure the same Julia module is loaded:
</translate>{{Command2
|module load julia
}}
</li><translate>
<!--T:53-->
<li>Install any required package. For example with <code>Glob</code>:
</translate>{{Command2
|echo -e 'using Pkg\nPkg.add("Glob")' {{!}} julia
}}
</li><translate>
<!--T:54-->
<li>The newly installed Julia packages should already be usable in a notebook executed by the Julia kernel.</li>
</ol>


= References =
== Python kernel == <!--T:55-->
 
<!--T:56-->
In a terminal with an active session on the remote server,
you may configure a [[Python#Creating_and_using_a_virtual_environment|Python virtual environment]] with all the required [[Available_Python_wheels|Python modules]]
and a custom Python kernel for JupyterLab.
Here are the initial steps for the simplest Jupyter configuration
in a new Python virtual environment:
 
<!--T:57-->
<ol>
<li>If you do not have a Python virtual environment, create one. Then, activate it:</li>
<ol>
<li>Start from a clean Bash environment (this is only required if you are using the Jupyter <i>Terminal</i> via [[JupyterHub]] for the creation and configuration of the Python kernel):
</translate>{{Command2
|env -i HOME{{=}}$HOME bash -l
}}
</li><translate>
<!--T:58-->
<li>Load a Python module:
</translate>{{Command2
|module load python
}}
</li><translate>
<!--T:59-->
<li>Create a new Python virtual environment:
</translate>{{Command2
|virtualenv --no-download $HOME/jupyter_py3
}}
</li><translate>
<!--T:60-->
<li>Activate your newly created Python virtual environment:
</translate>{{Command2
|source $HOME/jupyter_py3/bin/activate
}}
</ol>
</li><translate>
<!--T:62-->
<li>Create the common <code>kernels</code> folder, which is used by all kernels you want to install:
</translate>{{Command2
|prompt=(jupyter_py3) [name@server ~]$
|mkdir -p ~/.local/share/jupyter/kernels
}}
</li><translate>
<!--T:61-->
<li>Finally, install the Python kernel:
<ol>
<li>Install the <code>ipykernel</code> library:
</translate>{{Command2
|prompt=(jupyter_py3) [name@server ~]$
|pip install --no-index ipykernel
}}
</li><translate>
<!--T:63-->
<li>Generate the kernel spec file. Replace <code><unique_name></code> with a name that will uniquely identify your kernel:
</translate>{{Command2
|prompt=(jupyter_py3) [name@server ~]$
|python -m ipykernel install --user --name <unique_name> --display-name "Python 3.x Kernel"
}}
</li>
</ol>
</li><translate>
<!--T:64-->
<li><b>Important</b>: start or restart a new JupyterLab session before using the Python kernel.</li>
</ol>
 
<!--T:65-->
For more information, see the [http://ipython.readthedocs.io/en/stable/install/kernel_install.html ipykernel documentation].
 
=== Installing more Python libraries === <!--T:66-->
 
<!--T:67-->
Based on the Python virtual environment configured in the previous section:
 
<!--T:68-->
<ol>
<li>If you are using the Jupyter <i>Terminal</i> via [[JupyterHub]], make sure the activated Python virtual environment is running in a clean Bash environment. See the above section for details.</li>
<li>Install any required library. For example, <code>numpy</code>:
</translate>{{Command2
|prompt=(jupyter_py3) [name@server ~]$
|pip install --no-index numpy
}}
</li><translate>
<!--T:69-->
<li>The newly installed Python libraries can now be imported in any notebook using the <code>Python 3.x Kernel</code>.</li>
</ol>
 
== R Kernel == <!--T:70-->
 
<!--T:71-->
Prerequisites:
# The configuration of an R kernel depends on a Python virtual environment and a <code>kernels</code> folder. If you do not have these dependencies, make sure to follow the first few instructions listed in <b>[[#Python_kernel|Python kernel]]</b> (note: no Python kernel required).
# Since the installation of R packages requires an access to <b>[https://cran.r-project.org/ CRAN]</b>, the configuration of an R kernel must be done in a <b>[[SSH|remote shell session on a login node]]</b>.
 
<!--T:72-->
Once you have a Python virtual environment available and activated, you may configure the R kernel:
 
<!--T:73-->
<ol>
<li>Load an R module:
</translate>{{Command2
|prompt=(jupyter_py3) [name@server ~]$
|module load r/4.1
}}
</li><translate>
<!--T:74-->
<li>Install the R kernel dependencies (<code>crayon</code>, <code>pbdZMQ</code>, <code>devtools</code>) - this will take up to 10 minutes, and packages should be installed in a local directory like <code>~/R/x86_64-pc-linux-gnu-library/4.1</code>:
</translate>{{Command2
|prompt=(jupyter_py3) [name@server ~]$
|R --no-save
|result=> install.packages(c('crayon', 'pbdZMQ', 'devtools'), repos{{=}}'http://cran.us.r-project.org')
}}
</li><translate>
<!--T:75-->
<li>Install the R kernel.
</translate>{{Command2
|prompt=>
|devtools::install_github(paste0('IRkernel/', c('repr', 'IRdisplay', 'IRkernel')))
}}
</li><translate>
<!--T:76-->
<li>Install the R kernel spec file.
</translate>{{Command2
|prompt=>
|IRkernel::installspec()
}}
</li><translate>
<!--T:77-->
<li><b>Important</b>: Start or restart a new JupyterLab session before using the R kernel.</li>
</ol>
 
<!--T:78-->
For more information, see the [https://irkernel.github.io/docs/ IRkernel documentation].
 
=== Installing more R packages === <!--T:79-->
 
<!--T:80-->
The installation of R packages cannot be done from notebooks because there is no access to CRAN.
As in the above installation procedure, it is required to install R packages from a login node, but the Python virtual environment could be deactivated:
 
<!--T:81-->
<ol>
<li>Make sure the same R module is loaded:
</translate>{{Command2
|module load r/4.1
}}
</li><translate>
<!--T:82-->
<li>Start the R shell and install any required package. For example with <code>doParallel</code>:
</translate>{{Command2
|R --no-save
|result=> install.packages('doParallel', repos{{=}}'http://cran.us.r-project.org')
}}
</li><translate>
<!--T:83-->
<li>The newly installed R packages should already be usable in a notebook executed by the R kernel.</li>
</ol>
 
= Running notebooks as Python scripts = <!--T:91-->
For longer run or analysis, we need to submit a [[Running_jobs#Use_sbatch_to_submit_jobs|non-interactive job]]. We then need to convert our notebook to a Python script, create a submission script and submit it.
 
<!--T:92-->
1. From the login node, create and activate a [[Python#Creating_and_using_a_virtual_environment|virtual environment]], then install <tt>nbconvert</tt> if not already available.
{{Command
|prompt=(venv) [name@server ~]$
|pip install --no-index nbconvert
}}
 
<!--T:93-->
2. Convert the notebook (or all notebooks) to Python scripts.
{{Command
|prompt=(venv) [name@server ~]$
|jupyter nbconvert --to python mynotebook.ipynb
}}
 
<!--T:94-->
3. Create your submission script, and submit your job.
 
<!--T:95-->
In your submission script, run your converted notebook with:
<syntaxhighlight lang="bash">python mynotebook.py</syntaxhighlight>
 
and submit your non-interactive job:
{{Command
|sbatch my-submit.sh
}}
= References = <!--T:84-->
</translate>

Latest revision as of 19:01, 4 July 2024

Other languages:

Introduction[edit]

Running notebooks

Jupyter Lab and notebooks are meant for short interactive tasks such as testing, debugging or quickly visualize data (few minutes). Running longer analysis must be done in an non-interactive job (sbatch). See also how to run notebooks as python scripts below.



  • Project Jupyter: "a non-profit, open-source project, born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages."[1]
  • JupyterLab: "a web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design allows for extensions that expand and enrich functionality."[2]

A JupyterLab server should only run on a compute node or on a cloud instance; cluster login nodes are not a good choice because they impose various limits which can stop applications if they consume too much CPU time or memory. In the case of using a compute node, users can reserve compute resources by submitting a job that requests a specific number of CPUs (and optionally GPUs), an amount of memory and the run time. In this page, we give detailed instructions on how to configure and submit a JupyterLab job on any national cluster.

If you are instead looking for a preconfigured Jupyter environment, please check the Jupyter page.

Installing JupyterLab[edit]

These instructions install JupyterLab with the pip command in a Python virtual environment:

  1. If you do not have an existing Python virtual environment, create one. Then, activate it:
    1. Load a Python module, either the default one (as shown below) or a specific version (see available versions with module avail python):
      [name@server ~]$ module load python
      
      If you intend to use RStudio Server, make sure to load rstudio-server first:
      [name@server ~]$ module load rstudio-server python
      
    2. Create a new Python virtual environment:
      [name@server ~]$ virtualenv --no-download $HOME/jupyter_py3
      
    3. Activate your newly created Python virtual environment:
      [name@server ~]$ source $HOME/jupyter_py3/bin/activate
      
  2. Install JupyterLab in your new virtual environment (note: it takes a few minutes):
    (jupyter_py3) [name@server ~]$ pip install --no-index --upgrade pip
    (jupyter_py3) [name@server ~]$ pip install --no-index jupyterlab
    
  3. In the virtual environment, create a wrapper script that launches JupyterLab:
    (jupyter_py3) [name@server ~]$ echo -e '#!/bin/bash\nunset XDG_RUNTIME_DIR\njupyter lab --ip $(hostname -f) --no-browser' > $VIRTUAL_ENV/bin/jupyterlab.sh
    
  4. Finally, make the script executable:
    (jupyter_py3) [name@server ~]$ chmod u+x $VIRTUAL_ENV/bin/jupyterlab.sh
    

Installing extensions[edit]

Extensions allow you to add functionalities and modify the JupyterLab’s user interface.

Jupyter Lmod[edit]

Jupyter Lmod is an extension that allows you to interact with environment modules before launching kernels. The extension uses the Lmod's Python interface to accomplish module-related tasks like loading, unloading, saving a collection, etc.

The following commands will install and enable the Jupyter Lmod extension in your environment (note: the third command takes a few minutes to complete):

(jupyter_py3) [name@server ~]$ module load nodejs
(jupyter_py3) [name@server ~]$ pip install jupyterlmod
(jupyter_py3) [name@server ~]$ jupyter labextension install jupyterlab-lmod

Instructions on how to manage loaded software modules in the JupyterLab interface are provided in the JupyterHub page.

RStudio Server[edit]

The RStudio Server allows you to develop R codes in an RStudio environment that appears in your web browser in a separate tab. Based on the above Installing JupyterLab procedure, there are a few differences:

  1. Load the rstudio-server module before the python module and before creating a new virtual environment:
    [name@server ~]$ module load rstudio-server python
    
  2. Once Jupyter Lab is installed in the new virtual environment, install the Jupyter RSession proxy:
    (jupyter_py3) [name@server ~]$ pip install --no-index jupyter-rsession-proxy
    

All other configuration and usage steps are the same. In JupyterLab, you should see an RStudio application in the Launcher tab.

Using your installation[edit]

Activating the environment[edit]

Make sure the Python virtual environment in which you have installed JupyterLab is activated.

For example, when you log onto the cluster, you have to activate it again with:

[name@server ~]$ source $HOME/jupyter_py3/bin/activate

To verify that your environment is ready, you can get a list of installed jupyter* packages with the following command:

(jupyter_py3) [name@server ~]$ pip freeze | grep jupyter
jupyter-client==7.1.0+computecanada
jupyter-core==4.9.1+computecanada
jupyter-server==1.9.0+computecanada
jupyterlab==3.1.7+computecanada
jupyterlab-pygments==0.1.2+computecanada
jupyterlab-server==2.3.0+computecanada

Starting JupyterLab[edit]

To start a JupyterLab server, submit an interactive job with salloc. Adjust the parameters based on your needs. See Running jobs for more information.

(jupyter_py3) [name@server ~]$ salloc --time=1:0:0 --ntasks=1 --cpus-per-task=2 --mem-per-cpu=1024M --account=def-yourpi srun $VIRTUAL_ENV/bin/jupyterlab.sh
...
[I 2021-12-06 10:37:14.262 ServerApp] jupyterlab | extension was successfully linked.
...
[I 2021-12-06 10:37:39.259 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2021-12-06 10:37:39.356 ServerApp]

    To access the server, open this file in a browser:
        file:///home/name/.local/share/jupyter/runtime/jpserver-198146-open.html
    Or copy and paste one of these URLs:
        http://node_name.int.cluster.computecanada.ca:8888/lab?token=101c3688298e78ab554ef86d93a196deaf5bcd2728fad4eb
     or http://127.0.0.1:8888/lab?token=101c3688298e78ab554ef86d93a196deaf5bcd2728fad4eb

Connecting to JupyterLab[edit]

To access JupyterLab running on a compute node from your web browser, you will need to create an SSH tunnel from your computer through the cluster since the compute nodes are not directly accessible from the internet.

From Linux or macOS[edit]

On a Linux or macOS system, we recommend using the sshuttle Python package.

On your computer, open a new terminal window and create the SSH tunnel with the following sshuttle command where <username> must be replaced with your Alliance account username, and <cluster> by the cluster on which you have launched JupyterLab:

[name@local ~]$ sshuttle --dns -Nr <username>@<cluster>.computecanada.ca

Then, copy and paste the first provided HTTP address into your web browser. In the above salloc example, this would be:

http://node_name.int.cluster.computecanada.ca:8888/lab?token=101c3688298e78ab554ef86d93a196deaf5bcd2728fad4eb

From Windows[edit]

An SSH tunnel can be created from Windows using MobaXTerm as follows. Note: this procedure also works from any terminal that supports the ssh command.

  1. Once JupyterLab is launched on a compute node (see Starting JupyterLab), you can extract the hostname:port and the token from the first provided HTTP address. For example:
    http://node_name.int.cluster.computecanada.ca:8888/lab?token=101c368829...2728fad4eb
           └────────────────────┬────────────────────┘           └──────────┬──────────┘
                          hostname:port                                   token
    
  2. Open a new Terminal tab in MobaXTerm. In the following command, replace <hostname:port> with its corresponding value (refer to the above figure), replace <username> with your Alliance account username, and replace <cluster> with the cluster on which you have launched JupyterLab:
    [name@local ~]$ ssh -L 8888:<hostname:port> <username>@<cluster>.computecanada.ca
    
  3. Open your web browser and go to the following address where <token> must be replaced with the alphanumerical value extracted from the above figure:
    http://localhost:8888/?token=<token>
    

Shutting down JupyterLab[edit]

You can shut down the JupyterLab server before the walltime limit by pressing Ctrl-C twice in the terminal that launched the interactive job.

If you have used MobaXterm to create an SSH tunnel, press Ctrl-D to shut down the tunnel.

Adding kernels[edit]

It is possible to add kernels for other programming languages, for a different Python version or for a persistent virtual environment that has all required packages and libraries for your project. Refer to Making kernels for Jupyter to learn more.

The installation of a new kernel is done in two steps:

  1. Installation of the packages that will allow the language interpreter to communicate with the Jupyter interface.
  2. Creation of a file that will indicate to JupyterLab how to initiate a communication channel with the language interpreter. This file is called a kernel spec file, and it will be saved in a subfolder of ~/.local/share/jupyter/kernels.

In the following sections, we provide a few examples of the kernel installation procedure.

Julia Kernel[edit]

Prerequisites:

  1. The configuration of a Julia kernel depends on a Python virtual environment and a kernels folder. If you do not have these dependencies, make sure to follow the first few instructions listed in Python kernel (note: no Python kernel required).
  2. Since the installation of Julia packages requires an access to the internet, the configuration of a Julia kernel must be done in a remote shell session on a login node.

Once you have a Python virtual environment available and activated, you may configure the Julia kernel:

  1. Load the Julia module:
    (jupyter_py3) [name@server ~]$ module load julia
    
  2. Install IJulia:
    (jupyter_py3) [name@server ~]$ echo -e 'using Pkg\nPkg.add("IJulia")' | julia
    
  3. Important: start or restart a new JupyterLab session before using the Julia kernel.

For more information, see the IJulia documentation.

Installing more Julia packages[edit]

As in the above installation procedure, it is required to install Julia packages from a login node, but the Python virtual environment could be deactivated:

  1. Make sure the same Julia module is loaded:
    [name@server ~]$ module load julia
    
  2. Install any required package. For example with Glob:
    [name@server ~]$ echo -e 'using Pkg\nPkg.add("Glob")' | julia
    
  3. The newly installed Julia packages should already be usable in a notebook executed by the Julia kernel.

Python kernel[edit]

In a terminal with an active session on the remote server, you may configure a Python virtual environment with all the required Python modules and a custom Python kernel for JupyterLab. Here are the initial steps for the simplest Jupyter configuration in a new Python virtual environment:

  1. If you do not have a Python virtual environment, create one. Then, activate it:
    1. Start from a clean Bash environment (this is only required if you are using the Jupyter Terminal via JupyterHub for the creation and configuration of the Python kernel):
      [name@server ~]$ env -i HOME=$HOME bash -l
      
    2. Load a Python module:
      [name@server ~]$ module load python
      
    3. Create a new Python virtual environment:
      [name@server ~]$ virtualenv --no-download $HOME/jupyter_py3
      
    4. Activate your newly created Python virtual environment:
      [name@server ~]$ source $HOME/jupyter_py3/bin/activate
      
  2. Create the common kernels folder, which is used by all kernels you want to install:
    (jupyter_py3) [name@server ~]$ mkdir -p ~/.local/share/jupyter/kernels
    
  3. Finally, install the Python kernel:
    1. Install the ipykernel library:
      (jupyter_py3) [name@server ~]$ pip install --no-index ipykernel
      
    2. Generate the kernel spec file. Replace <unique_name> with a name that will uniquely identify your kernel:
      (jupyter_py3) [name@server ~]$ python -m ipykernel install --user --name <unique_name> --display-name "Python 3.x Kernel"
      
  4. Important: start or restart a new JupyterLab session before using the Python kernel.

For more information, see the ipykernel documentation.

Installing more Python libraries[edit]

Based on the Python virtual environment configured in the previous section:

  1. If you are using the Jupyter Terminal via JupyterHub, make sure the activated Python virtual environment is running in a clean Bash environment. See the above section for details.
  2. Install any required library. For example, numpy:
    (jupyter_py3) [name@server ~]$ pip install --no-index numpy
    
  3. The newly installed Python libraries can now be imported in any notebook using the Python 3.x Kernel.

R Kernel[edit]

Prerequisites:

  1. The configuration of an R kernel depends on a Python virtual environment and a kernels folder. If you do not have these dependencies, make sure to follow the first few instructions listed in Python kernel (note: no Python kernel required).
  2. Since the installation of R packages requires an access to CRAN, the configuration of an R kernel must be done in a remote shell session on a login node.

Once you have a Python virtual environment available and activated, you may configure the R kernel:

  1. Load an R module:
    (jupyter_py3) [name@server ~]$ module load r/4.1
    
  2. Install the R kernel dependencies (crayon, pbdZMQ, devtools) - this will take up to 10 minutes, and packages should be installed in a local directory like ~/R/x86_64-pc-linux-gnu-library/4.1:
    (jupyter_py3) [name@server ~]$ R --no-save
    > install.packages(c('crayon', 'pbdZMQ', 'devtools'), repos='http://cran.us.r-project.org')
    
  3. Install the R kernel.
    > devtools::install_github(paste0('IRkernel/', c('repr', 'IRdisplay', 'IRkernel')))
    
  4. Install the R kernel spec file.
    > IRkernel::installspec()
    
  5. Important: Start or restart a new JupyterLab session before using the R kernel.

For more information, see the IRkernel documentation.

Installing more R packages[edit]

The installation of R packages cannot be done from notebooks because there is no access to CRAN. As in the above installation procedure, it is required to install R packages from a login node, but the Python virtual environment could be deactivated:

  1. Make sure the same R module is loaded:
    [name@server ~]$ module load r/4.1
    
  2. Start the R shell and install any required package. For example with doParallel:
    [name@server ~]$ R --no-save
    > install.packages('doParallel', repos='http://cran.us.r-project.org')
    
  3. The newly installed R packages should already be usable in a notebook executed by the R kernel.

Running notebooks as Python scripts[edit]

For longer run or analysis, we need to submit a non-interactive job. We then need to convert our notebook to a Python script, create a submission script and submit it.

1. From the login node, create and activate a virtual environment, then install nbconvert if not already available.

Question.png
(venv) [name@server ~]$ pip install --no-index nbconvert

2. Convert the notebook (or all notebooks) to Python scripts.

Question.png
(venv) [name@server ~]$ jupyter nbconvert --to python mynotebook.ipynb

3. Create your submission script, and submit your job.

In your submission script, run your converted notebook with:

python mynotebook.py

and submit your non-interactive job:

Question.png
[name@server ~]$ sbatch my-submit.sh

References[edit]