Advanced Jupyter configuration

From Alliance Doc
Revision as of 22:11, 6 December 2021 by Plstonge (How to use the Lmod extension in JupyterLab)


This article is a draft

This is not a complete article: it is a draft, a work in progress intended to be published as an article, and it may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.




Introduction

  • Project Jupyter: "a non-profit, open-source project, born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages."[1]
  • JupyterLab: "a web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design allows for extensions that expand and enrich functionality."[2]

A JupyterLab server should only run on a compute node or on a cloud instance; cluster login nodes are not a good choice because they impose various limits that can stop applications if they consume too much CPU time or memory. To use a compute node, users reserve compute resources by submitting a job that requests a specific number of CPUs (and optionally GPUs), an amount of memory, and a run time. On this page, we give detailed instructions on how to configure and submit a JupyterLab job on any national cluster.

But what about ...

  • Jupyter Notebook? JupyterLab is a more modern and flexible interface than the classic Jupyter Notebook.
  • A preconfigured JupyterLab service? Some regional partners provide a web portal based on JupyterHub, and many of these portals offer a preconfigured JupyterLab service so that users do not have to create their own setup. To learn more, visit the JupyterHub wiki page.

Installing JupyterLab

These instructions install JupyterLab with the pip command in a Python virtual environment:

  1. Load a Python module, either the default one (as shown below) or a specific version (see available versions with module avail python):
    [name@server ~]$ module load python
    
  2. Create a new Python virtual environment:
    [name@server ~]$ virtualenv --no-download $HOME/jupyter_py3
    
  3. Activate your newly created Python virtual environment:
    [name@server ~]$ source $HOME/jupyter_py3/bin/activate
    
  4. Install JupyterLab in your new virtual environment (note: it takes a few minutes):
    (jupyter_py3) [name@server ~]$ pip install --no-index jupyterlab
    
  5. In the virtual environment, create a wrapper script that launches JupyterLab:
    (jupyter_py3) [name@server ~]$ echo -e '#!/bin/bash\nunset XDG_RUNTIME_DIR\njupyter-lab --ip $(hostname -f) --no-browser' > $VIRTUAL_ENV/bin/jupyterlab.sh
    
  6. Finally, make the script executable:
    (jupyter_py3) [name@server ~]$ chmod u+x $VIRTUAL_ENV/bin/jupyterlab.sh
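The one-line `echo -e` command in step 5 writes a three-line script. If that quoting is hard to read, the same file can be produced with a here-document. The following is a minimal sketch, assuming a Bash shell; the function name `write_jupyterlab_wrapper` is hypothetical and is not part of JupyterLab or of the cluster software.

```shell
# Hypothetical helper: writes the same jupyterlab.sh wrapper as step 5,
# using a quoted here-document instead of `echo -e`, then applies step 6.
write_jupyterlab_wrapper() {
  # Refuse to run without a target directory (e.g. "$VIRTUAL_ENV/bin")
  local bin_dir="${1:?usage: write_jupyterlab_wrapper DIR}"
  local script="$bin_dir/jupyterlab.sh"

  # The quoted 'EOF' delimiter keeps $(hostname -f) literal, so it is
  # evaluated when the wrapper runs on the compute node, not right now.
  cat > "$script" <<'EOF'
#!/bin/bash
unset XDG_RUNTIME_DIR
jupyter-lab --ip $(hostname -f) --no-browser
EOF

  # Same effect as step 6: make the wrapper executable for its owner
  chmod u+x "$script"
}
```

After defining the function in your current shell (for example by pasting it), it would be called from the activated environment as:

(jupyter_py3) [name@server ~]$ write_jupyterlab_wrapper "$VIRTUAL_ENV/bin"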
    

Installing extensions

Extensions let you add functionality to JupyterLab and modify its user interface.

Jupyter Lmod

Jupyter Lmod is an extension that allows you to interact with environment modules before launching kernels. The extension uses Lmod's Python interface to perform module-related tasks such as loading, unloading, and saving a collection.

The following commands will install and enable the Jupyter Lmod extension in your environment (note: the third command takes a few minutes to complete):

(jupyter_py3) [name@server ~]$ module load nodejs
(jupyter_py3) [name@server ~]$ pip install jupyterlmod
(jupyter_py3) [name@server ~]$ jupyter labextension install jupyterlab-lmod


Instructions on how to configure software modules in the JupyterLab interface are provided on the JupyterHub page.

Using your installation

Activating the environment

Make sure the Python virtual environment in which you have installed JupyterLab is activated. For example, each time you log into the cluster, you have to activate it again with:

[name@server ~]$ source $HOME/jupyter_py3/bin/activate


To verify that your environment is ready, you can get a list of installed jupyter* packages with the following command:

(jupyter_py3) [name@server ~]$ pip freeze | grep jupyter
jupyter-client==7.1.0+computecanada
jupyter-core==4.9.1+computecanada
jupyter-server==1.9.0+computecanada
jupyterlab==3.1.7+computecanada
jupyterlab-pygments==0.1.2+computecanada
jupyterlab-server==2.3.0+computecanada
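The two checks described above (an active virtual environment and an installed jupyterlab package) can be combined into a small shell function. This is a hypothetical helper, not a command provided on the clusters; it is a sketch that assumes a Bash-compatible shell and reads `pip freeze` output on its standard input.

```shell
# Hypothetical helper: succeeds only if a Python virtual environment is
# active and the `pip freeze` output read from stdin lists jupyterlab.
check_jupyter_env() {
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "No Python virtual environment is active" >&2
    return 1
  fi

  # Anchor on "jupyterlab==" so that jupyterlab-pygments and
  # jupyterlab-server are not mistaken for JupyterLab itself.
  local version
  version=$(sed -n 's/^jupyterlab==//p')

  if [ -z "$version" ]; then
    echo "jupyterlab is not installed in $VIRTUAL_ENV" >&2
    return 1
  fi
  echo "JupyterLab $version in $VIRTUAL_ENV"
}
```

With the environment from this page activated, it would be used as follows and, for the package list shown above, should report version 3.1.7+computecanada:

(jupyter_py3) [name@server ~]$ pip freeze | check_jupyter_env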


Starting JupyterLab

To start a JupyterLab server, submit an interactive job with salloc. Adjust the parameters based on your needs. See Running jobs for more information.

(jupyter_py3) [name@server ~]$ salloc --time=1:0:0 --ntasks=1 --cpus-per-task=2 --mem-per-cpu=1024M --account=def-yourpi srun $VIRTUAL_ENV/bin/jupyterlab.sh
...
[I 2021-12-06 10:37:14.262 ServerApp] jupyterlab | extension was successfully linked.
...
[I 2021-12-06 10:37:39.259 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2021-12-06 10:37:39.356 ServerApp]

    To access the server, open this file in a browser:
        file:///home/name/.local/share/jupyter/runtime/jpserver-198146-open.html
    Or copy and paste one of these URLs:
        http://node_name.int.cluster.computecanada.ca:8888/lab?token=101c3688298e78ab554ef86d93a196deaf5bcd2728fad4eb
     or http://127.0.0.1:8888/lab?token=101c3688298e78ab554ef86d93a196deaf5bcd2728fad4eb
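The address needed in the next section is the first http:// URL in this message, the one containing the compute node's hostname. If you copy the startup message into a text file, that URL can be picked out with standard tools. A minimal sketch, assuming a POSIX-style shell with grep and sed; the function name `first_jupyter_url` and the file name `jupyter_output.txt` are hypothetical.

```shell
# Hypothetical helper: prints the first http:// URL found in JupyterLab's
# startup message, read from stdin. The file:// line is ignored.
first_jupyter_url() {
  # sed reads all of grep's output, so grep never gets a broken pipe
  grep -oE 'http://[^[:space:]]+' | sed -n '1p'
}
```

For example, after saving the message shown above:

[name@server ~]$ first_jupyter_url < jupyter_output.txt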


Connecting to JupyterLab

To access JupyterLab running on a compute node from your web browser, you will need to create an SSH tunnel from your computer through the cluster, since compute nodes are not directly accessible from the Internet.

From Linux or macOS

On a Linux or macOS system, we recommend using the sshuttle Python package.

On your computer, open a new terminal window and create the SSH tunnel with the following sshuttle command, replacing <username> with your Compute Canada username and <cluster> with the cluster on which you have launched JupyterLab:

[name@local ~]$ sshuttle --dns -Nr <username>@<cluster>.computecanada.ca


Then copy and paste the first HTTP address provided into your web browser. In the salloc example above, this would be:

http://node_name.int.cluster.computecanada.ca:8888/lab?token=101c3688298e78ab554ef86d93a196deaf5bcd2728fad4eb

From Windows

An SSH tunnel can be created from Windows using MobaXterm as follows. Note: this procedure also works from a terminal on any Unix system (Linux, macOS, etc.).

  1. Once JupyterLab is launched on a compute node (see Starting JupyterLab), you can extract the hostname:port and the token from the first provided HTTP address. For example:
    http://node_name.int.cluster.computecanada.ca:8888/lab?token=101c368829...2728fad4eb
           └────────────────────┬────────────────────┘           └──────────┬──────────┘
                          hostname:port                                   token
    
  2. Open a new terminal tab in MobaXterm. In the following command, replace <hostname:port> with its corresponding value (refer to the above figure), <username> with your Compute Canada username, and <cluster> with the cluster on which you have launched JupyterLab:
    [name@local ~]$ ssh -L 8888:<hostname:port> <username>@<cluster>.computecanada.ca
    
  3. Open your web browser and go to the following address, replacing <token> with the alphanumeric value extracted as shown in the above figure:
    http://localhost:8888/?token=<token>
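Steps 1 to 3 can be sketched as a small shell function that splits the URL into hostname:port and token, then prints the tunnel command and the local address to open. This is a hypothetical helper, assuming a Bash-compatible shell (such as a MobaXterm terminal), a URL of the form shown in step 1 with the token as its last query parameter, and a free local port 8888.

```shell
# Hypothetical helper: given the first URL printed by JupyterLab, a username
# and a cluster name, prints the `ssh -L` command (step 2) and the local
# address (step 3). It does not open the tunnel itself.
jupyter_tunnel_commands() {
  local url="$1" user="$2" cluster="$3"

  case "$url" in
    *token=*) ;;
    *) echo "No token found in: $url" >&2; return 1 ;;
  esac

  # Strip the scheme, then keep everything before the first "/"
  local hostport="${url#http://}"
  hostport="${hostport%%/*}"

  # The token is whatever follows "token=" in the query string
  local token="${url#*token=}"

  echo "ssh -L 8888:${hostport} ${user}@${cluster}.computecanada.ca"
  echo "http://localhost:8888/?token=${token}"
}
```

For the salloc example above, calling it with the first URL, your username and your cluster name prints the two lines to use in steps 2 and 3:

[name@local ~]$ jupyter_tunnel_commands 'http://node_name.int.cluster.computecanada.ca:8888/lab?token=101c3688298e78ab554ef86d93a196deaf5bcd2728fad4eb' <username> <cluster>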
    

Shutting down JupyterLab

You can shut down the JupyterLab server before the walltime limit by pressing Ctrl-C twice in the terminal that launched the interactive job.

If you have used MobaXterm to create an SSH tunnel, press Ctrl-D to shut down the tunnel.

Adding kernels

References