SSH tunnelling
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
What is SSH tunneling?
SSH tunnelling is a method to use a gateway computer to connect two computers that cannot connect directly.
In the context of Compute Canada, SSH tunneling is necessary in certain cases because compute nodes on Niagara and Graham do not have access to the internet, nor can they be contacted directly via ssh from outside their respective datacentres.
SSH tunnels can be set up by users in their job scripts.
The following use cases require SSH tunnels:
- Running commercial software on a compute node that needs to contact a license server over the internet.
- Running visualization software on a compute node that needs to be contacted by a client on a user's local computer.
- Running a Jupyter notebook on a compute node that needs to be contacted by the web browser on a user's local computer.
In the first case, the license server is situated outside of the compute cluster and is rarely under a user's control, whereas in the other cases, the server is on the compute node but the challenge is to connect to it from the outside. We will therefore consider these two kinds of cases separately.
Contacting a license server from a compute node using SSH tunneling
With SSH tunneling, a port on the compute node where a job is running can forward all requests to the appropriate port on the license server by using a gateway server with internet access. Ports, in this context, are numbers which distinguish different kinds of communications. Because SSH tunneling involves specific ports, it is also called 'port forwarding'. In most cases, getting SSH tunneling to work in batch jobs requires just two or three extra commands in your job script.
To set up SSH tunneling, the following pieces of information are required:
- The IP address, or the name, of the license server. Let's call this LICSERVER.
- The port number of the license service. Let's call this LICPORT.
The maintainers of the license server will have this information. That server should allow connections from the login nodes; for Niagara, outgoing IP addresses will range from 142.150.188.71 to 142.150.188.77.
With this information, one can now set up the SSH tunnel. For Graham, an alternative is to request a firewall exception for the license server LICSERVER and its specific port LICPORT.
The gateway server on Niagara is called nia-gw. On Graham, you need to pick one of the login nodes (gra-login1, 2, ...). Let us call the gateway node GATEWAY. You also need to choose the port number on the compute node to use. Let's call the latter COMPUTEPORT.
The ssh command to issue in the job script is then:
ssh GATEWAY -L COMPUTEPORT:LICSERVER:LICPORT -n -N -f
In this command, the string following the -L parameter specifies the port forwarding information, the parameter -n prevents ssh from reading input (it couldn't in a compute job anyway), the parameter -N tells ssh not to open a shell on the GATEWAY, and the parameter -f tells ssh to run in the background, allowing the job script to proceed past this ssh command.
A further command in the job script should tell the software that the license server is on port COMPUTEPORT on the server 'localhost'. Here, 'localhost' is not a placeholder but the literal name to use: 'localhost' is a standard hostname by which a computer can refer to itself. Exactly how to inform your software to use this port on 'localhost' depends on the specific application and the type of license server, but often it is simply a matter of setting an environment variable in the job script, such as
export MLM_LICENSE_FILE=COMPUTEPORT@localhost
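The placeholders introduced above can be collected in one place at the top of a job script. The following is a minimal sketch rather than a tested recipe: all values are examples, the helper name tunnel_spec is hypothetical, and MLM_LICENSE_FILE is only the variable used in the line above; your software may expect a different one.

```shell
#!/bin/bash
# Example values only; substitute the details for your own license server.
GATEWAY="nia-gw"                          # gateway node (a login node on Graham)
LICSERVER="licenseserver.institution.ca"  # example license server name
LICPORT=9999                              # port of the license service
COMPUTEPORT=9999                          # port to open on the compute node

# Hypothetical helper: build the argument that follows -L.
tunnel_spec() {
    printf '%s:%s:%s\n' "$1" "$2" "$3"
}

SPEC=$(tunnel_spec "$COMPUTEPORT" "$LICSERVER" "$LICPORT")

# Print the command for inspection; remove 'echo' to actually open the tunnel.
echo ssh "$GATEWAY" -L "$SPEC" -n -N -f

# Point the software at the local end of the tunnel.
export MLM_LICENSE_FILE="${COMPUTEPORT}@localhost"
```

Keeping the four values in variables means that a change of license server or port only has to be made once.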
Example job script
The following job script sets up an ssh tunnel to contact a license server licenseserver.institution.ca at port 9999:
#!/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks 40
#SBATCH --time 3:00:00
ssh nia-gw -L 9999:licenseserver.institution.ca:9999 -n -N -f
export MLM_LICENSE_FILE=9999@localhost
module load thesoftware/2.0
mpirun thesoftware .....
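Because the -f option sends ssh to the background, the job script continues whether or not the forwarded port is actually usable. One way to catch a tunnel that failed to start is to test the local port before launching the software. The following is a minimal sketch under stated assumptions: it relies on bash's /dev/tcp redirection, the function name wait_for_port is hypothetical, and a successful connection only shows that something is listening on the compute node, not that the license server itself is reachable.

```shell
#!/bin/bash
# Hypothetical helper: return 0 once something accepts TCP connections on
# localhost:PORT, or 1 after TRIES failed attempts made one second apart.
wait_for_port() {
    local port="$1"
    local tries="${2:-10}"
    local i=1
    while [ "$i" -le "$tries" ]; do
        # In bash, opening /dev/tcp/HOST/PORT attempts a TCP connection.
        # The subshell keeps the file descriptor from leaking into the job.
        if (exec 3<>"/dev/tcp/localhost/${port}") 2>/dev/null; then
            return 0
        fi
        # Pause before the next attempt, but not after the last one.
        if [ "$i" -lt "$tries" ]; then
            sleep 1
        fi
        i=$((i + 1))
    done
    return 1
}

# Possible use in the job script, right after the 'ssh ... -f' line:
#   wait_for_port 9999 || { echo "tunnel is not up" >&2; exit 1; }
```

Stopping the job early in this way avoids spending the requested wall time on a run that cannot check out a license.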
Contacting a visualization, Jupyterhub or other server running on a compute node
SSH tunnelling can also be used in the context of Compute Canada to allow a user's computer to connect to a compute node on a cluster through an encrypted tunnel that is routed via the login node of this cluster. This technique allows graphical output of applications like a Jupyter notebook or visualization software to be displayed transparently on the user's local workstation even while they are running on a compute node of a cluster.
Example for a job
# License
export LM_PROJECT=
export CDLMD_LICENSE_FILE=1999@localhost
# Start the SSH tunnel
ssh -n -N -L 1999:flex.cd-adapco.com:1999 gra-login1 &
SSH1=$!
ssh -n -N -L 2099:flex.cd-adapco.com:2099 gra-login1 &
SSH2=$!
# Launch the code
<whatever>
# Stop the SSH tunnel
kill -9 $SSH1
kill -9 $SSH2
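The example above stops each tunnel with kill -9 at the end of the script, a step that is skipped if the script exits early. One possible variation, sketched here and not part of the original example, records the process IDs of the tunnels and stops them from an EXIT trap. The function names start_tunnel and stop_tunnels are hypothetical, and the plain kill (SIGTERM) is a design choice of this sketch.

```shell
#!/bin/bash
# Space-separated list of the process IDs of running tunnels.
TUNNEL_PIDS=""

# Hypothetical helper: run a command in the background and remember its PID.
start_tunnel() {
    "$@" &
    TUNNEL_PIDS="$TUNNEL_PIDS $!"
}

# Stop every recorded tunnel; SIGTERM gives ssh a chance to shut down cleanly.
stop_tunnels() {
    local pid
    for pid in $TUNNEL_PIDS; do
        kill "$pid" 2>/dev/null
        wait "$pid" 2>/dev/null
    done
    TUNNEL_PIDS=""
}

# Run the cleanup whenever the script exits, including after an error.
trap stop_tunnels EXIT

# In a real job script the calls could look like this:
#   start_tunnel ssh -n -N -L 1999:flex.cd-adapco.com:1999 gra-login1
#   start_tunnel ssh -n -N -L 2099:flex.cd-adapco.com:2099 gra-login1
```

With the trap in place, the explicit kill lines at the end of the job script are no longer needed.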
There is NAT on both Graham and Cedar allowing users to access the Internet from the compute nodes. On Graham, however, access is blocked by default at the firewall. A user (or an analyst) would need to submit a request to have a specific port/IP opened.
From Linux or MacOS X
On a Linux or MacOS X system, we recommend using the sshuttle Python package.
On your computer, open a new terminal window and run the following sshuttle command to create the tunnel.
[name@my_computer ]$ sshuttle --dns -Nr userid@machine_name
Then, copy and paste the provided URL into your browser. In the above example, this would be
http://cdr544.int.cedar.computecanada.ca:8888/?token=7ed7059fad64446f837567e32af8d20efa72e72476eb72ca
From Windows
An SSH tunnel can be created from Windows using MobaXTerm as follows.
Open two sessions in MobaXTerm.
- Session 1 should be a connection to a cluster. Follow the instructions in section Starting Jupyter Notebook.
- Session 2 should be a local terminal in which we will set up the SSH tunnel. Run the following command, substituting the node name from the URL you received in Session 1.
[name@my_computer ]$ ssh -L 8888:cdr544.int.cedar.computecanada.ca:8888 someuser@cedar.computecanada.ca
This command performs local port forwarding (-L). It forwards local port 8888 to cdr544.int.cedar.computecanada.ca:8888, which is the host name given when Jupyter Notebook was started.
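The node name and port in the ssh command come from the URL printed in Session 1, so they can be extracted from it instead of being copied by hand. The following is a small sketch using standard shell parameter expansion; the variable names are arbitrary, and the URL and login are the example values from this page, to be replaced with your own.

```shell
#!/bin/bash
# URL as printed by Jupyter in Session 1 (example value from this page).
URL="http://cdr544.int.cedar.computecanada.ca:8888/?token=7ed7059fad64446f837567e32af8d20efa72e72476eb72ca"
LOGIN="someuser@cedar.computecanada.ca"   # your own user name and cluster

HOSTPORT="${URL#*://}"       # drop the leading "http://"
HOSTPORT="${HOSTPORT%%/*}"   # drop everything from the first "/" onwards
NODE="${HOSTPORT%:*}"        # compute node host name
PORT="${HOSTPORT##*:}"       # port Jupyter is listening on
REST="${URL#*://*/}"         # remainder of the URL, here "?token=..."

# Print the tunnel command and the address to open in the local browser.
echo "ssh -L ${PORT}:${NODE}:${PORT} ${LOGIN}"
echo "http://localhost:${PORT}/${REST}"
```

With the example URL, the first line printed is the ssh command shown above and the second is the same URL with the node name replaced by localhost.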
Open your browser and go to
http://localhost:8888/?token=7ed7059fad64446f837567e32af8d20efa72e72476eb72ca
Replace the token in this example with the one given to you in Session 1. You can also type http://localhost:8888 and there will be a prompt asking you for the token, which you can then copy and paste.