SSH tunnelling

{{Draft}}
SSH tunnelling is a method that, in the context of Compute Canada, allows a user's computer to connect to a compute node on a cluster through an encrypted tunnel routed via a login node of that cluster. This technique allows the graphical output of applications such as a [[Jupyter | Jupyter notebook]] or [[Visualization|visualization software]] to be displayed transparently on the user's local workstation while the applications run on a compute node of the cluster.
 
=What is SSH tunneling?=
SSH tunnelling is a method of using a gateway computer to connect two
computers that cannot connect directly.

In the context of Compute Canada, SSH tunneling is necessary in certain
cases because compute nodes on Niagara and Graham do not have access to
the internet, nor can they be contacted directly via ssh from outside
the datacentres in which they are located.

SSH tunnels can be set up by users in their job scripts.
 
The following use cases require SSH tunnels:
 
# Running commercial software on a compute node that needs to contact a license server over the internet.
# Running [[Visualization|visualization software]] on a compute node that needs to be contacted by a client on a user's local computer.
# Running a [[Jupyter | Jupyter notebook]] on a compute node that needs to be contacted by the web browser on a user's local computer.
 
In the first case, the license server is situated outside of
the compute cluster and is rarely under a user's control, whereas
in the other two cases, the server is on the compute node and the
challenge is to connect to it from the outside. We will therefore
consider these two kinds of cases separately.
 
= Contacting a license server from a compute node using SSH tunneling =
 
With SSH tunneling, a port on the compute node where a job is running
can forward all requests to the appropriate port on the license server
by using a gateway server with internet access.  Ports, in this
context, are numbers that distinguish different kinds of
communication.  Because SSH tunneling involves specific ports, it is
also called 'port forwarding'.  In most cases, getting SSH tunneling
to work in batch jobs requires just two or three extra commands in
your job script.
 
To set up SSH tunneling, the following pieces of
information are required:

# The IP address, or the name, of the license server. Let's call this LICSERVER.
# The port number of the license service. Let's call this LICPORT.
 
The maintainers of the license server will have this information.
That server should allow connections from the login nodes; for
Niagara, outgoing IP addresses will range from 142.150.188.71 to
142.150.188.77.
 
With this information, one can now set up the SSH tunnel.  For
Graham, an alternative solution is to request a firewall exception
for the license server LICSERVER and its specific port LICPORT.
 
The gateway server on Niagara is called nia-gw.  On Graham, you need
to pick one of the login nodes (gra-login1, 2, ...). Let us call the
gateway node GATEWAY. You also need to choose the port number on the
compute node to use. Let's call the latter COMPUTEPORT.
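
The text above does not say how to choose COMPUTEPORT. One possible approach, shown here only as a sketch, is to scan a range of ports and take the first one that nothing is listening on. The function names <code>port_in_use</code> and <code>pick_port</code> and the range 20000 to 20100 are assumptions made for illustration; they are not part of any Compute Canada tooling.

```bash
#!/bin/bash
# Succeeds if something is already listening on the given local port.
# Uses bash's built-in /dev/tcp pseudo-device, so no extra tools are needed.
port_in_use() {
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Print the first port in the range [first, last] that nothing listens on.
pick_port() {
    local first=$1 last=$2 port
    for ((port = first; port <= last; port++)); do
        if ! port_in_use "$port"; then
            echo "$port"
            return 0
        fi
    done
    return 1
}

# Hypothetical range; adjust it to whatever suits your site.
COMPUTEPORT=$(pick_port 20000 20100)
echo "Using local port $COMPUTEPORT"
```

This check is not race-free: another process could claim the port between the check and the moment ssh binds to it, so the ssh command itself should still be checked for errors.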
 
The ssh command to issue in the job script is then:
 
<source lang="bash">
ssh GATEWAY -L COMPUTEPORT:LICSERVER:LICPORT -n -N -f
</source>
 
In this command, the parameter -n prevents ssh from reading from stdin, the
parameter -N tells ssh not to open a shell on the GATEWAY, and the
parameter -f tells ssh to run in the background, allowing the job
script to proceed past this ssh command.
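
Because -f sends ssh to the background, a tunnel that failed to bind its local port can go unnoticed until the software reports a licensing error. The following sketch wraps the command in a function and adds the OpenSSH option <code>ExitOnForwardFailure=yes</code>, which makes ssh exit with an error when the requested forwarding cannot be set up. The function name <code>open_tunnel</code>, the <code>DRY_RUN</code> switch, and the values assigned to the placeholders are assumptions made for illustration.

```bash
#!/bin/bash
# Hypothetical values for the placeholders used in the text above.
GATEWAY=nia-gw
COMPUTEPORT=9999
LICSERVER=licenseserver.institution.ca
LICPORT=9999

# Open the tunnel; with DRY_RUN=1, only print the command that would be run.
open_tunnel() {
    # ExitOnForwardFailure=yes makes ssh exit with an error if the local
    # port cannot be forwarded, instead of going to the background without it.
    local cmd=(ssh "$GATEWAY" -o ExitOnForwardFailure=yes
               -L "${COMPUTEPORT}:${LICSERVER}:${LICPORT}" -n -N -f)
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

In a job script, calling <code>open_tunnel || exit 1</code> after these definitions would stop the job early when the tunnel cannot be established. Note that this option only covers the forwarding itself; it does not verify that the license server is reachable from the gateway.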
 
A further command in the job script should tell the software
that the license server is on port COMPUTEPORT on the server
'localhost' ('localhost' is not a placeholder but the literal name
to use; it is a standard hostname by which a
computer can refer to itself). How to tell your software this
depends on the specific application and the type of license server,
but often it is simply a matter of setting an environment variable in
the job script, for example:
 
<source lang="bash">
export MLM_LICENSE_FILE=COMPUTEPORT@localhost
</source>
 
== Example job script ==
 
The following job script sets up an ssh tunnel to contact a
license server licenseserver.institution.ca at port 9999:
 
<source lang="bash">
#!/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks 40
#SBATCH --time 3:00:00
 
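# Forward port 9999 on this compute node to the license server via the gateway nia-gw.
ssh nia-gw -p 2222 -L 9999:licenseserver.institution.ca:9999 -N -f
# Point the software at the local end of the tunnel.
export MLM_LICENSE_FILE=9999@localhost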
 
module load thesoftware/2.0
mpirun thesoftware .....
</source>
 
 
= Contacting a visualization, JupyterHub or other server running on a compute node using SSH tunneling from the outside =
 
SSH tunnelling can also be used in the context of Compute Canada to allow a user's computer to connect to a compute node on a cluster through an encrypted tunnel routed via a login node of that cluster. This technique allows the graphical output of applications such as a [[Jupyter | Jupyter notebook]] or [[Visualization|visualization software]] to be displayed transparently on the user's local workstation while the applications run on a compute node of the cluster.
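
As a rough illustration of this second kind of tunnel, the command is issued on the user's local workstation rather than in the job script, and the login node plays the role of the gateway. The sketch below only assembles and prints such a command. The function name <code>build_local_tunnel</code>, the user name, the host names and the port 8888 are all hypothetical; the actual compute node name and port depend on the running job.

```bash
#!/bin/bash
# Print an ssh command that forwards a port on the local workstation to a
# server running on a compute node, going through a login node.
# Arguments: user, login host, compute node, remote port, [local port]
build_local_tunnel() {
    local user=$1 login_host=$2 compute_node=$3 remote_port=$4
    local local_port=${5:-$remote_port}   # default: same port number locally
    echo "ssh -N -L ${local_port}:${compute_node}:${remote_port} ${user}@${login_host}"
}

# Hypothetical example values.
build_local_tunnel someuser cluster.example.org node123 8888
```

Once the printed command is running on the workstation, a browser or visualization client would connect to localhost on the chosen local port, and ssh would relay the traffic to the compute node.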


== Example for a job ==