SSH tunnelling: Difference between revisions

mNo edit summary
No edit summary
 
(32 intermediate revisions by 9 users not shown)
Line 2: Line 2:
<translate>
<translate>


=What is SSH tunneling?= <!--T:1-->
<!--T:53-->
''Parent page: [[SSH]]''
 
=What is SSH tunnelling?= <!--T:1-->


<!--T:2-->
<!--T:2-->
Line 9: Line 12:


<!--T:3-->
<!--T:3-->
In the context of Compute Canada, SSH tunneling is necessary in certain cases,
In the context of the Alliance, SSH tunnelling is necessary in certain cases,
because compute nodes on [[Niagara]] and [[Graham]] do not have direct access to
because compute nodes on [[Niagara]], [[Béluga]] and [[Graham]] do not have direct access to
the internet, nor can the compute nodes be contacted directly from the internet.
the Internet, nor can the compute nodes be contacted directly from the Internet.


<!--T:4-->
<!--T:4-->
Line 17: Line 20:


<!--T:5-->
<!--T:5-->
# Running commercial software on a compute node that needs to contact a license server over the internet.
* Running commercial software on a compute node that needs to contact a license server over the Internet;
# Running [[Visualization|visualization software]] on a compute node that needs to be contacted by client software on a user's local computer.
* Running [[Visualization|visualization software]] on a compute node that needs to be contacted by client software on a user's local computer;
# Running a [[Jupyter | Jupyter notebook]] on a compute node that needs to be contacted by the web browser on a user's local computer.
* Running a [[Jupyter | Jupyter Notebook]] on a compute node that needs to be contacted by the Web browser on a user's local computer;
# Connecting to cedar database server from somewhere other than cedar head node, e.g., your desktop
* Connecting to the Cedar database server from somewhere other than the Cedar head node, e.g., your desktop.


<!--T:6-->
<!--T:6-->
In the first case, the license server is situated outside of
In the first case, the license server is outside of
the compute cluster and is rarely under a user's control, whereas
the compute cluster and is rarely under a user's control, whereas
in the other cases, the server is on the compute node but the
in the other cases, the server is on the compute node but the
challenge is to connect to it from the outside. We will therefore
challenge is to connect to it from the outside. We will therefore
consider these two kind of cases separately.
consider these two situations below.
 
<!--T:54-->
While not strictly required to use SSH tunnelling, you may wish to be familiar with [[SSH Keys|SSH key pairs]].


= Contacting a license server from a compute node = <!--T:7-->
= Contacting a license server from a compute node = <!--T:7-->
Line 36: Line 42:
|panelstyle=SideCallout
|panelstyle=SideCallout
|content=
|content=
A port is a number used to distinguish different streams of communication  
A port is a number used to distinguish streams of communication  
from one another. You can think of it as loosely analogous to a radio frequency  
from one another. You can think of it as loosely analogous to a radio frequency  
or a channel. Many port numbers are reserved, by rule or by convention, for  
or a channel. Many port numbers are reserved, by rule or by convention, for  
Line 44: Line 50:


<!--T:9-->
<!--T:9-->
Certain commercially-licensed programs must connect to a license server machine  
Certain commercially licensed programs must connect to a license server machine  
somewhere on the internet via a predetermined port. If the compute node where  
somewhere on the Internet via a predetermined port. If the compute node where  
the program is running has no access to the internet, then a ''gateway server''
the program is running has no access to the Internet, then a <i>gateway server</i>
which does have access must be used to forward communications, on that port,  
which does have access must be used to forward communications on that port,  
from the compute node to the license server. To enable this one must set up  
from the compute node to the license server. To enable this, one must set up  
an ''SSH tunnel''. Such an arrangement is also called ''port forwarding''.
an <i>SSH tunnel</i>. Such an arrangement is also called <i>port forwarding</i>.


<!--T:10-->
<!--T:10-->
In most cases, creating an SSH tunnel in a batch job requires just two or  
In most cases, creating an SSH tunnel in a batch job requires only two or  
three commands in your job script. You will need the following information:
three commands in your job script. You will need the following information:


<!--T:11-->
<!--T:11-->
# The IP address, or the name, of the license server. Let's call this LICSERVER.
* The IP address or the name of the license server (here LICSERVER).
# The port number of the license service. Let's call this LICPORT.  
* The port number of the license service (here LICPORT).  


<!--T:12-->
<!--T:12-->
You should obtain this information from whoever maintains the license server.
You should obtain this information from whoever maintains the license server.
That server also must allow connections from the login nodes; for
That server also must allow connections from the login nodes; for
Niagara, the outgoing IP address will either be 142.150.188.131 or 142.150.188.132.
Niagara, the outgoing IP address will either be 142.1.174.227 or 142.1.174.228.


<!--T:13-->
<!--T:13-->
With this information, one can now setup the SSH tunnel.  For
With this information, one can now set up the SSH tunnel.  For
Graham, an alternative resolution is to request a firewall exception
Graham, an alternative solution is to request a firewall exception
for the license server LICSERVER and its specific port LICPORT.
for license server LICSERVER and its specific port LICPORT.


<!--T:14-->
<!--T:14-->
Line 73: Line 79:
to pick one of the login nodes (gra-login1, 2, ...). Let us call the
to pick one of the login nodes (gra-login1, 2, ...). Let us call the
gateway node GATEWAY. You also need to choose the port number on the
gateway node GATEWAY. You also need to choose the port number on the
compute node to use. Let's call the latter COMPUTEPORT.
compute node to use (here COMPUTEPORT).


<!--T:15-->
<!--T:15-->
The ssh command to issue in the job script is then:
The SSH command to issue in the job script is then:


<!--T:16-->
<!--T:16-->
<source lang="bash">
<source lang="bash">
ssh GATEWAY -L COMPUTEPORT:LICSERVER:LICPORT -n -N -f
ssh GATEWAY -L COMPUTEPORT:LICSERVER:LICPORT -N -f
</source>
</source>


<!--T:17-->
<!--T:17-->
In this command, the string following the -L parameter specifies the port forwarding information, the parameter -n prevents ssh to read input (it couldn't in a compute job anyway), the
In this command, the string following the -L parameter specifies the port forwarding information:
parameter -N tells ssh not to open a shell on the GATEWAY, and the
* -N tells SSH not to open a shell on the GATEWAY,
parameter -f tells ssh to run in the background, allowing the job
* -f and -N tell SSH not to open a shell and to run in the background, allowing the job script to continue on past this SSH command.
script to proceed past this ssh command.


<!--T:18-->
<!--T:18-->
A further command to add to the job script should tell the software
A further command to add to the job script should tell the software
that the license server is on port COMPUTEPORT on the server
that the license server is on port COMPUTEPORT on the server
'localhost'. Here, 'localhost' is not a placeholder, rather, it is the literal name
<i>localhost</i>. The term <i>localhost</i> is the standard name by which a computer refers to itself. It is to be taken literally and should not be replaced with your computer's name. Exactly how to inform your software to use this port on <i>localhost</i> will
to use - 'localhost' is a standard hostname pseudonym by which a
computer can refer to itself. Exactly how to informl your software to use this port on 'localhost' will
depend on the specific application and the type of license server,
depend on the specific application and the type of license server,
but often it is simply a matter of setting an environment variable in
but often it is simply a matter of setting an environment variable in
Line 107: Line 110:


<!--T:21-->
<!--T:21-->
The following job script sets up an ssh tunnel to contact a
The following job script sets up an SSH tunnel to contact licenseserver.institution.ca at port 9999.
license server licenseserver.institution.ca at port 9999:


<!--T:22-->
<!--T:22-->
Line 118: Line 120:


<!--T:23-->
<!--T:23-->
ssh nia-dm1 -L 9999:licenseserver.institution.ca:9999 -N -f
REMOTEHOST=licenseserver.institution.ca
export MLM_LICENSE_FILE=9999@localhost
REMOTEPORT=9999
LOCALHOST=localhost
for ((i=0; i<10; ++i)); do
  LOCALPORT=$(shuf -i 1024-65535 -n 1)
  ssh nia-gw -L $LOCALPORT:$REMOTEHOST:$REMOTEPORT -N -f && break
done || { echo "Giving up forwarding license port after $i attempts..."; exit 1; }
export MLM_LICENSE_FILE=$LOCALPORT@$LOCALHOST


<!--T:24-->
<!--T:24-->
Line 126: Line 134:
</source>
</source>


= Contacting a visualization, Jupyterhub, database or other server running on compute node= <!--T:25-->
= Connecting to a program running on a compute node= <!--T:25-->


<!--T:26-->
<!--T:26-->
SSH tunnelling can also be used in the context of Compute Canada to allow a user's computer to connect to a compute node on a cluster through an encrypted tunnel that is routed via the login node of this cluster. This technique allows graphical output of applications like a [[Jupyter | Jupyter notebook]] or [[Visualization|visualization software]] to be displayed transparently on the user's local workstation even while they are running on a compute node of a cluster. In case of connecting to a database server where the connection is possible though the head node only the SSH tunneling can be used to move an arbitrary port number of a compute network to head node of a cluster and bind it to the database server.  
SSH tunnelling can also be used in our context to allow a user's computer to connect to a compute node on a cluster through an encrypted tunnel that is routed via the login node of this cluster. This technique allows graphical output of applications like a [[Jupyter | Jupyter Notebook]] or [[Visualization|visualization software]] to be displayed transparently on the user's local workstation even while they are running on a cluster's compute node. When connecting to a database server where the connection is only possible through the head node, SSH tunnelling can be used to bind an external port to the database server.
 
== Example for a job == <!--T:27-->
<pre>
# License
export LM_PROJECT=
export CDLMD_LICENSE_FILE=1999@localhost
 
<!--T:28-->
# Start the SSH tunnel
ssh -n -N -L 1999:flex.cd-adapco.com:1999 gra-login1 &
SSH1=$!
ssh -n -N -L 2099:flex.cd-adapco.com:2099 gra-login1 &
SSH2=$!
 
<!--T:29-->
# Launch the code
<whatever>
 
<!--T:30-->
# Stop the SSH tunnel
kill -9 $SSH1
kill -9 $SSH2
</pre>


<!--T:31-->
<!--T:32-->
There is NAT on both Graham and Cedar allowing users to access the Internet from the compute nodes. On Graham, however, access is blocked by default at the firewall. A user (or an analyst) would need to submit submit a request to have a specific port/IP open.
There is Network Address Translation (NAT) on both Graham and Cedar allowing users to access the Internet from the compute nodes. On Graham however, access is blocked by default at the firewall. Contact [[Technical support|technical support]] if you need to have a specific port opened, supplying the IP address or range of addresses which should be allowed to use that port.


== From Linux or MacOS X == <!--T:32-->
== From Linux or MacOS X == <!--T:51-->


<!--T:33-->
<!--T:52-->
On a Linux or MacOS X system, we recommend using the [https://sshuttle.readthedocs.io sshuttle] Python package.
On a Linux or MacOS X system, we recommend using the [https://sshuttle.readthedocs.io sshuttle] Python package.


<!--T:34-->
<!--T:34-->
On your computer, open a new terminal window and run the following sshuttle command to create the tunnel.
On your computer, open a new terminal window and run the following <code>sshuttle</code> command to create the tunnel.


<!--T:35-->
<!--T:35-->
Line 171: Line 156:


<!--T:36-->
<!--T:36-->
Then, copy and paste the provided URL into your browser. In the above example, this would be
Then, copy and paste the application's URL into your browser. If your application is a
[[Jupyter#Starting_Jupyter_Notebook|Jupyter notebook]], for example, you are given a URL with a token:
<pre>
<pre>
  http://cdr544.int.cedar.computecanada.ca:8888/?token=7ed7059fad64446f837567e32af8d20efa72e72476eb72ca
  http://cdr544.int.cedar.computecanada.ca:8888/?token=7ed7059fad64446f837567e32af8d20efa72e72476eb72ca
Line 179: Line 165:


<!--T:38-->
<!--T:38-->
An SSH tunnel can be created from Windows using [https://docs.computecanada.ca/wiki/Connecting_with_MobaXTerm MobaXTerm] as follows.
An SSH tunnel can be created from Windows using [[Connecting with MobaXTerm|MobaXTerm]] as follows.


<!--T:39-->
<!--T:39-->
Line 185: Line 171:


<!--T:40-->
<!--T:40-->
*Session 1 should be a connection to a cluster. Follow the instructions in section ''Starting Jupyter Notebook''.
*Session 1 should be a connection to a cluster. Start your job there following the instructions for your application, such as [[Jupyter#Starting_Jupyter_Notebook|Jupyter Notebook]]. You should be given a URL that includes a host name and a port, such as <code>cdr544.int.cedar.computecanada.ca:8888</code> for example.


<!--T:41-->
<!--T:41-->
*Session 2 should be a local terminal in which we will set up the SSH tunnel. Run the following command, substituting the node name from the URL you received in Session 1. Follow the instructions in section ''Starting Jupyter Notebook''.
*Session 2 should be a local terminal in which we will set up the SSH tunnel. Run the following command, replacing this example host name with the one from the URL you received in Session 1.  


<!--T:42-->
<!--T:42-->
Line 196: Line 182:


<!--T:43-->
<!--T:43-->
This command performs local port forwarding (-L). It forwards local port 8888 to <code>cdr544.int.cedar.computecanada.ca:8888</code>, which is the host name given when Jupyter Notebook was started.  
This command forwards connections to <b>local port</b> 8888 to port 8888 on cdr544.int.cedar.computecanada.ca, the <b>remote port</b>.
The local port number, the first one, does not <i>need</i> to match the remote port number, the second one, but it is conventional and reduces confusion.


<!--T:44-->
<!--T:44-->
Open your browser and go to
Modify the URL you were given in Session 1 by replacing the host name with <code>localhost</code>.
Again using an example from [[Jupyter#Starting_Jupyter_Notebook|Jupyter Notebook]], this would be the URL to paste into a browser:
<pre>
<pre>
  http://localhost:8888/?token=7ed7059fad64446f837567e32af8d20efa72e72476eb72ca
  http://localhost:8888/?token=7ed7059fad64446f837567e32af8d20efa72e72476eb72ca
</pre>
</pre>


<!--T:45-->
== Example for connecting to a database server on Cedar from your desktop == <!--T:46-->
Replace the token in this example with the one given to you in Session 1. You can also type <code>http://localhost:8888</code> and there will be a prompt asking you for the token, which you can then copy and paste.


== Example for connecting to a database server on cedar from your desktop == <!--T:46-->
<!--T:55-->
An SSH tunnel can be created from your desktop to database servers PostgreSQL or MySQL using the following commands respectively:


<!--T:47-->
<!--T:47-->
Commands to connect to PostgreSQL and MySQL respectively are:
<pre>  
<pre>  
ssh -2 -L 127.0.0.1:PORT:cedar-pgsql-vm.int.cedar.computecanada.ca:5432 someuser@cedar.computecanada.ca
ssh -L PORT:cedar-pgsql-vm.int.cedar.computecanada.ca:5432 someuser@cedar.computecanada.ca
ssh -2 -L 127.0.0.1:PORT:cedar-mysql-vm.int.cedar.computecanada.ca:3306 someuser@cedar.computecanada.ca
ssh -L PORT:cedar-mysql-vm.int.cedar.computecanada.ca:3306 someuser@cedar.computecanada.ca
</pre>
</pre>


<!--T:48-->
<!--T:48-->
These commands move your localhost:PORT to cedar.computecanada.ca:PORT and bind it with cedar-pgsql-vm.int.cedar.computecanada.ca:5432. "someuser" in this example is your username on computecanada. By running one of these commands you will be connected to cedar (like any other ssh connection). The only difference between this connection and an ordinary ssh connection is that you can now use another terminal to connect to the database server directly from your desktop. Here are commands for PostgreSQL and MySQL connection respectively:
These commands connect port number PORT on your local host to PostgreSQL or MySQL database servers respectively. The port number you choose (PORT) should not be bigger than 32768 (2^15). In this example, <i>someuser</i> is your account username. The difference between this connection and an ordinary SSH connection is that you can now use another terminal to connect to the database server directly from your desktop. On your desktop, run one of these commands for PostgreSQL or MySQL as appropriate:


<!--T:49-->
<!--T:49-->
<pre>  
<pre>  
psql -h 127.0.0.1 -P PORT -U <your username> -W
psql -h 127.0.0.1 -p PORT -U <your username> -d <your database>
mysql -h 127.0.0.1 -P PORT -u <your username> -p
mysql -h 127.0.0.1 -P PORT -u <your username> --protocol=TCP -p  
</pre>
</pre>


<!--T:50-->
<!--T:50-->
The connection requires a password for both MySQL and PostgreSQL. However, for PostgreSQL the password is your computecanada password and for MySQL the password is stored in your ".my.cnf" located in your home directory on cedar. The connections will remain open as long as your have the ssh connection. In this example "PORT" is an arbitrary number and it should be opened in firewall of cedar head node. So, please before running this command send a request to support@computecanada.ca and we will assign a port number for you.
MySQL requires a password; it is stored in your <i>.my.cnf</i> located in your home directory on Cedar.  
The database connection will remain open as long as the SSH connection remains open.


</translate>
</translate>

Latest revision as of 20:47, 25 March 2024


Parent page: SSH

What is SSH tunnelling?

SSH tunnelling is a method to use a gateway computer to connect two computers that cannot connect directly.

In the context of the Alliance, SSH tunnelling is necessary in certain cases, because compute nodes on Niagara, Béluga and Graham do not have direct access to the Internet, nor can the compute nodes be contacted directly from the Internet.

The following use cases require SSH tunnels:

  • Running commercial software on a compute node that needs to contact a license server over the Internet;
  • Running visualization software on a compute node that needs to be contacted by client software on a user's local computer;
  • Running a Jupyter Notebook on a compute node that needs to be contacted by the Web browser on a user's local computer;
  • Connecting to the Cedar database server from somewhere other than the Cedar head node, e.g., your desktop.

In the first case, the license server is outside of the compute cluster and is rarely under a user's control, whereas in the other cases, the server is on the compute node but the challenge is to connect to it from the outside. We will therefore consider these two situations below.

While not strictly required to use SSH tunnelling, you may wish to be familiar with SSH key pairs.

Contacting a license server from a compute node

What's a port?

A port is a number used to distinguish streams of communication from one another. You can think of it as loosely analogous to a radio frequency or a channel. Many port numbers are reserved, by rule or by convention, for certain types of traffic. See List of TCP and UDP port numbers for more.


Certain commercially licensed programs must connect to a license server machine somewhere on the Internet via a predetermined port. If the compute node where the program is running has no access to the Internet, then a gateway server which does have access must be used to forward communications on that port, from the compute node to the license server. To enable this, one must set up an SSH tunnel. Such an arrangement is also called port forwarding.

In most cases, creating an SSH tunnel in a batch job requires only two or three commands in your job script. You will need the following information:

  • The IP address or the name of the license server (here LICSERVER).
  • The port number of the license service (here LICPORT).

You should obtain this information from whoever maintains the license server. That server must also allow connections from the login nodes; for Niagara, the outgoing IP address will be either 142.1.174.227 or 142.1.174.228.

With this information, one can now set up the SSH tunnel. For Graham, an alternative solution is to request a firewall exception for license server LICSERVER and its specific port LICPORT.

The gateway server on Niagara is nia-gw. On Graham, you need to pick one of the login nodes (gra-login1, 2, ...). Let us call the gateway node GATEWAY. You also need to choose the port number on the compute node to use (here COMPUTEPORT).

The SSH command to issue in the job script is then:

ssh GATEWAY -L COMPUTEPORT:LICSERVER:LICPORT -N -f

In this command, the string following the -L parameter specifies the port forwarding information (the local port, then the destination host and port). The other parameters are:

  • -N tells SSH not to open a shell on the GATEWAY,
  • -f tells SSH to run in the background, allowing the job script to continue on past this SSH command.

A further command to add to the job script should tell the software that the license server is on port COMPUTEPORT on the server localhost. The term localhost is the standard name by which a computer refers to itself. It is to be taken literally and should not be replaced with your computer's name. Exactly how to inform your software to use this port on localhost will depend on the specific application and the type of license server, but often it is simply a matter of setting an environment variable in the job script, for example:

export MLM_LICENSE_FILE=COMPUTEPORT@localhost
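
Because -f sends SSH to the background, it can be reassuring to confirm that the forwarded port accepts connections before the licensed software starts. The following is a minimal sketch, assuming the job script runs under Bash with its /dev/tcp redirection feature available. The function names port_is_open and wait_for_port are hypothetical and are not part of any cluster tooling.

```shell
#!/bin/bash
# Hypothetical helper: succeed if something accepts TCP connections on
# localhost at the given port. Relies on Bash's built-in /dev/tcp
# redirection (an assumption about how the shell was built).
port_is_open() {
  local port=$1
  (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null
}

# Hypothetical helper: poll once per second until the port opens or the
# number of attempts (default 10) runs out.
wait_for_port() {
  local port=$1 tries=${2:-10}
  local i
  for ((i = 0; i < tries; ++i)); do
    port_is_open "$port" && return 0
    sleep 1
  done
  return 1
}
```

In a job script, a line such as wait_for_port COMPUTEPORT || exit 1 (with COMPUTEPORT replaced by the chosen port number) could be placed right after the ssh command. Note that each probe makes SSH open a short-lived connection to the license server.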

Example job script

The following job script sets up an SSH tunnel to contact licenseserver.institution.ca at port 9999.

#!/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks 40
#SBATCH --time 3:00:00

REMOTEHOST=licenseserver.institution.ca
REMOTEPORT=9999
LOCALHOST=localhost
for ((i=0; i<10; ++i)); do
  LOCALPORT=$(shuf -i 1024-65535 -n 1)
  ssh nia-gw -L $LOCALPORT:$REMOTEHOST:$REMOTEPORT -N -f && break
done || { echo "Giving up forwarding license port after $i attempts..."; exit 1; }
export MLM_LICENSE_FILE=$LOCALPORT@$LOCALHOST

module load thesoftware/2.0
mpirun thesoftware .....
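
The retry loop above relies on ssh failing when the chosen local port is already taken. By default, OpenSSH may keep the connection open even if it could not bind the requested port, so one possible refinement is to add the ExitOnForwardFailure option, together with a control socket so that the tunnel can be closed when the job ends. This is a sketch under the assumption that the cluster's ssh client is OpenSSH; the function names and the socket path are illustrative only. The commands are assembled into arrays so that they can be inspected before being run.

```shell
#!/bin/bash
# Sketch only: helper names and the socket path are hypothetical.

# Build the command that opens the tunnel.
#   -o ExitOnForwardFailure=yes : exit with an error if the port cannot be forwarded
#   -M -S SOCKET                : create a control socket for later management
build_open_cmd() {
  local gateway=$1 localport=$2 remotehost=$3 remoteport=$4 socket=$5
  OPEN_CMD=(ssh "$gateway" -o ExitOnForwardFailure=yes -M -S "$socket"
            -L "$localport:$remotehost:$remoteport" -N -f)
}

# Build the command that asks the background ssh process to exit.
build_close_cmd() {
  local gateway=$1 socket=$2
  CLOSE_CMD=(ssh -S "$socket" -O exit "$gateway")
}
```

Inside the retry loop, one would run "${OPEN_CMD[@]}" in place of the plain ssh line, and run "${CLOSE_CMD[@]}" after the mpirun line to shut the tunnel down.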

Connecting to a program running on a compute node

SSH tunnelling can also be used in our context to allow a user's computer to connect to a compute node on a cluster through an encrypted tunnel that is routed via the login node of this cluster. This technique allows graphical output of applications like a Jupyter Notebook or visualization software to be displayed transparently on the user's local workstation even while they are running on a cluster's compute node. When connecting to a database server where the connection is only possible through the head node, SSH tunnelling can be used to bind an external port to the database server.

There is Network Address Translation (NAT) on both Graham and Cedar, allowing users to access the Internet from the compute nodes. On Graham, however, access is blocked by default at the firewall. Contact technical support if you need to have a specific port opened, supplying the IP address or range of addresses which should be allowed to use that port.

From Linux or MacOS X

On a Linux or MacOS X system, we recommend using the sshuttle Python package.

On your computer, open a new terminal window and run the following sshuttle command to create the tunnel.

[name@my_computer ~]$ sshuttle --dns -Nr userid@machine_name

Then, copy and paste the application's URL into your browser. If your application is a Jupyter Notebook, for example, you are given a URL with a token:

 http://cdr544.int.cedar.computecanada.ca:8888/?token=7ed7059fad64446f837567e32af8d20efa72e72476eb72ca

From Windows

An SSH tunnel can be created from Windows using MobaXTerm as follows.

Open two sessions in MobaXTerm.

  • Session 1 should be a connection to a cluster. Start your job there following the instructions for your application, such as Jupyter Notebook. You should be given a URL that includes a host name and a port, such as cdr544.int.cedar.computecanada.ca:8888 for example.
  • Session 2 should be a local terminal in which we will set up the SSH tunnel. Run the following command, replacing this example host name with the one from the URL you received in Session 1.
[name@my_computer ~]$ ssh -L 8888:cdr544.int.cedar.computecanada.ca:8888 someuser@cedar.computecanada.ca

This command forwards connections made to local port 8888 on your computer to port 8888 on cdr544.int.cedar.computecanada.ca (the remote port). The local port number (the first one) does not need to match the remote port number (the second one), but using the same number is conventional and reduces confusion.

Modify the URL you were given in Session 1 by replacing the host name with localhost. Again using an example from Jupyter Notebook, this would be the URL to paste into a browser:

 http://localhost:8888/?token=7ed7059fad64446f837567e32af8d20efa72e72476eb72ca
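
The two manual steps above (building the ssh -L command from the URL, then rewriting the URL to use localhost) can be sketched in Bash. The helper names split_url, to_local_url and tunnel_command are hypothetical, and the sketch assumes a URL of the form scheme://host:port/..., as in the Jupyter example.

```shell
#!/bin/bash
# Hypothetical helpers for turning the URL printed by the application into
# the tunnel command and the localhost URL described above.

# Print "host port" extracted from a URL of the form scheme://host:port/...
split_url() {
  local url=$1 hostport
  hostport=${url#*://}      # drop the scheme
  hostport=${hostport%%/*}  # drop the path and query string
  echo "${hostport%%:*} ${hostport##*:}"
}

# Print the same URL with the host name replaced by localhost.
to_local_url() {
  local url=$1 host _
  read -r host _ < <(split_url "$url")
  echo "${url/$host/localhost}"
}

# Print the ssh command that forwards the same port number locally.
# The second argument is the login in the form user@cluster.
tunnel_command() {
  local url=$1 login=$2 host port
  read -r host port < <(split_url "$url")
  echo "ssh -L $port:$host:$port $login"
}
```

For instance, tunnel_command "$URL" someuser@cedar.computecanada.ca prints the command to run in Session 2, and to_local_url "$URL" prints the address to paste into the browser, where URL holds the address received in Session 1.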

Example for connecting to a database server on Cedar from your desktop

An SSH tunnel can be created from your desktop to the PostgreSQL or MySQL database server using the first or second of the following commands, respectively:

 
ssh -L PORT:cedar-pgsql-vm.int.cedar.computecanada.ca:5432 someuser@cedar.computecanada.ca
ssh -L PORT:cedar-mysql-vm.int.cedar.computecanada.ca:3306 someuser@cedar.computecanada.ca

These commands connect port number PORT on your local host to the PostgreSQL or MySQL database server, respectively. The port number you choose (PORT) should not be greater than 32768 (2^15). In this example, someuser is your account username. The difference between this connection and an ordinary SSH connection is that you can now use another terminal to connect to the database server directly from your desktop. On your desktop, run one of these commands for PostgreSQL or MySQL as appropriate:

 
psql -h 127.0.0.1 -p PORT -U <your username> -d <your database>
mysql -h 127.0.0.1 -P PORT -u <your username> --protocol=TCP -p 

MySQL requires a password; it is stored in your .my.cnf located in your home directory on Cedar. The database connection will remain open as long as the SSH connection remains open.
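
The limit on PORT can be checked before the tunnel is opened. The following Bash sketch uses hypothetical helper names; the upper bound of 32768 comes from the text above, while the lower bound of 1024 is an added assumption (lower ports normally require administrator rights to bind).

```shell
#!/bin/bash
# Hypothetical helper: accept only numeric ports between 1024 and 32768.
# The 1024 lower bound is an assumption, not a rule stated on this page.
valid_local_port() {
  local port=$1
  [[ $port =~ ^[0-9]+$ ]] && (( 10#$port >= 1024 && 10#$port <= 32768 ))
}

# Print the tunnel command for one of the Cedar database servers.
# kind is "pgsql" or "mysql"; the remote ports are those given above.
db_tunnel_command() {
  local kind=$1 port=$2 user=$3 remoteport
  case $kind in
    pgsql) remoteport=5432 ;;
    mysql) remoteport=3306 ;;
    *) echo "unknown database kind: $kind" >&2; return 1 ;;
  esac
  valid_local_port "$port" || { echo "invalid port: $port" >&2; return 1; }
  echo "ssh -L $port:cedar-$kind-vm.int.cedar.computecanada.ca:$remoteport $user@cedar.computecanada.ca"
}
```

For example, db_tunnel_command pgsql 5433 someuser prints the PostgreSQL tunnel command with PORT set to 5433, which can then be matched with psql -h 127.0.0.1 -p 5433 in the second terminal.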