https://docs.alliancecan.ca/mediawiki/api.php?action=feedcontributions&user=Willis2&feedformat=atomAlliance Doc - User contributions [en]2024-03-28T21:31:44ZUser contributionsMediaWiki 1.39.6https://docs.alliancecan.ca/mediawiki/index.php?title=Niagara&diff=113397Niagara2022-03-29T19:02:03Z<p>Willis2: </p>
<hr />
<div><languages /><br />
<br />
<translate><br />
<!--T:1--><br />
{| class="wikitable"<br />
|-<br />
| Availability: In production since April 2018<br />
|-<br />
| Login node: '''niagara.computecanada.ca'''<br />
|-<br />
| Globus endpoint: '''computecanada#niagara'''<br />
|-<br />
| Data mover nodes (rsync, scp, ...): '''nia-dm1, nia-dm2''', see [[Niagara_Quickstart#Moving_data|Moving data]]<br />
|-<br />
| System Status Page: '''https://docs.scinet.utoronto.ca'''<br />
|}<br />
<br />
<!--T:2--><br />
Niagara is a homogeneous cluster, owned by the [https://www.utoronto.ca/ University of Toronto] and operated by [https://www.scinethpc.ca/ SciNet], intended to enable large parallel jobs of 1040 cores and more. It was designed to optimize throughput of a range of<br />
scientific codes running at scale, energy efficiency, and network and storage performance and capacity. <br />
<br />
<!--T:4--><br />
The [[Niagara Quickstart]] has instructions specific to Niagara. The user experience on Niagara is similar to that on Graham<br />
and Cedar, but differs in some details.<br />
<br />
<!--T:29--><br />
Preliminary documentation about the GPU expansion to Niagara called "[https://docs.scinet.utoronto.ca/index.php/Mist Mist]" can be found on [https://docs.scinet.utoronto.ca/index.php/Mist the SciNet documentation site].<br />
<br />
<!--T:5--><br />
Niagara is an allocatable resource in the 2018 [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resource Allocation Competition] (RAC 2018), which came into effect on April 4, 2018.<br />
<br />
<!--T:6--><br />
[https://youtu.be/EpIcl-iUCV8 Niagara installation update at the SciNet User Group Meeting on February 14th, 2018]<br />
<br />
<!--T:7--><br />
[https://www.youtube.com/watch?v=RgSvGGzTeoc Niagara installation time-lapse video]<br />
<br />
<br />
=Niagara hardware specifications= <!--T:3--><br />
<br />
<!--T:8--><br />
* 2024 nodes, each with 40 Intel "Skylake" cores at 2.4 GHz or 40 Intel "CascadeLake" cores at 2.5 GHz, for a total of 80,960 cores.<br />
* 202 GB (188 GiB) of RAM per node.<br />
* EDR Infiniband network in a so-called 'Dragonfly+' topology.<br />
* 12.5PB of scratch, 3.5PB of project space (parallel filesystem: IBM Spectrum Scale, formerly known as GPFS).<br />
* 256 TB burst buffer (Excelero + IBM Spectrum Scale).<br />
* No local disks.<br />
* No GPUs.<br />
* Theoretical peak performance ("Rpeak") of 6.25 PF.<br />
* Measured delivered performance ("Rmax") of 3.6 PF.<br />
* 920 kW power consumption.<br />
<br />
=Attached storage systems= <!--T:9--><br />
{| class="wikitable sortable"<br />
|-<br />
| '''Home''' <br>200TB<br>Parallel high-performance filesystem (IBM Spectrum Scale) ||<br />
* Backed up to tape<br />
* Persistent<br />
|-<br />
| '''Scratch'''<br>12.5PB (~100GB/s Write, ~120GB/s Read)<br>Parallel high-performance filesystem (IBM Spectrum Scale)||<br />
* Inactive data is purged.<br />
|-<br />
| '''Burst buffer'''<br>232TB (~90GB/s Write , ~154 GB/s Read)<br>Parallel extra high-performance filesystem (Excelero+IBM Spectrum Scale)||<br />
* Inactive data is purged.<br />
|-<br />
| '''Project'''<br>3.5PB (~100GB/s Write, ~120GB/s Read)<br>Parallel high-performance filesystem (IBM Spectrum Scale)||<br />
* Backed up to tape<br />
* Allocated through [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions RAC]<br />
* Persistent<br />
|-<br />
| '''Archive'''<br />20PB<br />High Performance Storage System (IBM HPSS)||<br />
* tape-backed HSM<br />
* Allocated through [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions RAC]<br />
* Persistent<br />
|}<br />
<br />
=High-performance interconnect= <!--T:10--><br />
<br />
<!--T:11--><br />
The Niagara cluster has an EDR Infiniband network in a so-called<br />
'Dragonfly+' topology, with four wings. Each wing of at most 432 nodes (i.e., 17,280 cores) has<br />
1-to-1 connections. Network traffic between wings is done through<br />
adaptive routing, which alleviates network congestion and yields an effective blocking of 2:1 between nodes of different wings.<br />
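<br />
The per-wing core count follows from the 40 cores in each node. A small worked check in shell arithmetic (the variable names are illustrative only):<br />

```shell
# Cores in one fully populated wing: nodes per wing times cores per node.
nodes_per_wing=432
cores_per_node=40
cores_per_wing=$(( nodes_per_wing * cores_per_node ))
echo "cores per wing: $cores_per_wing"
```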
<br />
=Node characteristics= <!--T:12--><br />
<br />
<!--T:13--><br />
* CPU: 2 sockets with 20 Intel Skylake cores (2.4GHz, AVX512), for a total of 40 cores per node<br />
* Computational performance: 3.07 TFlops theoretical peak. <br />
* Network connection: 100Gb/s EDR Dragonfly+<br />
* Memory: 202 GB (188 GiB) of RAM, i.e., a bit over 4GiB per core.<br />
* Local disk: none. GPUs/Accelerators: none.<br />
* Operating system: Linux CentOS 7<br />
<br />
=Scheduling= <!--T:14--><br />
<br />
<!--T:15--><br />
The Niagara cluster uses the [[Running jobs|Slurm]] scheduler to run jobs. The basic scheduling commands are therefore similar to those for Cedar and Graham, with a few differences:<br />
<br />
<!--T:16--><br />
* Scheduling is by node only. This means jobs always need to use multiples of 40 cores per job.<br />
* Asking for specific amounts of memory is not necessary and is discouraged; all nodes have the same amount of memory (202GB/188GiB minus some operating system overhead).<br />
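<br />
A minimal sketch of a job script that follows the by-node rule. The <code>#SBATCH</code> options shown are standard Slurm options; the job name, the time limit and <code>my_program</code> are hypothetical placeholders, and no memory request is made. See the [[Niagara Quickstart]] for the authoritative instructions.<br />

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40
#SBATCH --time=01:00:00
#SBATCH --job-name=whole_node_example

# Jobs get whole nodes, so the core count is always nodes x 40.
# The default of 2 only lets this sketch run outside of Slurm.
nodes=${SLURM_JOB_NUM_NODES:-2}
cores_per_node=40
total_cores=$(( nodes * cores_per_node ))
echo "Using $nodes nodes, $total_cores cores"

# Launch the parallel program (hypothetical executable, left commented out):
# srun ./my_program
```

Such a script would typically be submitted with <code>sbatch</code>.<br />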
<br />
<!--T:17--><br />
Details, such as how to request burst buffer usage in jobs, are still being worked out.<br />
<br />
=Software= <!--T:18--><br />
<br />
<!--T:19--><br />
* Module-based software stack.<br />
* Both the standard Compute Canada software stack as well as cluster-specific software tuned for Niagara are available.<br />
* In contrast with Cedar and Graham, no modules are loaded by default to prevent accidental conflicts in versions. To load the software stack that a user would see on Graham and Cedar, one can load the "CCEnv" module (see [[Niagara Quickstart]]).<br />
<br />
= Access to Niagara = <!--T:20--><br />
Access to Niagara is not enabled automatically for everyone with a Compute Canada account, but anyone with an active Compute Canada account can get their access enabled.<br />
<br />
If you have an active Compute Canada account but you do not have access to Niagara yet (e.g. because you are a new user and belong to a group whose primary PI does not have an allocation as granted in the annual [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions Compute Canada RAC]), go to the [https://ccdb.computecanada.ca/services/opt_in opt-in page on the CCDB site]. After clicking the "Join" button on that page, it usually takes only one or two business days for access to be granted. <br />
<br />
<!--T:27--><br />
If at any time you require assistance, please do not hesitate to [mailto:niagara@computecanada.ca contact us].<br />
<br />
== Getting started == <!--T:25--><br />
<br />
<!--T:26--><br />
Please read the [[Niagara Quickstart]] carefully. <br />
<br />
<!--T:28--><br />
[[Category:Pages with video links]]<br />
</translate></div>Willis2https://docs.alliancecan.ca/mediawiki/index.php?title=Transferring_data&diff=111167Transferring data2022-02-02T18:40:20Z<p>Willis2: Marked this version for translation</p>
<hr />
<div><languages /><br />
<br />
<translate><br />
<!--T:30--><br />
Please use ''data transfer nodes'', also called ''data mover nodes'', instead of login nodes whenever you are transferring data to and from Compute Canada clusters. If a data transfer node is available, its URL will be given near the top of the main page for each cluster: [[Béluga/en|Béluga]], [[Cedar]], [[Graham]], [[Niagara]].<br />
<br />
<!--T:31--><br />
[[Globus]] automatically uses data transfer nodes.<br />
<br />
==To and from your personal computer== <!--T:1--><br />
You will need software that supports secure transfer of files between your computer and the Compute Canada machines. The commands <code>scp</code> and <code>sftp</code> can be used in a command-line environment on '''Linux''' or '''Mac''' OS X computers. On '''Microsoft Windows''' platforms, [https://docs.computecanada.ca/wiki/Connecting_with_MobaXTerm/en MobaXterm] offers both a graphical file transfer function and a [[Linux introduction|command-line]] interface via [[SSH]], while [http://winscp.net/eng/index.php WinSCP] is another free program that supports file transfer. Setting up a connection to a Compute Canada machine using SSH keys with WinSCP can be done by following the steps in this [https://www.exavault.com/blog/import-ssh-keys-winscp link]. [https://docs.computecanada.ca/wiki/Connecting_with_PuTTY/en PuTTY] comes with <code>pscp</code> and <code>psftp</code> which are essentially the same as the Linux and Mac command line programs.<br />
<br />
<!--T:2--><br />
If it takes more than one minute to move your files to or from Compute Canada servers, we recommend you install and try [[Globus#Personal_Computers|Globus Personal Connect]]. Once set up, [[Globus]] transfers run in the background without further attention from you.<br />
<br />
==Between Compute Canada resources== <!--T:3--><br />
[[Globus]] is the preferred tool for transferring data between Compute Canada systems, and if it can be used, it should.<br />
<br />
<!--T:4--><br />
However, other common tools can also be found for transferring data both inside and outside of Compute Canada, including<br />
* [[Transferring_data#SFTP | SFTP]]<br />
* [[Transferring_data#SCP | SCP]] or Secure Copy<br />
* [[Transferring_data#Rsync | rsync]]<br />
<br />
<!--T:35--><br />
Note: If you want to transfer files between other Compute Canada clusters and Niagara, use the SSH agent forwarding flag, <code>-A</code>, when logging into the other cluster. For example, to copy files to Niagara from Cedar, use:<br />
<br />
<!--T:36--><br />
<pre><br />
ssh -A USERNAME@cedar.computecanada.ca<br />
</pre><br />
then perform the copy:<br />
<pre><br />
[USERNAME@cedar5 ~]$ scp file USERNAME@niagara.computecanada.ca:/scratch/g/group/USERNAME/<br />
</pre><br />
<br />
==From the World Wide Web== <!--T:5--><br />
The standard tool for downloading data from websites is [https://en.wikipedia.org/wiki/Wget wget]. Also available is [https://curl.haxx.se/ curl]. The two are compared in this [https://unix.stackexchange.com/questions/47434/what-is-the-difference-between-curl-and-wget StackExchange article].<br />
<br />
==Synchronizing files== <!--T:6--><br />
To synchronize or "sync" files (or directories) stored in two different locations means to ensure that the two copies are the same. Here are several different ways to do this.<br />
<br />
===Globus transfer=== <!--T:7--><br />
We find Globus usually gives the best performance and reliability.<br />
<br />
<!--T:8--><br />
Normally when a Globus transfer is initiated it will overwrite the files on the destination with the files from the source, which means all of the files on the source will be transferred. If some of the files may already exist on the destination and need not be transferred if they match, you should go to the bottom of the transfer window as shown in the screenshot and choose to "sync" instead.<br />
<br />
<!--T:9--><br />
[[File:Globus_Transfer_Sync_Options.png|280px|thumb|left]]<br />
<br />
<!--T:10--><br />
You may choose how Globus decides which files to transfer:<br />
{| class="wikitable"<br />
|-<br />
| Their checksums are different || This is the slowest option but most accurate. This will catch changes or errors that result in the same size of file, but with different contents.<br />
|-<br />
| File doesn't exist on destination || This will only transfer files that have been created since the last sync. Useful if you are incrementally creating files.<br />
|-<br />
| File size is different || A quick test. If the file size has changed then its contents must have changed, and it will be re-transferred.<br />
|-<br />
| Modification time is newer || This will check the file's recorded modification time and only transfer the file if it is newer on the source than the destination. If you want to depend on this it is important to check the "preserve source file modification times" option when initiating a Globus transfer.<br />
|}<br />
<br />
<!--T:11--><br />
For more information about Globus please see [[Globus]].<br />
<br />
<br clear="all"/><br />
===Rsync=== <!--T:12--><br />
[https://en.wikipedia.org/wiki/Rsync Rsync] is a popular tool for ensuring that two separate datasets are the same but can be quite slow if there are a lot of files or there is a lot of latency between the two sites, i.e. they are geographically apart or on different networks. Running <code>rsync</code> will check the modification time and size of each file, and will only transfer the file if one or the other does not match. If you expect modification times not to match on the two systems you can use the <code>-c</code> option, which will compute checksums at the source and destination, and transfer only if the checksums do not match. <br />
<br />
<!--T:26--><br />
When transferring files into the <code>/project</code> file systems, do not use <code>-p</code> and <code>-g</code> flags (or <code>-a</code>, which implies those two). The quotas in <code>/project</code> are enforced based on group ownership, and thus preserving the group ownership will lead to the [[Frequently_Asked_Questions#Disk_quota_exceeded_error_on_.2Fproject_filesystems | Disk quota exceeded]] error message.<br />
<br />
<!--T:32--><br />
If you are using <code>-a</code> when transferring files into the <code>/project</code> file systems, you can add <code>--no-g --no-p</code> to your options, like so<br />
{{Command|rsync -avz --no-g --no-p LOCALNAME someuser@graham.computecanada.ca:projects/def-professor/someuser/}}<br />
or avoid using <code>-a</code> altogether<br />
{{Command|rsync -rltv LOCALNAME someuser@graham.computecanada.ca:projects/def-professor/someuser/}}<br />
where LOCALNAME can be a folder or file. For large transfers, consider adding <code>--partial</code> so that interrupted transfers may be restarted, and/or <code>--progress</code> to see a summary of the transfer progress.<br />
<br />
===Using checksums to check if files match=== <!--T:13--><br />
If Globus is unavailable between the two systems being synchronized and Rsync is taking too long, then you can use a [https://en.wikipedia.org/wiki/Checksum checksum] utility on both systems to determine if the files match. In this example we use <code>sha1sum</code>.<br />
<br />
<!--T:14--><br />
{{Command<br />
|find /home/username/ -type f -print0 {{!}} xargs -0 sha1sum {{!}} tee checksum-result.log<br />
}}<br />
<br />
<!--T:15--><br />
This command will create a new file called checksum-result.log in the current directory; the file will contain all of the checksums for the files in /home/username/. It will also print out all of the checksums to the screen as it goes. If you have a lot of files or very large files you may want to run this command in the background, in a [https://en.wikipedia.org/wiki/GNU_Screen screen] or [https://en.wikipedia.org/wiki/Tmux tmux] session; anything that allows it to continue if your [[SSH]] connection times out.<br />
<br />
<!--T:16--><br />
After you run it on both systems you can use the <code>diff</code> utility to find files that don't match.<br />
<br />
<!--T:17--><br />
{{Command<br />
|diff checksum-result-silo.log checksum-dtn.log<br />
|result=69c69<br />
< 017f14f6a1a194a5f791836d93d14beead0a5115 /home/username/file-0025048576-0000008<br />
---<br />
> 8836913c2cc2272c017d0455f70cf0d698daadb3 /home/username/file-0025048576-0000008<br />
}}<br />
<br />
<!--T:18--><br />
It is possible that the <code>find</code> command will crawl through the directories in a different order resulting in a lot of false differences so you may need to run <code>sort</code> on both files before running diff such as:<br />
<br />
<!--T:19--><br />
{{Commands<br />
|sort -k2 checksum-result-silo.log -o checksum-result-silo.log<br />
|sort -k2 checksum-dtn.log -o checksum-dtn.log<br />
}}<br />
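<br />
The whole procedure (checksum, sort, diff) can be tried locally before running it on two clusters. The sketch below is only an illustration under stated assumptions: it creates two small directory trees under a temporary directory as stand-ins for the two systems, and it uses relative paths so that identical files produce identical log lines. All file and directory names are hypothetical.<br />

```shell
# Two throwaway directory trees stand in for the two systems.
work=$(mktemp -d)
mkdir -p "$work/site1/data" "$work/site2/data"
printf 'same\n' > "$work/site1/data/a.txt"
printf 'same\n' > "$work/site2/data/a.txt"
printf 'old\n'  > "$work/site1/data/b.txt"
printf 'new\n'  > "$work/site2/data/b.txt"

# Checksum every file below a directory using relative paths,
# then sort on the path (field 2) so both logs have the same order.
checksum_tree() {
    (cd "$1" && find . -type f -print0 | xargs -0 sha1sum | sort -k2)
}

checksum_tree "$work/site1" > "$work/site1.log"
checksum_tree "$work/site2" > "$work/site2.log"

# Keep the path from each "<" line of the diff output.
# (This simple awk filter assumes paths without spaces.)
differing=$(diff "$work/site1.log" "$work/site2.log" | awk '$1 == "<" { print $3 }')

if [ -z "$differing" ]; then
    echo "all files match"
else
    echo "files that differ:"
    echo "$differing"
fi

rm -rf "$work"
```

Only <code>b.txt</code> differs between the two trees here, so it is the only path reported.<br />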
<br />
==SFTP== <!--T:21--><br />
[https://en.wikipedia.org/wiki/SSH_File_Transfer_Protocol SFTP] (Secure File Transfer Protocol) uses the SSH protocol to transfer files between machines, which encrypts the data being transferred.<br />
<br />
<!--T:22--><br />
For example you can connect to a remote machine at <code>ADDRESS</code> as user <code>USERNAME</code> with SFTP to transfer files like so:<br />
<br />
<!--T:23--><br />
<source lang="console"><br />
[name@server]$ sftp USERNAME@ADDRESS<br />
The authenticity of host 'ADDRESS (###.###.###.##)' can't be established.<br />
RSA key fingerprint is ##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##.<br />
Are you sure you want to continue connecting (yes/no)? yes<br />
Warning: Permanently added 'ADDRESS,###.###.###.##' (RSA) to the list of known hosts.<br />
USERNAME@ADDRESS's password:<br />
Connected to ADDRESS.<br />
sftp><br />
</source><br />
or using an [[SSH Keys|SSH Key]] for authentication using the <code>-i</code> option<br />
<source lang="console"><br />
[name@server]$ sftp -i /home/name/.ssh/id_rsa USERNAME@ADDRESS<br />
Connected to ADDRESS.<br />
sftp><br />
</source><br />
<br />
<!--T:24--><br />
which returns the <code>sftp></code> prompt where commands to transfer files can be issued. To get a list of commands available to use at the sftp prompt enter the <code>help</code> command.<br />
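<br />
For non-interactive use, the same commands can be placed in a batch file and passed to <code>sftp -b</code>. The following is a minimal sketch rather than a prescribed workflow: the file names (<code>results.tar.gz</code>, <code>input.dat</code>) and the remote directory <code>work</code> are hypothetical, and batch mode assumes key-based authentication is already set up, since no password prompt is possible.<br />

```shell
# Write a batch file of sftp commands (file and directory names are hypothetical).
batch=$(mktemp)
cat > "$batch" <<'EOF'
cd work
put results.tar.gz
get input.dat
bye
EOF

# Show what would be executed. The transfer itself is left commented out
# because it needs a real USERNAME@ADDRESS and an SSH key.
cat "$batch"
# sftp -b "$batch" -i /home/name/.ssh/id_rsa USERNAME@ADDRESS
```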
<br />
<!--T:25--><br />
There are also a number of graphical programs available for Windows, Linux and Mac OS, such as [https://winscp.net/eng/index.php WinSCP] and [http://mobaxterm.mobatek.net/ MobaXterm] (Windows), [https://filezilla-project.org FileZilla] (Windows, Mac, and Linux), and [https://cyberduck.io/?l=en Cyberduck] (Mac and Windows).<br />
[[Category:Connecting]]<br />
<br />
==SCP== <!--T:27--> <br />
<br />
<!--T:28--><br />
SCP stands for [https://en.wikipedia.org/wiki/Secure_copy "Secure Copy"]. Like SFTP it uses the SSH protocol to encrypt data being transferred. It does not support synchronization like [[Globus]] or [[Transferring_data#Rsync|rsync]]. Some examples of the most common use of SCP include <br />
{{Command<br />
|scp foo.txt username@beluga.computecanada.ca:work/<br />
}}<br />
which will copy the file <tt>foo.txt</tt> from the current directory on my local computer to the directory <tt>$HOME/work</tt> on the cluster [[Béluga/en|Béluga]]. To copy a file, <tt>output.dat</tt> from my project space on the cluster [[Cedar]] to my local computer I can use a command like<br />
{{Command<br />
|scp username@cedar.computecanada.ca:projects/def-jdoe/username/results/output.dat .<br />
}}<br />
Many other examples of the use of SCP are shown [http://www.hypexr.org/linux_scp_help.php here]. Note that you always execute this <tt>scp</tt> command on your local computer, not the remote cluster - the SCP connection, regardless of whether you are transferring data to or from the remote cluster, should always be initiated from your local computer. <br />
<br />
<!--T:29--><br />
SCP supports an option, <code>-r</code>, to recursively transfer a set of directories and files. We '''recommend against using <code>scp -r</code>''' to transfer data into <code>/project</code> because the setgid bit is turned off in the created directories, which may lead to <code>Disk quota exceeded</code> or similar errors if files are later created there (see [[Frequently_Asked_Questions#Disk_quota_exceeded_error_on_.2Fproject_filesystems | Disk quota exceeded error on /project filesystems]]).<br />
<br />
<!--T:33--><br />
'''<big>***Note***</big>''' if you chose a custom SSH key name, <i>i.e.</i> something other than the default names: <code>id_dsa</code>, <code>id_ecdsa</code>, <code>id_ed25519</code> and <code>id_rsa</code>, you will need to use the <code>-i</code> option of scp and specify the path to your private key before the file paths via:<br />
<br />
<!--T:34--><br />
{{Command<br />
|scp -i /path/to/key foo.txt username@beluga.computecanada.ca:work/<br />
}}<br />
<br />
<!--T:20--><br />
[[Category:Connecting]]<br />
</translate></div>Willis2https://docs.alliancecan.ca/mediawiki/index.php?title=Transferring_data&diff=111166Transferring data2022-02-02T18:40:00Z<p>Willis2: </p>
<hr />
<div><languages /><br />
<br />
<translate><br />
<!--T:30--><br />
Please use ''data transfer nodes'', also called ''data mover nodes'', instead of login nodes whenever you are transferring data to and from Compute Canada clusters. If a data transfer node is available, its URL will be given near the top of the main page for each cluster: [[Béluga/en|Béluga]], [[Cedar]], [[Graham]], [[Niagara]].<br />
<br />
<!--T:31--><br />
[[Globus]] automatically uses data transfer nodes.<br />
<br />
==To and from your personal computer== <!--T:1--><br />
You will need software that supports secure transfer of files between your computer and the Compute Canada machines. The commands <code>scp</code> and <code>sftp</code> can be used in a command-line environment on '''Linux''' or '''Mac''' OS X computers. On '''Microsoft Windows''' platforms, [https://docs.computecanada.ca/wiki/Connecting_with_MobaXTerm/en MobaXterm] offers both a graphical file transfer function and a [[Linux introduction|command-line]] interface via [[SSH]], while [http://winscp.net/eng/index.php WinSCP] is another free program that supports file transfer. Setting up a connection to a Compute Canada machine using SSH keys with WinSCP can be done by following the steps in this [https://www.exavault.com/blog/import-ssh-keys-winscp link]. [https://docs.computecanada.ca/wiki/Connecting_with_PuTTY/en PuTTY] comes with <code>pscp</code> and <code>psftp</code> which are essentially the same as the Linux and Mac command line programs.<br />
<br />
<!--T:2--><br />
If it takes more than one minute to move your files to or from Compute Canada servers, we recommend you install and try [[Globus#Personal_Computers|Globus Personal Connect]]. Once set up, [[Globus]] transfers run in the background without further attention from you.<br />
<br />
==Between Compute Canada resources== <!--T:3--><br />
[[Globus]] is the preferred tool for transferring data between Compute Canada systems, and if it can be used, it should.<br />
<br />
<!--T:4--><br />
However, other common tools can also be found for transferring data both inside and outside of Compute Canada, including<br />
* [[Transferring_data#SFTP | SFTP]]<br />
* [[Transferring_data#SCP | SCP]] or Secure Copy<br />
* [[Transferring_data#Rsync | rsync]]<br />
<br />
Note: If you want to transfer files between other Compute Canada clusters and Niagara, use the SSH agent forwarding flag, <code>-A</code>, when logging into the other cluster. For example, to copy files to Niagara from Cedar, use:<br />
<br />
<pre><br />
ssh -A USERNAME@cedar.computecanada.ca<br />
</pre><br />
then perform the copy:<br />
<pre><br />
[USERNAME@cedar5 ~]$ scp file USERNAME@niagara.computecanada.ca:/scratch/g/group/USERNAME/<br />
</pre><br />
<br />
==From the World Wide Web== <!--T:5--><br />
The standard tool for downloading data from websites is [https://en.wikipedia.org/wiki/Wget wget]. Also available is [https://curl.haxx.se/ curl]. The two are compared in this [https://unix.stackexchange.com/questions/47434/what-is-the-difference-between-curl-and-wget StackExchange article].<br />
<br />
==Synchronizing files== <!--T:6--><br />
To synchronize or "sync" files (or directories) stored in two different locations means to ensure that the two copies are the same. Here are several different ways to do this.<br />
<br />
===Globus transfer=== <!--T:7--><br />
We find Globus usually gives the best performance and reliability.<br />
<br />
<!--T:8--><br />
Normally when a Globus transfer is initiated it will overwrite the files on the destination with the files from the source, which means all of the files on the source will be transferred. If some of the files may already exist on the destination and need not be transferred if they match, you should go to the bottom of the transfer window as shown in the screenshot and choose to "sync" instead.<br />
<br />
<!--T:9--><br />
[[File:Globus_Transfer_Sync_Options.png|280px|thumb|left]]<br />
<br />
<!--T:10--><br />
You may choose how Globus decides which files to transfer:<br />
{| class="wikitable"<br />
|-<br />
| Their checksums are different || This is the slowest option but most accurate. This will catch changes or errors that result in the same size of file, but with different contents.<br />
|-<br />
| File doesn't exist on destination || This will only transfer files that have been created since the last sync. Useful if you are incrementally creating files.<br />
|-<br />
| File size is different || A quick test. If the file size has changed then its contents must have changed, and it will be re-transferred.<br />
|-<br />
| Modification time is newer || This will check the file's recorded modification time and only transfer the file if it is newer on the source than the destination. If you want to depend on this it is important to check the "preserve source file modification times" option when initiating a Globus transfer.<br />
|}<br />
<br />
<!--T:11--><br />
For more information about Globus please see [[Globus]].<br />
<br />
<br clear="all"/><br />
===Rsync=== <!--T:12--><br />
[https://en.wikipedia.org/wiki/Rsync Rsync] is a popular tool for ensuring that two separate datasets are the same but can be quite slow if there are a lot of files or there is a lot of latency between the two sites, i.e. they are geographically apart or on different networks. Running <code>rsync</code> will check the modification time and size of each file, and will only transfer the file if one or the other does not match. If you expect modification times not to match on the two systems you can use the <code>-c</code> option, which will compute checksums at the source and destination, and transfer only if the checksums do not match. <br />
<br />
<!--T:26--><br />
When transferring files into the <code>/project</code> file systems, do not use <code>-p</code> and <code>-g</code> flags (or <code>-a</code>, which implies those two). The quotas in <code>/project</code> are enforced based on group ownership, and thus preserving the group ownership will lead to the [[Frequently_Asked_Questions#Disk_quota_exceeded_error_on_.2Fproject_filesystems | Disk quota exceeded]] error message.<br />
<br />
<!--T:32--><br />
If you are using <code>-a</code> when transferring files into the <code>/project</code> file systems, you can add <code>--no-g --no-p</code> to your options, like so<br />
{{Command|rsync -avz --no-g --no-p LOCALNAME someuser@graham.computecanada.ca:projects/def-professor/someuser/}}<br />
or avoid using <code>-a</code> altogether<br />
{{Command|rsync -rltv LOCALNAME someuser@graham.computecanada.ca:projects/def-professor/someuser/}}<br />
where LOCALNAME can be a folder or file. For large transfers, consider adding <code>--partial</code> so that interrupted transfers may be restarted, and/or <code>--progress</code> to see a summary of the transfer progress.<br />
<br />
===Using checksums to check if files match=== <!--T:13--><br />
If Globus is unavailable between the two systems being synchronized and Rsync is taking too long, then you can use a [https://en.wikipedia.org/wiki/Checksum checksum] utility on both systems to determine if the files match. In this example we use <code>sha1sum</code>.<br />
<br />
<!--T:14--><br />
{{Command<br />
|find /home/username/ -type f -print0 {{!}} xargs -0 sha1sum {{!}} tee checksum-result.log<br />
}}<br />
<br />
<!--T:15--><br />
This command will create a new file called checksum-result.log in the current directory; the file will contain all of the checksums for the files in /home/username/. It will also print out all of the checksums to the screen as it goes. If you have a lot of files or very large files you may want to run this command in the background, in a [https://en.wikipedia.org/wiki/GNU_Screen screen] or [https://en.wikipedia.org/wiki/Tmux tmux] session; anything that allows it to continue if your [[SSH]] connection times out.<br />
<br />
<!--T:16--><br />
After you run it on both systems you can use the <code>diff</code> utility to find files that don't match.<br />
<br />
<!--T:17--><br />
{{Command<br />
|diff checksum-result-silo.log checksum-dtn.log<br />
|result=69c69<br />
< 017f14f6a1a194a5f791836d93d14beead0a5115 /home/username/file-0025048576-0000008<br />
---<br />
> 8836913c2cc2272c017d0455f70cf0d698daadb3 /home/username/file-0025048576-0000008<br />
}}<br />
<br />
<!--T:18--><br />
It is possible that the <code>find</code> command will crawl through the directories in a different order resulting in a lot of false differences so you may need to run <code>sort</code> on both files before running diff such as:<br />
<br />
<!--T:19--><br />
{{Commands<br />
|sort -k2 checksum-result-silo.log -o checksum-result-silo.log<br />
|sort -k2 checksum-dtn.log -o checksum-dtn.log<br />
}}<br />
<br />
==SFTP== <!--T:21--><br />
[https://en.wikipedia.org/wiki/SSH_File_Transfer_Protocol SFTP] (Secure File Transfer Protocol) uses the SSH protocol to transfer files between machines, which encrypts the data being transferred.<br />
<br />
<!--T:22--><br />
For example you can connect to a remote machine at <code>ADDRESS</code> as user <code>USERNAME</code> with SFTP to transfer files like so:<br />
<br />
<!--T:23--><br />
<source lang="console"><br />
[name@server]$ sftp USERNAME@ADDRESS<br />
The authenticity of host 'ADDRESS (###.###.###.##)' can't be established.<br />
RSA key fingerprint is ##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##.<br />
Are you sure you want to continue connecting (yes/no)? yes<br />
Warning: Permanently added 'ADDRESS,###.###.###.##' (RSA) to the list of known hosts.<br />
USERNAME@ADDRESS's password:<br />
Connected to ADDRESS.<br />
sftp><br />
</source><br />
or using an [[SSH Keys|SSH Key]] for authentication using the <code>-i</code> option<br />
<source lang="console"><br />
[name@server]$ sftp -i /home/name/.ssh/id_rsa USERNAME@ADDRESS<br />
Connected to ADDRESS.<br />
sftp><br />
</source><br />
<br />
<!--T:24--><br />
which returns the <code>sftp></code> prompt where commands to transfer files can be issued. To get a list of commands available to use at the sftp prompt enter the <code>help</code> command.<br />
<br />
<!--T:25--><br />
There are also a number of graphical programs available for Windows, Linux and Mac OS, such as [https://winscp.net/eng/index.php WinSCP] and [http://mobaxterm.mobatek.net/ MobaXterm] (Windows), [https://filezilla-project.org FileZilla] (Windows, Mac, and Linux), and [https://cyberduck.io/?l=en Cyberduck] (Mac and Windows).<br />
[[Category:Connecting]]<br />
<br />
==SCP== <!--T:27--> <br />
<br />
<!--T:28--><br />
SCP stands for [https://en.wikipedia.org/wiki/Secure_copy "Secure Copy"]. Like SFTP it uses the SSH protocol to encrypt data being transferred. It does not support synchronization like [[Globus]] or [[Transferring_data#Rsync|rsync]]. Some examples of the most common use of SCP include <br />
{{Command<br />
|scp foo.txt username@beluga.computecanada.ca:work/<br />
}}<br />
which will copy the file <tt>foo.txt</tt> from the current directory on my local computer to the directory <tt>$HOME/work</tt> on the cluster [[Béluga/en|Béluga]]. To copy a file, <tt>output.dat</tt> from my project space on the cluster [[Cedar]] to my local computer I can use a command like<br />
{{Command<br />
|scp username@cedar.computecanada.ca:projects/def-jdoe/username/results/output.dat .<br />
}}<br />
Many other examples of the use of SCP are shown [http://www.hypexr.org/linux_scp_help.php here]. Note that you always execute this <tt>scp</tt> command on your local computer, not the remote cluster - the SCP connection, regardless of whether you are transferring data to or from the remote cluster, should always be initiated from your local computer. <br />
<br />
<!--T:29--><br />
SCP supports an option, <code>-r</code>, to recursively transfer a set of directories and files. We '''recommend against using <code>scp -r</code>''' to transfer data into <code>/project</code> because the setgid bit is turned off in the created directories, which may lead to <code>Disk quota exceeded</code> or similar errors if files are later created there (see [[Frequently_Asked_Questions#Disk_quota_exceeded_error_on_.2Fproject_filesystems | Disk quota exceeded error on /project filesystems]]).<br />
<br />
<!--T:33--><br />
'''<big>***Note***</big>''' if you chose a custom SSH key name, <i>i.e.</i> something other than the default names: <code>id_dsa</code>, <code>id_ecdsa</code>, <code>id_ed25519</code> and <code>id_rsa</code>, you will need to use the <code>-i</code> option of scp and specify the path to your private key before the file paths via:<br />
<br />
<!--T:34--><br />
{{Command<br />
|scp -i /path/to/key foo.txt username@beluga.computecanada.ca:work/<br />
}}<br />
<br />
<!--T:20--><br />
[[Category:Connecting]]<br />
</translate></div>Willis2https://docs.alliancecan.ca/mediawiki/index.php?title=Transferring_data&diff=110729Transferring data2022-01-26T18:26:29Z<p>Willis2: </p>
<hr />
<div><languages /><br />
<br />
<translate><br />
<!--T:30--><br />
Please use ''data transfer nodes'', also called ''data mover nodes'', instead of login nodes whenever you are transferring data to and from Compute Canada clusters. If a data transfer node is available, its URL will be given near the top of the main page for each cluster: [[Béluga/en|Béluga]], [[Cedar]], [[Graham]], [[Niagara]].<br />
<br />
<!--T:31--><br />
[[Globus]] automatically uses data transfer nodes.<br />
<br />
==To and from your personal computer== <!--T:1--><br />
You will need software that supports secure transfer of files between your computer and the Compute Canada machines. The commands <code>scp</code> and <code>sftp</code> can be used in a command-line environment on '''Linux''' or '''Mac''' OS X computers. On '''Microsoft Windows''' platforms, [https://docs.computecanada.ca/wiki/Connecting_with_MobaXTerm/en MobaXterm] offers both a graphical file transfer function and a [[Linux introduction|command-line]] interface via [[SSH]], while [http://winscp.net/eng/index.php WinSCP] is another free program that supports file transfer. Setting up a connection to a Compute Canada machine using SSH keys with WinSCP can be done by following the steps in this [https://www.exavault.com/blog/import-ssh-keys-winscp link]. [https://docs.computecanada.ca/wiki/Connecting_with_PuTTY/en PuTTY] comes with <code>pscp</code> and <code>psftp</code> which are essentially the same as the Linux and Mac command line programs.<br />
<br />
<!--T:2--><br />
If it takes more than one minute to move your files to or from Compute Canada servers, we recommend you install and try [[Globus#Personal_Computers|Globus Personal Connect]]. [[Globus]] transfers can be set up and will go on in the background without you.<br />
<br />
==Between Compute Canada resources== <!--T:3--><br />
[[Globus]] is the preferred tool for transferring data between Compute Canada systems, and if it can be used, it should.<br />
<br />
<!--T:4--><br />
However, other common tools can also be found for transferring data both inside and outside of Compute Canada, including<br />
* [[Transferring_data#SFTP | SFTP]]<br />
* [[Transferring_data#SCP | SCP]] or Secure Copy<br />
* [[Transferring_data#Rsync | rsync]]<br />
<br />
==From the World Wide Web== <!--T:5--><br />
The standard tool for downloading data from websites is [https://en.wikipedia.org/wiki/Wget wget]. Also available is [https://curl.haxx.se/ curl]. The two are compared in this [https://unix.stackexchange.com/questions/47434/what-is-the-difference-between-curl-and-wget StackExchange article].<br />
<br />
==Synchronizing files== <!--T:6--><br />
To synchronize or "sync" files (or directories) stored in two different locations means to ensure that the two copies are the same. Here are several different ways to do this.<br />
<br />
===Globus transfer=== <!--T:7--><br />
We find Globus usually gives the best performance and reliability.<br />
<br />
<!--T:8--><br />
Normally when a Globus transfer is initiated it will overwrite the files on the destination with the files from the source, which means all of the files on the source will be transferred. If some of the files may already exist on the destination and need not be transferred if they match, you should go to the bottom of the transfer window as shown in the screenshot and choose to "sync" instead.<br />
<br />
<!--T:9--><br />
[[File:Globus_Transfer_Sync_Options.png|280px|thumb|left]]<br />
<br />
<!--T:10--><br />
You may choose how Globus decides which files to transfer:<br />
{| class="wikitable"<br />
|-<br />
| Their checksums are different || This is the slowest option but most accurate. This will catch changes or errors that result in the same size of file, but with different contents.<br />
|-<br />
| File doesn't exist on destination || This will only transfer files that have been created since the last sync. Useful if you are incrementally creating files.<br />
|-<br />
| File size is different || A quick test. If the file size has changed then its contents must have changed, and it will be re-transferred.<br />
|-<br />
| Modification time is newer || This will check the file's recorded modification time and only transfer the file if it is newer on the source than the destination. If you want to depend on this it is important to check the "preserve source file modification times" option when initiating a Globus transfer.<br />
|}<br />
<br />
<!--T:11--><br />
For more information about Globus please see [[Globus]].<br />
<br />
<br clear="all"/><br />
===Rsync=== <!--T:12--><br />
[https://en.wikipedia.org/wiki/Rsync Rsync] is a popular tool for ensuring that two separate datasets are the same but can be quite slow if there are a lot of files or there is a lot of latency between the two sites, i.e. they are geographically apart or on different networks. Running <code>rsync</code> will check the modification time and size of each file, and will only transfer the file if one or the other does not match. If you expect modification times not to match on the two systems you can use the <code>-c</code> option, which will compute checksums at the source and destination, and transfer only if the checksums do not match. <br />
<br />
<!--T:26--><br />
When transferring files into the <code>/project</code> file systems, do not use <code>-p</code> and <code>-g</code> flags (or <code>-a</code>, which implies those two). The quotas in <code>/project</code> are enforced based on group ownership, and thus preserving the group ownership will lead to the [[Frequently_Asked_Questions#Disk_quota_exceeded_error_on_.2Fproject_filesystems | Disk quota exceeded]] error message.<br />
<br />
<!--T:32--><br />
If you are using <code>-a</code> when transferring files into the <code>/project</code> file systems, you can add <code>--no-g --no-p</code> to your options, like so<br />
{{Command|rsync -avz --no-g --no-p LOCALNAME someuser@graham.computecanada.ca:projects/def-professor/someuser/}}<br />
or avoid using <code>-a</code> altogether<br />
{{Command|rsync -rltv LOCALNAME someuser@graham.computecanada.ca:projects/def-professor/someuser/}}<br />
where LOCALNAME can be a folder or file. For large transfers, consider adding <code>--partial</code> so that interrupted transfers may be restarted, and/or <code>--progress</code> to see a summary of the transfer progress.<br />
<br />
===Using checksums to check if files match=== <!--T:13--><br />
If Globus is unavailable between the two systems being synchronized and Rsync is taking too long, then you can use a [https://en.wikipedia.org/wiki/Checksum checksum] utility on both systems to determine if the files match. In this example we use <code>sha1sum</code>.<br />
<br />
<!--T:14--><br />
{{Command<br />
|find /home/username/ -type f -print0 {{!}} xargs -0 sha1sum {{!}} tee checksum-result.log<br />
}}<br />
<br />
<!--T:15--><br />
This command will create a new file called checksum-result.log in the current directory; the file will contain all of the checksums for the files in /home/username/. It will also print out all of the checksums to the screen as it goes. If you have a lot of files or very large files you may want to run this command in the background, in a [https://en.wikipedia.org/wiki/GNU_Screen screen] or [https://en.wikipedia.org/wiki/Tmux tmux] session; anything that allows it to continue if your [[SSH]] connection times out.<br />
<br />
<!--T:16--><br />
After you run it on both systems you can use the <code>diff</code> utility to find files that don't match.<br />
<br />
<!--T:17--><br />
{{Command<br />
|diff checksum-result-silo.log checksum-dtn.log<br />
|result=69c69<br />
< 017f14f6a1a194a5f791836d93d14beead0a5115 /home/username/file-0025048576-0000008<br />
---<br />
> 8836913c2cc2272c017d0455f70cf0d698daadb3 /home/username/file-0025048576-0000008<br />
}}<br />
<br />
<!--T:18--><br />
It is possible that the <code>find</code> command will crawl through the directories in a different order on each system, resulting in many false differences, so you may need to run <code>sort</code> on both files before running <code>diff</code>:<br />
<br />
<!--T:19--><br />
{{Commands<br />
|sort -k2 checksum-result-silo.log -o checksum-result-silo.log<br />
|sort -k2 checksum-dtn.log -o checksum-dtn.log<br />
}}<br />
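<br />
The whole comparison can also be sketched as a single script. This is only an illustration under stated assumptions: the helper <code>make_manifest</code> and the directories <code>site_a</code> and <code>site_b</code> are hypothetical stand-ins for the two systems, and the script relies on the GNU versions of <code>find</code>, <code>sort</code> and <code>xargs</code> found on Linux. Recording paths relative to the top directory and sorting them before hashing means the two manifests line up even when the absolute paths differ between systems.<br />
<source lang="bash">
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a sorted sha1sum manifest of every regular
# file under DIR to OUT. Paths are recorded relative to DIR so that
# manifests produced on two different systems can be compared directly.
make_manifest() {
    local dir=$1 out=$2
    ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha1sum ) > "$out"
}

# Demo: two local directories stand in for the two systems.
work=$(mktemp -d)
mkdir -p "$work/site_a/data" "$work/site_b/data"
echo "same"      > "$work/site_a/data/one.txt"
echo "same"      > "$work/site_b/data/one.txt"
echo "original"  > "$work/site_a/data/two.txt"
echo "corrupted" > "$work/site_b/data/two.txt"

make_manifest "$work/site_a" "$work/a.log"
make_manifest "$work/site_b" "$work/b.log"

# diff exits with status 1 when the manifests differ, so test it in an
# "if" rather than letting "set -e" abort the script.
if diff "$work/a.log" "$work/b.log" > "$work/diff.log"; then
    echo "all files match"
else
    echo "files differ:"
    cat "$work/diff.log"
fi
</source>
In real use you would run the manifest step once on each system and copy one log file over before running <code>diff</code>. GNU <code>sha1sum</code> can also verify a copied manifest directly with its <code>-c</code> option, which avoids the second manifest altogether.<br />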
<br />
==SFTP== <!--T:21--><br />
[https://en.wikipedia.org/wiki/SSH_File_Transfer_Protocol SFTP] (Secure File Transfer Protocol) transfers files between machines over the SSH protocol, which encrypts the data in transit.<br />
<br />
<!--T:22--><br />
For example you can connect to a remote machine at <code>ADDRESS</code> as user <code>USERNAME</code> with SFTP to transfer files like so:<br />
<br />
<!--T:23--><br />
<source lang="console"><br />
[name@server]$ sftp USERNAME@ADDRESS<br />
The authenticity of host 'ADDRESS (###.###.###.##)' can't be established.<br />
RSA key fingerprint is ##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##.<br />
Are you sure you want to continue connecting (yes/no)? yes<br />
Warning: Permanently added 'ADDRESS,###.###.###.##' (RSA) to the list of known hosts.<br />
USERNAME@ADDRESS's password:<br />
Connected to ADDRESS.<br />
sftp><br />
</source><br />
or using an [[SSH Keys|SSH Key]] for authentication using the <code>-i</code> option<br />
<source lang="console"><br />
[name@server]$ sftp -i /home/name/.ssh/id_rsa USERNAME@ADDRESS<br />
Connected to ADDRESS.<br />
sftp><br />
</source><br />
<br />
<!--T:24--><br />
which returns the <code>sftp></code> prompt where commands to transfer files can be issued. To get a list of commands available to use at the sftp prompt enter the <code>help</code> command.<br />
<br />
<!--T:25--><br />
There are also a number of graphical programs available for Windows, Linux and Mac OS, such as [https://winscp.net/eng/index.php WinSCP] and [http://mobaxterm.mobatek.net/ MobaXterm] (Windows), [https://filezilla-project.org FileZilla] (Windows, Mac, and Linux), and [https://cyberduck.io/?l=en Cyberduck] (Mac and Windows).<br />
[[Category:Connecting]]<br />
<br />
==SCP== <!--T:27--> <br />
<br />
<!--T:28--><br />
SCP stands for [https://en.wikipedia.org/wiki/Secure_copy "Secure Copy"]. Like SFTP it uses the SSH protocol to encrypt data being transferred. It does not support synchronization like [[Globus]] or [[Transferring_data#Rsync|rsync]]. Some examples of the most common use of SCP include <br />
{{Command<br />
|scp foo.txt username@beluga.computecanada.ca:work/<br />
}}<br />
which will copy the file <tt>foo.txt</tt> from the current directory on my local computer to the directory <tt>$HOME/work</tt> on the cluster [[Béluga/en|Béluga]]. To copy a file, <tt>output.dat</tt> from my project space on the cluster [[Cedar]] to my local computer I can use a command like<br />
{{Command<br />
|scp username@cedar.computecanada.ca:projects/def-jdoe/username/results/output.dat .<br />
}}<br />
Many other examples of the use of SCP are shown [http://www.hypexr.org/linux_scp_help.php here]. Note that you always execute this <tt>scp</tt> command on your local computer, not the remote cluster - the SCP connection, regardless of whether you are transferring data to or from the remote cluster, should always be initiated from your local computer. <br />
<br />
<!--T:29--><br />
SCP supports an option, <code>-r</code>, to recursively transfer a set of directories and files. We '''recommend against using <code>scp -r</code>''' to transfer data into <code>/project</code> because the setgid bit is turned off in the created directories, which may lead to <code>Disk quota exceeded</code> or similar errors if files are later created there (see [[Frequently_Asked_Questions#Disk_quota_exceeded_error_on_.2Fproject_filesystems | Disk quota exceeded error on /project filesystems]]).<br />
<br />
<!--T:33--><br />
'''<big>***Note***</big>''' if you chose a custom SSH key name, <i>i.e.</i> something other than the default names: <code>id_dsa</code>, <code>id_ecdsa</code>, <code>id_ed25519</code> and <code>id_rsa</code>, you will need to use the <code>-i</code> option of scp and specify the path to your private key before the file paths via:<br />
<br />
<!--T:34--><br />
{{Command<br />
|scp -i /path/to/key foo.txt username@beluga.computecanada.ca:work/<br />
}}<br />
<br />
<!--T:20--><br />
[[Category:Connecting]]<br />
</translate></div>Willis2https://docs.alliancecan.ca/mediawiki/index.php?title=Using_SSH_keys_in_Linux&diff=110426Using SSH keys in Linux2022-01-20T20:05:14Z<p>Willis2: </p>
<hr />
<div><languages /><br />
<translate><br />
<br />
<!--T:13--><br />
''Parent page: [[SSH]]''<br />
<br />
=Creating a key pair= <!--T:1--><br />
Before creating a new key pair, check to see if you already have one. If you do, but can't remember where you've used it, it's better to create a fresh one, since you shouldn't install a key of unknown security.<br />
<br />
<!--T:14--><br />
Key pairs are typically located in the <code>.ssh/</code> directory in your home directory. By default, a key is named with an "id_" prefix, followed by the key type ("rsa", "dsa", "ed25519"), and the public key also has a ".pub" suffix. So a common example is <code>id_rsa</code> and <code>id_rsa.pub</code>. A good practice is to give the key a name that is meaningful to you and that identifies the system on which it is used.<br />
<br />
<!--T:11--><br />
If you do need a new key, you can generate it with the <code>ssh-keygen</code> command: <br />
<br />
<!--T:2--><br />
<source lang="console"><br />
[name@yourLaptop]$ ssh-keygen -t ed25519<br />
</source><br />
or<br />
<source lang="console"><br />
[name@yourLaptop]$ ssh-keygen -b 4096 -t rsa<br />
</source><br />
(this example explicitly asks for a 4-kbit RSA key, which is a reasonable choice.)<br />
<br />
<!--T:3--><br />
The output will be similar to the following:<br />
<br />
<!--T:4--><br />
<source lang="console"><br />
Generating public/private rsa key pair.<br />
Enter file in which to save the key (/home/username/.ssh/id_rsa):<br />
Enter passphrase (empty for no passphrase):<br />
Enter same passphrase again:<br />
Your identification has been saved in /home/username/.ssh/id_rsa.<br />
Your public key has been saved in /home/username/.ssh/id_rsa.pub.<br />
The key fingerprint is:<br />
ef:87:b5:b1:4d:7e:69:95:3f:62:f5:0d:c0:7b:f1:5e username@hostname<br />
The key's randomart image is:<br />
+--[ RSA 2048]----+<br />
| |<br />
| |<br />
| . |<br />
| o . |<br />
| S o o.|<br />
| . + +oE|<br />
| .o O.oB|<br />
| .. +oo+*|<br />
| ... o..|<br />
+-----------------+<br />
</source><br />
<br />
<!--T:5--><br />
When prompted, enter a passphrase. If you already have key pairs saved with the default names, you should enter a different file name for the new keys to avoid overwriting existing key pairs. <br />
More details on best practices can be found [[SSH_Keys#Best_practices_for_key_pairs| here]]<br />
<br />
=Installing the public part of the key= <!--T:15--><br />
<br />
==Installing via CCDB== <!--T:22--><br />
We encourage all users to leverage the new CCDB feature to install their SSH public key. This will make the key available to all our clusters.<br />
Grab the content of your public key (called id_rsa.pub in the above case) and upload it to CCDB as per step 3 of [[SSH_Keys#Using_CCDB|these instructions]].<br />
<br />
<br />
==Installing locally== <!--T:16--><br />
This method below is still available, but we encourage all users to [[Using_SSH_keys_in_Linux#Installing via CCDB|install it via CCDB]].<br />
If for some reason you still want to install the public key directly on a specific cluster, the steps are described below.<br />
<br />
<!--T:23--><br />
The simplest, safest way to install a key to a remote system is using the ssh-copy-id command:<br />
<source lang="console"><br />
ssh-copy-id -i ~/.ssh/mynewkey.pub graham.computecanada.ca<br />
</source><br />
This assumes that the new keypair is named "mynewkey" and "mynewkey.pub", and that your username on the remote machine is the same as your local username.<br />
<br />
<!--T:17--><br />
If necessary, you can do this "manually" - in fact, ssh-copy-id isn't doing anything very magic. It's simply connecting to the remote machine, and placing the public key into <code>.ssh/authorized_keys</code> in your home directory there. The main benefit from using <code>ssh-copy-id</code> is that it will create files and directories if necessary, and will ensure that the permissions on them are correct. You can do it entirely yourself by copying the public key file to the remote server, then:<br />
<source lang="bash"><br />
mkdir -p ~/.ssh<br />
cat id_rsa.pub >> ~/.ssh/authorized_keys<br />
chmod --recursive go-rwx ~/.ssh<br />
chmod go-w ~<br />
</source><br />
SSH is picky about permissions, on both the client and the server. SSH will fail if the following conditions are not met:<br />
<ul><br />
<li>The private key file must not be accessible to others. <code> chmod go-rwx id_rsa </code><br />
<li>Your remote home directory must not be writable by others <code> chmod go-w ~ </code><br />
<li>Same for your remote ~/.ssh and ~/.ssh/authorized_keys <code> chmod --recursive go-rwx ~/.ssh </code><br />
</ul><br />
Note that debugging the remote conditions may not be obvious without the help of the remote machine's system administrators.<br />
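<br />
When you do have shell access, the conditions above can be checked with a few lines of shell. The following is a minimal sketch, not an official tool: the function name <code>is_private</code> is hypothetical, the demo runs on a throwaway directory instead of your real <code>~/.ssh</code>, and <code>stat -c</code> is the GNU form found on Linux.<br />
<source lang="bash">
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if the given path grants no
# permissions at all to group or others. GNU stat prints the octal
# mode with -c '%a' (for example 700 or 644).
is_private() {
    local mode
    mode=$(stat -c '%a' "$1")
    # Mask out everything except the group and other permission bits.
    (( (8#$mode & 8#077) == 0 ))
}

# Demo on a throwaway directory rather than your real ~/.ssh.
demo=$(mktemp -d)
mkdir "$demo/.ssh"
touch "$demo/.ssh/authorized_keys"
chmod 755 "$demo/.ssh"
chmod 644 "$demo/.ssh/authorized_keys"

is_private "$demo/.ssh" || echo "too open: $demo/.ssh"

# Apply the fix from the text, then check again.
chmod --recursive go-rwx "$demo/.ssh"
is_private "$demo/.ssh" && echo "ok: .ssh"
is_private "$demo/.ssh/authorized_keys" && echo "ok: authorized_keys"
</source>
On a real system you would call <code>is_private</code> on <code>~/.ssh</code>, <code>~/.ssh/authorized_keys</code> and your private key file. The home directory itself only needs to be non-writable by group and others, so it calls for a looser check than this one.<br />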
<br />
=Connecting using a key pair= <!--T:6--><br />
Finally, test the new key by connecting to the remote machine from your local machine with<br />
<source lang="console">[name@yourLaptop]$ ssh -i /path/to/your/privatekey USERNAME@ADDRESS</source><br />
where<br />
:*<code>/path/to/your/privatekey</code> specifies your private key file, e.g. <code>/home/ubuntu/.ssh/id_rsa</code>;<br />
:*<code>USERNAME</code> is the user name on the remote machine;<br />
:*<code>ADDRESS</code> is the address of the remote machine.<br />
<br />
<!--T:12--><br />
If you have administrative access on the server and created the account for other users, they should test the connection themselves and not disclose their private key.<br />
<br />
=Using ssh-agent= <!--T:18--><br />
Having successfully created a key pair and installed the public key on a Compute Canada cluster, you can now log in using the key pair. While this is a better solution than using a password to connect to our clusters, it still requires you to type in a passphrase, needed to unlock your private key, every time that you want to log in to a cluster. There is however a program, <tt>ssh-agent</tt>, which stores your private key in memory on your local computer and provides it whenever another program on this computer needs it for authentication. This means that you only need to unlock the private key once, after which you can log in to a remote cluster many times without having to type in the passphrase again. <br />
<br />
<!--T:19--><br />
You can start the <tt>ssh-agent</tt> program using the command<br />
{{Command|eval `ssh-agent`<br />
}} <br />
After you have started the <tt>ssh-agent</tt>, which will run in the background while you are logged in at your local computer, you can add your key pair to those managed by the agent using the command<br />
{{Command|ssh-add<br />
}}<br />
Assuming you installed your key pair in one of the standard locations, the <tt>ssh-add</tt> command should be able to find it, though if necessary you can explicitly give the full path to the private key as an argument to <tt>ssh-add</tt>. Running <tt>ssh-add -l</tt> will show which private keys are currently accessible to the <tt>ssh-agent</tt>. <br />
<br />
<!--T:21--><br />
While <tt>ssh-agent</tt> will automatically negotiate the key exchange between your personal computer and the cluster, if you need to use your private key on the cluster itself, for example when interacting with a remote GitHub repository, you will need to enable ''agent forwarding''. To enable this on the [[Béluga/en|Béluga]] cluster, you can add the following lines to your <tt>$HOME/.ssh/config</tt> file on your personal computer,<br />
{{File<br />
|name=config<br />
|lang="text"<br />
|contents=<br />
Host beluga.computecanada.ca<br />
ForwardAgent yes<br />
}}<br />
Note that you should never use the line <tt>Host *</tt> for agent forwarding in your SSH configuration file.<br />
<br />
<!--T:20--><br />
Note that many contemporary Linux distributions as well as macOS now offer graphical "keychain managers" that can easily be configured to also manage your SSH key pair, so that logging in on your local computer is enough to store the private key in memory and have the operating system automatically provide it to the SSH client during login on a remote cluster. You will then be able to log in to Compute Canada clusters without ever typing in any kind of passphrase. <br />
[[Category:Connecting]]<br />
</translate></div>Willis2https://docs.alliancecan.ca/mediawiki/index.php?title=Transferring_data&diff=110425Transferring data2022-01-20T20:00:36Z<p>Willis2: </p>
<hr />
<div><languages /><br />
<br />
<translate><br />
<!--T:30--><br />
Please use ''data transfer nodes'', also called ''data mover nodes'', instead of login nodes whenever you are transferring data to and from Compute Canada clusters. If a data transfer node is available, its URL will be given near the top of the main page for each cluster: [[Béluga/en|Béluga]], [[Cedar]], [[Graham]], [[Niagara]].<br />
<br />
<!--T:31--><br />
[[Globus]] automatically uses data transfer nodes.<br />
<br />
==To and from your personal computer== <!--T:1--><br />
You will need software that supports secure transfer of files between your computer and the Compute Canada machines. The commands <code>scp</code> and <code>sftp</code> can be used in a command-line environment on '''Linux''' or '''Mac''' OS X computers. On '''Microsoft Windows''' platforms, [https://docs.computecanada.ca/wiki/Connecting_with_MobaXTerm/en MobaXterm] offers both a graphical file transfer function and a [[Linux introduction|command-line]] interface via [[SSH]], while [http://winscp.net/eng/index.php WinSCP] is another free program that supports file transfer. [https://docs.computecanada.ca/wiki/Connecting_with_PuTTY/en PuTTY] comes with <code>pscp</code> and <code>psftp</code> which are essentially the same as the Linux and Mac command line programs.<br />
<br />
<!--T:2--><br />
If it takes more than one minute to move your files to or from Compute Canada servers, we recommend you install and try [[Globus#Personal_Computers|Globus Personal Connect]]. [[Globus]] transfers can be set up and will go on in the background without you.<br />
<br />
==Between Compute Canada resources== <!--T:3--><br />
[[Globus]] is the preferred tool for transferring data between Compute Canada systems, and if it can be used, it should.<br />
<br />
<!--T:4--><br />
However, other common tools can also be found for transferring data both inside and outside of Compute Canada, including<br />
* [[Transferring_data#SFTP | SFTP]]<br />
* [[Transferring_data#SCP | SCP]] or Secure Copy<br />
* [[Transferring_data#Rsync | rsync]]<br />
<br />
==From the World Wide Web== <!--T:5--><br />
The standard tool for downloading data from websites is [https://en.wikipedia.org/wiki/Wget wget]. Also available is [https://curl.haxx.se/ curl]. The two are compared in this [https://unix.stackexchange.com/questions/47434/what-is-the-difference-between-curl-and-wget StackExchange article].<br />
<br />
==Synchronizing files== <!--T:6--><br />
To synchronize or "sync" files (or directories) stored in two different locations means to ensure that the two copies are the same. Here are several different ways to do this.<br />
<br />
===Globus transfer=== <!--T:7--><br />
We find Globus usually gives the best performance and reliability.<br />
<br />
<!--T:8--><br />
Normally when a Globus transfer is initiated it will overwrite the files on the destination with the files from the source, which means all of the files on the source will be transferred. If some of the files may already exist on the destination and need not be transferred if they match, you should go to the bottom of the transfer window as shown in the screenshot and choose to "sync" instead.<br />
<br />
<!--T:9--><br />
[[File:Globus_Transfer_Sync_Options.png|280px|thumb|left]]<br />
<br />
<!--T:10--><br />
You may choose how Globus decides which files to transfer:<br />
{| class="wikitable"<br />
|-<br />
| Their checksums are different || This is the slowest option but most accurate. This will catch changes or errors that result in the same size of file, but with different contents.<br />
|-<br />
| File doesn't exist on destination || This will only transfer files that have been created since the last sync. Useful if you are incrementally creating files.<br />
|-<br />
| File size is different || A quick test. If the file size has changed then its contents must have changed, and it will be re-transferred.<br />
|-<br />
| Modification time is newer || This will check the file's recorded modification time and only transfer the file if it is newer on the source than the destination. If you want to depend on this it is important to check the "preserve source file modification times" option when initiating a Globus transfer.<br />
|}<br />
<br />
<!--T:11--><br />
For more information about Globus please see [[Globus]].<br />
<br />
<br clear="all"/><br />
===Rsync=== <!--T:12--><br />
[https://en.wikipedia.org/wiki/Rsync Rsync] is a popular tool for ensuring that two separate datasets are the same but can be quite slow if there are a lot of files or there is a lot of latency between the two sites, i.e. they are geographically apart or on different networks. Running <code>rsync</code> will check the modification time and size of each file, and will only transfer the file if one or the other does not match. If you expect modification times not to match on the two systems you can use the <code>-c</code> option, which will compute checksums at the source and destination, and transfer only if the checksums do not match. <br />
<br />
<!--T:26--><br />
When transferring files into the <code>/project</code> file systems, do not use <code>-p</code> and <code>-g</code> flags (or <code>-a</code>, which implies those two). The quotas in <code>/project</code> are enforced based on group ownership, and thus preserving the group ownership will lead to the [[Frequently_Asked_Questions#Disk_quota_exceeded_error_on_.2Fproject_filesystems | Disk quota exceeded]] error message.<br />
<br />
<!--T:32--><br />
If you are using <code>-a</code> when transferring files into the <code>/project</code> file systems, you can add <code>--no-g --no-p</code> to your options, like so<br />
{{Command|rsync -avz --no-g --no-p LOCALNAME someuser@graham.computecanada.ca:projects/def-professor/someuser/}}<br />
or avoid using <code>-a</code> altogether<br />
{{Command|rsync -rltv LOCALNAME someuser@graham.computecanada.ca:projects/def-professor/someuser/}}<br />
where LOCALNAME can be a folder or file. For large transfers, consider adding <code>--partial</code> so that interrupted transfers may be restarted, and/or <code>--progress</code> to see a summary of the transfer progress.<br />
<br />
===Using checksums to check if files match=== <!--T:13--><br />
If Globus is unavailable between the two systems being synchronized and Rsync is taking too long, then you can use a [https://en.wikipedia.org/wiki/Checksum checksum] utility on both systems to determine if the files match. In this example we use <code>sha1sum</code>.<br />
<br />
<!--T:14--><br />
{{Command<br />
|find /home/username/ -type f -print0 {{!}} xargs -0 sha1sum {{!}} tee checksum-result.log<br />
}}<br />
<br />
<!--T:15--><br />
This command will create a new file called checksum-result.log in the current directory; the file will contain all of the checksums for the files in /home/username/. It will also print out all of the checksums to the screen as it goes. If you have a lot of files or very large files you may want to run this command in the background, in a [https://en.wikipedia.org/wiki/GNU_Screen screen] or [https://en.wikipedia.org/wiki/Tmux tmux] session; anything that allows it to continue if your [[SSH]] connection times out.<br />
<br />
<!--T:16--><br />
After you run it on both systems you can use the <code>diff</code> utility to find files that don't match.<br />
<br />
<!--T:17--><br />
{{Command<br />
|diff checksum-result-silo.log checksum-dtn.log<br />
|result=69c69<br />
< 017f14f6a1a194a5f791836d93d14beead0a5115 /home/username/file-0025048576-0000008<br />
---<br />
> 8836913c2cc2272c017d0455f70cf0d698daadb3 /home/username/file-0025048576-0000008<br />
}}<br />
<br />
<!--T:18--><br />
It is possible that the <code>find</code> command will crawl through the directories in a different order on each system, resulting in many false differences, so you may need to run <code>sort</code> on both files before running <code>diff</code>:<br />
<br />
<!--T:19--><br />
{{Commands<br />
|sort -k2 checksum-result-silo.log -o checksum-result-silo.log<br />
|sort -k2 checksum-dtn.log -o checksum-dtn.log<br />
}}<br />
<br />
==SFTP== <!--T:21--><br />
[https://en.wikipedia.org/wiki/SSH_File_Transfer_Protocol SFTP] (Secure File Transfer Protocol) transfers files between machines over the SSH protocol, which encrypts the data in transit.<br />
<br />
<!--T:22--><br />
For example you can connect to a remote machine at <code>ADDRESS</code> as user <code>USERNAME</code> with SFTP to transfer files like so:<br />
<br />
<!--T:23--><br />
<source lang="console"><br />
[name@server]$ sftp USERNAME@ADDRESS<br />
The authenticity of host 'ADDRESS (###.###.###.##)' can't be established.<br />
RSA key fingerprint is ##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##.<br />
Are you sure you want to continue connecting (yes/no)? yes<br />
Warning: Permanently added 'ADDRESS,###.###.###.##' (RSA) to the list of known hosts.<br />
USERNAME@ADDRESS's password:<br />
Connected to ADDRESS.<br />
sftp><br />
</source><br />
or using an [[SSH Keys|SSH Key]] for authentication using the <code>-i</code> option<br />
<source lang="console"><br />
[name@server]$ sftp -i /home/name/.ssh/id_rsa USERNAME@ADDRESS<br />
Connected to ADDRESS.<br />
sftp><br />
</source><br />
<br />
<!--T:24--><br />
which returns the <code>sftp></code> prompt where commands to transfer files can be issued. To get a list of commands available to use at the sftp prompt enter the <code>help</code> command.<br />
<br />
<!--T:25--><br />
There are also a number of graphical programs available for Windows, Linux and Mac OS, such as [https://winscp.net/eng/index.php WinSCP] and [http://mobaxterm.mobatek.net/ MobaXterm] (Windows), [https://filezilla-project.org FileZilla] (Windows, Mac, and Linux), and [https://cyberduck.io/?l=en Cyberduck] (Mac and Windows).<br />
<br />
==SCP== <!--T:27--> <br />
<br />
<!--T:28--><br />
SCP stands for [https://en.wikipedia.org/wiki/Secure_copy "Secure Copy"]. Like SFTP, it uses the SSH protocol to encrypt the data being transferred. Unlike [[Globus]] or [[Transferring_data#Rsync|rsync]], it does not support synchronization. A typical use of SCP is<br />
{{Command<br />
|scp foo.txt username@beluga.computecanada.ca:work/<br />
}}<br />
which copies the file <tt>foo.txt</tt> from the current directory on your local computer to the directory <tt>$HOME/work</tt> on the cluster [[Béluga/en|Béluga]]. To copy a file <tt>output.dat</tt> from your project space on the cluster [[Cedar]] to your local computer, you can use a command like<br />
{{Command<br />
|scp username@cedar.computecanada.ca:projects/def-jdoe/username/results/output.dat .<br />
}}<br />
Many other examples of the use of SCP are shown [http://www.hypexr.org/linux_scp_help.php here]. Note that you should always run the <tt>scp</tt> command on your local computer, not on the remote cluster: whether you are transferring data to or from the cluster, the SCP connection should be initiated from your local computer.<br />
<br />
<!--T:29--><br />
SCP supports an option, <code>-r</code>, to recursively transfer a set of directories and files. We '''recommend against using <code>scp -r</code>''' to transfer data into <code>/project</code> because the setgid bit is turned off in the created directories, which may lead to <code>Disk quota exceeded</code> or similar errors if files are later created there (see [[Frequently_Asked_Questions#Disk_quota_exceeded_error_on_.2Fproject_filesystems | Disk quota exceeded error on /project filesystems]]).<br />
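<br />
If directories have already been created without the setgid bit, one possible way to check for the bit and set it again is sketched below. This is an illustration under assumptions rather than a procedure taken from this page: the temporary directory stands in for a tree copied into <code>/project</code>, and the linked FAQ entry remains the reference for resolving quota errors.<br />

```shell
#!/bin/sh
# Hypothetical stand-in for a directory tree copied into /project with scp -r.
top=$(mktemp -d)
mkdir -p "$top/results/run1"
chmod -R g-s "$top"

# Set the setgid bit on every directory; regular files are left untouched.
find "$top" -type d -exec chmod g+s {} +

# test -g succeeds when the set-group-ID bit is set on the given path.
if [ -g "$top/results/run1" ]; then
    state="setgid set"
else
    state="setgid missing"
fi
echo "$state"
```

On a real cluster, <code>$top</code> would be replaced by the path of the directory that was transferred.<br />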
<br />
'''<big>***Note***</big>''' If you chose a custom SSH key name, <i>i.e.</i> something other than the default names <code>id_dsa</code>, <code>id_ecdsa</code>, <code>id_ed25519</code> and <code>id_rsa</code>, you will need to use the <code>-i</code> option of <code>scp</code> and specify the path to your private key before the file paths:<br />
<br />
{{Command<br />
|scp -i /path/to/key foo.txt username@beluga.computecanada.ca:work/<br />
}}<br />
<br />
<!--T:20--><br />
[[Category:Connecting]]<br />
</translate></div>Willis2