Transferring data

<translate>
<translate>
==To and from your personal computer== <!--T:1-->
You will need software that supports secure file transfer between your computer and the Compute Canada machines. On '''Linux''' and '''Mac''' OS X, the command-line programs <code>scp</code> and <code>sftp</code> can be run from a terminal. On '''Microsoft Windows''', [http://mobaxterm.mobatek.net/MobaXterm MobaXterm] offers both file transfer and a [[SSH|terminal function]], while [http://winscp.net/eng/index.php WinSCP] is another free program that supports file transfer. [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY] comes with <code>pscp</code> and <code>psftp</code>, which are essentially the same as the Linux and Mac command-line programs.


<!--T:2-->
If it takes more than about a minute to move your files to or from Compute Canada servers, you should install [[Globus#Personal_Computers|Globus Personal Connect]] and try it. Once set up, [[Globus]] transfers run in the background without further attention from you. Most (but not all) Compute Canada legacy systems can be reached with Globus.


==Between Compute Canada resources== <!--T:3-->
[[Globus]] is the preferred tool for transferring data between Compute Canada systems; use it whenever it is available.


<!--T:4-->
However, other common tools are also available for transferring data both inside and outside of Compute Canada, including
* Secure copy [https://en.wikipedia.org/wiki/Secure_copy scp] (examples [http://www.hypexr.org/linux_scp_help.php here])
* [https://en.wikipedia.org/wiki/Rsync rsync]
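
As a minimal sketch of how <code>scp</code> is typically invoked, the shell functions below copy a file to a remote system, fetch one back, and copy a whole directory. The login <code>username@cluster.example.org</code> and the function names are placeholders chosen for illustration; they are not real Compute Canada addresses.

```shell
# Placeholder login; replace it with your own username and cluster hostname.
REMOTE="username@cluster.example.org"

# Copy a local file to a path on the remote system.
upload_file() {
    scp "$1" "$REMOTE:$2"
}

# Copy a remote file into a local directory.
download_file() {
    scp "$REMOTE:$1" "$2"
}

# Copy a local directory and everything below it (-r means recursive).
upload_dir() {
    scp -r "$1" "$REMOTE:$2"
}
```

For example, <code>upload_file results.tar.gz /home/username/</code> runs <code>scp results.tar.gz username@cluster.example.org:/home/username/</code>.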


==From the World Wide Web== <!--T:5-->
The standard tool for downloading data from websites is [https://en.wikipedia.org/wiki/Wget wget].
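
A minimal sketch of a download with <code>wget</code>, assuming the file is publicly reachable over HTTP or HTTPS. The function name <code>fetch</code> and the URL in the usage line are hypothetical.

```shell
# Download one URL into a directory.
# --continue resumes a partially downloaded file instead of starting over;
# --directory-prefix chooses the directory where the file is saved.
fetch() {
    url=$1
    dest_dir=$2
    wget --continue --directory-prefix="$dest_dir" "$url"
}
```

For example, <code>fetch https://example.org/dataset.tar.gz .</code> saves the file in the current directory and can be rerun to resume an interrupted download.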


==Synchronizing files== <!--T:6-->
To synchronize or "sync" files (or directories) stored in two different locations means to ensure that the two copies are the same. Here are several different ways to do this.


===Globus Transfer=== <!--T:7-->
We find Globus usually gives the best performance and reliability.


<!--T:8-->
By default, a Globus transfer overwrites the files on the destination with the files from the source, so every file on the source is transferred. If some of the files already exist on the destination and need not be transferred again when they match, go to the bottom of the transfer window, as shown in the screenshot, and choose "sync" instead.


<!--T:9-->
[[File:Globus_Transfer_Sync_Options.png|280px|thumb|left]]


<!--T:10-->
You may choose how Globus decides which files to transfer:
{| class="wikitable"
|}


<!--T:11-->
For more information, see [[Globus]].


===Rsync=== <!--T:12-->
[https://en.wikipedia.org/wiki/Rsync Rsync] is a popular tool for ensuring that two separate datasets are the same, but it can be quite slow if there are a lot of files or a lot of latency between the two sites, for example when they are geographically far apart or on different networks. By default, rsync checks the modification time and size of each file and transmits the file only if one or the other does not match. If you expect modification times not to match on the two systems, you can use the <code>-c</code> option, which computes checksums at the source and destination and transfers a file only if its checksums do not match.
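
The two comparison modes described above can be sketched as follows. The function names are illustrative only. In these commands <code>-a</code> preserves modification times and permissions, <code>-v</code> lists the files transferred, and <code>-n</code> performs a dry run that changes nothing.

```shell
# Default comparison: a file is sent only if its size or modification time differs.
sync_quick() {
    rsync -av "$1/" "$2/"
}

# Checksum comparison (-c): slower, but does not rely on modification times.
sync_checksum() {
    rsync -avc "$1/" "$2/"
}

# Preview what would be transferred without copying anything (-n).
sync_preview() {
    rsync -avn "$1/" "$2/"
}
```

The destination may be a local directory or a remote one written as <code>user@host:path</code>. The trailing slash added to the source makes rsync copy the contents of the directory rather than the directory itself.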


===Using checksums to check if files match=== <!--T:13-->
If Globus is unavailable between the two systems being synchronized and Rsync is taking too long, you can run a [https://en.wikipedia.org/wiki/Checksum checksum] utility on both systems to determine whether the files match. In this example we use <code>sha1sum</code>.


<!--T:14-->
{{Command
|find /home/username/ -type f -print0 {{!}} xargs -0 sha1sum {{!}} tee checksum-result.log
}}


<!--T:15-->
This command creates a new file called checksum-result.log in the current directory, containing the checksums of all the files in /home/username/. It also prints each checksum to the screen as it goes. If you have a lot of files or very large files, you may want to run this command in the background or in a [https://en.wikipedia.org/wiki/GNU_Screen screen] or [https://en.wikipedia.org/wiki/Tmux tmux] session, or by any other means that allows it to continue if your [[SSH]] connection times out.
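
If neither screen nor tmux is convenient, one common alternative (not mentioned above, so treat it as a suggestion) is <code>nohup</code>. The sketch below runs the same checksum scan in the background; the function name is hypothetical, and the log is written directly to a file instead of through <code>tee</code>.

```shell
# Start a checksum scan that keeps running after the SSH session closes.
# nohup makes the command ignore the hangup signal sent at logout, and the
# trailing '&' puts the job in the background.
checksum_in_background() {
    dir=$1
    log=$2
    nohup sh -c 'find "$1" -type f -print0 | xargs -0 sha1sum > "$2"' sh "$dir" "$log" >/dev/null 2>&1 &
}
```

For example, <code>checksum_in_background /home/username/ checksum-result.log</code> returns immediately; the log is complete once the background job has finished. Writing the log outside the directory being scanned keeps the log itself out of the results.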


<!--T:16-->
After you run it on both systems, you can use the <code>diff</code> utility to find files that don't match.


<!--T:17-->
{{Command
|diff checksum-result-silo.log checksum-dtn.log
}}


<!--T:18-->
The <code>find</code> command may traverse the directories in a different order on each system, producing many false differences, so you may need to run <code>sort</code> on both files before running <code>diff</code>, for example:


<!--T:19-->
{{Commands
|sort -k2 checksum-result-silo.log -o checksum-result-silo.log
}}
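
The checksum, sort and compare steps can be combined into one helper, sketched here under the assumption that the directory tree has the same layout on both systems. The function name is hypothetical. Unlike the command above, it records paths relative to the top of the tree, so the lists still line up when the absolute paths differ, and it sets <code>LC_ALL=C</code> so that both systems sort in the same order.

```shell
# Write a checksum list for a directory tree to a log file.
# Paths are relative to the tree's root and the list is sorted by path,
# so lists produced on two systems can be compared line by line.
checksum_tree() {
    root=$1
    out=$2
    ( cd "$root" && find . -type f -print0 | xargs -0 sha1sum | LC_ALL=C sort -k2 ) > "$out"
}
```

Run <code>checksum_tree</code> on each system, writing the log outside the tree being scanned, copy one log to the other system, and compare the two with <code>diff</code>. Any line that <code>diff</code> reports names a file that is missing or differs on one side.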


<!--T:20-->
[[Category:Connecting]]
</translate>
</translate>