Transferring data

The standard tool for downloading data from websites is [https://en.wikipedia.org/wiki/Wget wget].

==Synchronizing files==
To synchronize or "sync" files (or directories) stored in two different locations means to ensure that the two copies are the same. Here are several different ways to do this.

===Globus Transfer===
We find Globus usually gives the best performance and reliability.

By default, a Globus transfer overwrites the files at the destination with the files from the source, which means that every file at the source is transferred. If some of the files may already exist at the destination and need not be transferred when they match, go to the bottom of the transfer window, as shown in the screenshot, and choose "sync" instead.

[[File:Globus_Transfer_Sync_Options.png|280px|thumb|left]]

===Rsync===
[https://en.wikipedia.org/wiki/Rsync Rsync] is a popular tool for ensuring that two separate datasets are the same. However, it can be quite slow if there are a lot of files or a lot of latency between the two sites, e.g. because they are geographically far apart or on different networks. By default, rsync checks the modification time and size of each file and transmits the file only if one or the other does not match. If you expect modification times not to match on the two systems, you can use the "-c" option. This option computes checksums at the source and destination and transfers a file only if the checksums do not match. Computing checksums requires reading every file in full on both systems, so this option can be considerably slower.
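
The per-file decision described above can be illustrated with a short Python sketch. This is not rsync's implementation, and several details are assumptions of the sketch:
* The function names <code>needs_transfer</code> and <code>file_checksum</code> are hypothetical.
* SHA-256 is an arbitrary choice; rsync uses its own checksum algorithm.
* Modification times are compared at whole-second granularity.

The default mode mirrors the size-and-time check. With <code>use_checksum=True</code> the sketch approximates the "-c" option: files whose sizes differ are always transferred, and files of equal size are compared by content.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_transfer(src, dst, use_checksum=False):
    """Hypothetical sketch: decide whether src should be copied over dst.

    Default: transfer if dst is missing or its size or mtime differs
    (similar in spirit to rsync's default check).
    use_checksum=True: equal-size files are compared by content checksum
    instead of mtime (similar in spirit to rsync -c).
    """
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        # Different sizes always mean the files differ.
        return True
    if use_checksum:
        # Reads both files in full, which is why this mode is slower.
        return file_checksum(src) != file_checksum(dst)
    # Whole-second mtime comparison is an assumption of this sketch.
    return int(s.st_mtime) != int(d.st_mtime)
```

For example, suppose a file is copied with <code>shutil.copy2</code>, which preserves the modification time. <code>needs_transfer</code> then returns False. If the destination's modification time is later changed, the default check reports a mismatch, but the checksum mode still finds that the contents match. This is the situation in which "-c" avoids unnecessary transfers.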

===Using checksums to check if files match===