General directives for migration
Revision as of 21:14, 25 November 2016
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
This page is for users of Compute Canada clusters concerned about data migration. It explains issues related to transferring your data between Compute Canada facilities and its regional partners (ACENET, Calcul Quebec, Compute Ontario and WestGrid).
If you are in any doubt about details of the following advice, contact support@computecanada.ca for help.
What to do before the migration starts?
Make sure you know whether you are responsible for your own data migration, or whether Compute Canada staff will be migrating your data. Migration of certain legacy systems like Silo is being handled by staff. If you are in any doubt, write to support@computecanada.ca.
If you haven't used Globus before, read about it now and verify that it works on the system you are migrating from. Test any other tools you will use (like tar, gzip, zip) on test data to ensure you know how they work before using them on important data.
Do not wait until the last minute to start your migration. Depending on how much data you have and how much load there is on the machines and network, you may be surprised at how long it will take to finish a large transfer. Expect hundreds of gigabytes to take hours to transfer, but give yourself days in case there is a problem. Expect terabytes to take days.
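To get a feel for these orders of magnitude, you can divide the amount of data by a sustained transfer rate. The sketch below is only an illustration: both `size_gb` and `rate_mb_s` are hypothetical values, and the rate you actually see will depend on the machines and the network.

```shell
#!/bin/bash
# Rough transfer-time estimate: data size divided by an ASSUMED sustained rate.
size_gb=500       # hypothetical amount of data, in gigabytes
rate_mb_s=100     # hypothetical sustained throughput, in megabytes per second

# 1 GB is taken as 1000 MB, so the result is in seconds.
seconds=$(( size_gb * 1000 / rate_mb_s ))

# Convert to hours with one decimal place.
hours=$(awk -v s="$seconds" 'BEGIN { printf "%.1f", s / 3600 }')
echo "Roughly ${hours} hours at ${rate_mb_s} MB/s, if nothing goes wrong"
```

Treat the result as a lower bound and leave a generous margin for interruptions and retries.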
Clean up
It is good practice to review your files regularly and see what can be deleted, but unfortunately many of us do not have that habit. A major data migration is a good reminder to clean up your files and directories. Moving less data will take less time, and storage space even on new systems is in great demand and should not be wasted.
- If you compile programs and keep source code, delete any intermediate files. One or more of `make clean`, `make realclean`, or `rm *.o` might be appropriate, depending on your makefile.
- If you find any large files named like `core.12345` and you don't know what they are, they are probably core dumps and can be deleted.
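As a minimal sketch of the clean-up above, `find` can list object files and probable core dumps before you remove them. The example works in a scratch directory created with `mktemp`, and the file names in it are made up for illustration. On real data, point `find` at your own directory and review the list before deleting anything.

```shell
#!/bin/bash
# Work in a scratch directory so that no real data is touched.
demo=$(mktemp -d)
touch "$demo/main.c" "$demo/main.o" "$demo/core.12345"

# Step 1: list the candidates (object files and files named like core.<number>).
find "$demo" -type f \( -name '*.o' -o -name 'core.[0-9]*' \) -print

# Step 2: once you are sure, the same expression with -delete removes them.
find "$demo" -type f \( -name '*.o' -o -name 'core.[0-9]*' \) -delete
```

The source file `main.c` is left in place because it matches neither pattern.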
Compress and archive
Most file transfer programs move one file of a reasonable size more efficiently than thousands of small files of equal total size. If you have directories or directory trees containing many small files, use tar or zip to combine (archive) and compress them.
Large files can also benefit from compression in many cases, especially text files or numeric data stored as human-readable text. You can again use tar for this, or gzip, or zip.
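For example, a directory of many small files can be combined and compressed in one step with `tar`. This is a sketch on throwaway data: the directory name `results` and the files in it are hypothetical, and everything happens in a scratch directory.

```shell
#!/bin/bash
# Build a small sample directory tree in a scratch location.
demo=$(mktemp -d)
mkdir "$demo/results"
for i in 1 2 3; do
    echo "sample $i" > "$demo/results/run_$i.txt"
done

# Combine the directory into a single gzip-compressed archive.
# -C changes into $demo first, so the archive stores relative paths.
tar -czf "$demo/results.tar.gz" -C "$demo" results

# List the archive contents to confirm it is readable.
tar -tzf "$demo/results.tar.gz"

# Unpack into a separate directory, as you would at the destination.
mkdir "$demo/restored"
tar -xzf "$demo/results.tar.gz" -C "$demo/restored"
```

Listing the archive before removing any originals is a cheap way to catch a failed or truncated archive.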
Avoid duplication
Try not to move the same data twice. If you are migrating from more than one existing system to one new system and you have data duplicated on the sources, choose one source and move the duplicated data from that one only.
Beware of files that have the same name but do not contain the same information. Ensure that you will not accidentally overwrite one file with another of the same name.
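One way to check whether two files with the same name really hold the same data is `cmp`. The sketch below uses two made-up directories, `cluster_a` and `cluster_b`, standing in for copies that came from different source systems.

```shell
#!/bin/bash
# Two hypothetical copies of "the same" file from different source systems.
demo=$(mktemp -d)
mkdir "$demo/cluster_a" "$demo/cluster_b"
echo "version from A" > "$demo/cluster_a/params.txt"
echo "version from B" > "$demo/cluster_b/params.txt"

# cmp -s prints nothing and returns 0 only if the files are byte-identical.
if cmp -s "$demo/cluster_a/params.txt" "$demo/cluster_b/params.txt"; then
    status="identical"
else
    status="different"
fi
echo "params.txt: $status"
```

If the files differ, rename one of them or place them in separate directories at the destination so that neither is overwritten.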
What to do during the migration process?
If it is supported at your source site, use Globus Online to set up your file transfer. It is the most user-friendly and efficient tool we know of for this task. Globus is designed to recover from network interruptions automatically. We recommend you select the following options at the bottom of the "Transfer files" screen:
- preserve source file modification times
- verify file integrity after transfer
If Globus is not supported at your source site, then the advice to compress data and avoid duplication is even more important. If you must use one of scp, sftp, or rsync, then:
- Make a schedule to migrate your data part by part. If the transfer stops for any reason you will be able to try again starting from the incomplete file, but you will not have to re-transfer files that are already complete. An organized list of files will help here.
- Check regularly to see that the transfer process has not stopped. File size is a good indicator of progress. If no files have changed size for several minutes, then something may have gone wrong. If restarting the transfer does not work, contact support@computecanada.ca.
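If rsync is available at both ends, the part-by-part approach might look like the sketch below. The function name `migrate_part`, the user name, the host name, and the paths are all hypothetical. The example call is commented out so that nothing is transferred until you adapt it.

```shell
#!/bin/bash
# migrate_part SOURCE DESTINATION
# Copies one part of your data with rsync.
#   -a         archive mode: recurse and preserve modification times and permissions
#   --partial  keep a partially transferred file so that a rerun can reuse it
migrate_part() {
    rsync -a --partial "$1" "$2"
}

# Example call (hypothetical user, host, and paths):
# migrate_part "$HOME/project1/" "username@newcluster.example.org:project1/"
```

If a transfer is interrupted, running the same command again skips files that are already complete at the destination, so you can work through your list of parts one call at a time.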
Be patient. Even with Globus, transferring large volumes of data can be time consuming. Specific transfer speeds will vary a lot, but expect hundreds of gigabytes to take hours and terabytes to take days.
What to do after migration?
If you did not use Globus, or if you did but did not check "verify file integrity", make sure that the data you have transferred are not corrupted. A crude way to do this is to compare file sizes at the source with file sizes at the destination. For greater confidence you can use cksum or md5sum at each end, and see that the results match. Any files with mismatching sizes or checksums should be transferred again.
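The checksum comparison can be sketched as follows, assuming GNU md5sum is available at both ends. The directories `source` and `destination` are local stand-ins for the two systems. In a real migration you would create the checksum list on the source system, copy it across, and run the check on the destination system.

```shell
#!/bin/bash
# Local stand-ins for the source and destination systems.
demo=$(mktemp -d)
mkdir "$demo/source" "$demo/destination"
echo "important data" > "$demo/source/data.txt"
cp "$demo/source/data.txt" "$demo/destination/data.txt"

# At the source: record one checksum per file, using relative paths.
( cd "$demo/source" && find . -type f -exec md5sum {} + ) > "$demo/checksums.md5"

# At the destination: check every listed file against its recorded checksum.
# --quiet prints only the files that fail.
if ( cd "$demo/destination" && md5sum -c --quiet "$demo/checksums.md5" ); then
    result="match"
else
    result="mismatch"
fi
echo "Checksums: $result"
```

Any file reported as failed by `md5sum -c` is one that should be transferred again.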
Where and how to get HELP?
- To learn how to use the various archiving and compression utilities, use Linux commands like `man <command>` or `<command> --help`.
- Email support@computecanada.ca