Tar


Revision as of 20:24, 30 November 2016


This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.




Archiving means creating one file that contains a number of smaller files within it. Archiving data can improve the efficiency of file transfers. It is faster for the secure copy protocol (scp), for example, to transfer one archive file of a reasonable size than thousands of small files of equal total size. We therefore recommend that you transfer an archive rather than a directory with all its files and sub-directories individually. On this page, we show by example how to prepare archive files for efficient file transfer.

Compressing means encoding a file such that the same information is contained in fewer bytes of storage. The speed of a large-scale data transfer is dominated by the number of bytes that must be moved, so if the data can be compressed significantly, the transfer will be quicker.

Use tar to archive files and directories

The primary archiving utility on Linux and Unix-like systems is the tar command. It bundles a set of files or directories together and generates a single file, called an archive file or tar-file. By convention an archive file has .tar as the file name extension.

When you archive a directory with tar, it will by default include all files and sub-directories contained in it, and sub-sub-directories contained in those, and so on. So

tar --create --file project1.tar project1

will pack all the contents of directory project1/ into the file project1.tar. The original directory will be unchanged, so this may double the amount of disk space occupied!
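
As a rough illustration of the command above, the following sketch builds a small throwaway directory, archives it, and lists what went into the archive. The file names and the temporary working directory are made up for the example; only project1 and project1.tar come from the text.

```shell
# Build a small throwaway directory so the example is safe to run anywhere.
# (All file names below are hypothetical sample data.)
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p project1/src/lib
echo "hello" > project1/README.txt
echo "int main(void) { return 0; }" > project1/src/main.c
echo "helper" > project1/src/lib/util.txt

# Archive the directory; sub-directories are included recursively.
tar --create --file project1.tar project1

# List the archive contents without extracting anything.
tar --list --file project1.tar

# The original directory is still present, so both now occupy disk space.
du -sk project1 project1.tar
```

Listing the archive before deleting or transferring anything is a cheap way to confirm that the nested sub-directories were actually included.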

You can extract files from the archive using the same command with a different option:

tar --extract --file project1.tar

If there is no directory with the original name, it will be created. If a directory of that name already exists and contains files with the same names as those in the archive, those files will be overwritten.
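
The overwriting behaviour described above can be seen in a small experiment. This is only a sketch on made-up sample data; it also uses the --directory option, which is not covered on this page but is available in GNU tar and should be checked with man tar on your system.

```shell
# Throwaway sample data (hypothetical file names).
workdir=$(mktemp -d)
cd "$workdir"
mkdir project1
echo "original" > project1/notes.txt
tar --create --file project1.tar project1

# Change the file on disk after the archive was made.
echo "edited later" > project1/notes.txt

# Extracting in place replaces the edited file with the archived copy.
tar --extract --file project1.tar
cat project1/notes.txt

# To keep the current files, extract into a separate directory instead.
echo "edited again" > project1/notes.txt
mkdir restored
tar --extract --file project1.tar --directory restored
cat project1/notes.txt restored/project1/notes.txt
```

Extracting into an empty directory first is a cautious default when you are not sure whether the target already holds newer versions of the files.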

How to compress and uncompress tar files

tar can compress an archive file at the same time it creates it. There are a number of compression methods to choose from. We recommend either xz or gzip, which can be used like so:

tar --create --xz --file project1.tar.xz project1
tar --extract --xz --file project1.tar.xz
tar --create --gzip --file project1.tar.gz project1
tar --extract --gzip --file project1.tar.gz

Typically, --xz will produce a smaller compressed file (a "better compression ratio") but takes longer and uses more RAM while working (see http://catchchallenger.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO). --gzip typically does not compress as well, but may be used if you encounter difficulties due to insufficient memory or excessive run time during tar --create.
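
To see the trade-off on your own data, you can create the same archive with each method and compare the sizes. The sketch below does this on generated, highly repetitive sample text, so the ratios it shows are not representative of real data; the sample file and the loop that produces it are assumptions made for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir project1

# Repetitive text compresses very well; real data will behave differently.
i=0
while [ "$i" -lt 2000 ]; do
    echo "sample line $i of illustrative, highly repetitive text"
    i=$((i + 1))
done > project1/data.txt

tar --create --file project1.tar project1
tar --create --gzip --file project1.tar.gz project1

# xz may be missing on older systems, so check before using it.
if command -v xz >/dev/null 2>&1; then
    tar --create --xz --file project1.tar.xz project1
fi

# Print the size in bytes of each archive that was actually created.
for f in project1.tar project1.tar.gz project1.tar.xz; do
    if [ -f "$f" ]; then
        printf '%s\t%s bytes\n' "$f" "$(wc -c < "$f")"
    fi
done
```

Prefixing each tar command with time would show the run-time side of the comparison as well.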

You can also run tar --create first without compression and then use the commands xz or gzip in a separate step, although there is rarely a reason to do so. Similarly you can run xz -d or gzip -d to decompress an archive file before running tar --extract, but again there is rarely a reason to do so.
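
For completeness, the two-step approach might look like the following sketch, again on made-up sample data. Note that gzip and gzip -d replace their input file by default, and that --directory is an option not covered on this page (available in GNU tar).

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir project1
echo "some data" > project1/data.txt

# Step 1: archive without compression. Step 2: compress the archive.
tar --create --file project1.tar project1
gzip project1.tar            # replaces project1.tar with project1.tar.gz

# Reverse: decompress first, then extract into a separate directory.
gzip -d project1.tar.gz      # replaces project1.tar.gz with project1.tar
mkdir unpacked
tar --extract --file project1.tar --directory unpacked
ls -R unpacked
```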

Each of these utilities accepts further options and arguments. For more details on how to use them, type man <command> in your terminal, for example man tar.

Summary of common tar options

These are the most common options for the tar command. There are two synonymous forms for each: a single-letter form prefixed with a single dash, and a whole-word form prefixed with a double dash:

  • -c or --create: Create a new archive.
  • -f or --file=: The argument that follows is the archive file name.
  • -x or --extract: Extract files from archive.
  • -t or --list: List the contents of an archive file.
  • -J or --xz: Compress or uncompress with xz.
  • -z or --gzip: Compress or uncompress with gzip.

There are many more options for tar, and the precise options and their syntax may depend on the version you are using. You can get a complete list of the options available on your system with man tar or tar --help. Note in particular that some older systems might not support --xz compression.
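
The single-letter forms are commonly combined behind one dash. The sketch below repeats the gzip workflow in that style on made-up sample data; -f is placed last in each group because the archive file name must follow it directly.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir project1
echo "some data" > project1/data.txt

# -c -z -f combined: same as --create --gzip --file
tar -czf project1.tar.gz project1

# -t -z -f combined: same as --list --gzip --file
tar -tzf project1.tar.gz

# -x -z -f combined: same as --extract --gzip --file.
# Extract inside a separate directory so project1/ is not overwritten.
mkdir unpacked
(cd unpacked && tar -xzf ../project1.tar.gz)
ls -R unpacked
```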