Tar: Difference between revisions
Revision as of 19:37, 30 November 2016
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
Archiving means creating one file that contains a number of smaller files within it. Archiving data can improve the efficiency of file transfers. It is faster for the secure copy protocol (scp), for example, to transfer one archive file of a reasonable size than thousands of small files of equal total size. Therefore we recommend you transfer an archive rather than transferring a directory with all its files and sub-directories individually. In this page, we show by example how to prepare archive files for efficient file transfer.
Compressing means encoding a file such that the same information is contained in fewer bytes of storage. The speed of a large-scale data transfer is dominated by the number of bytes that must be moved, so if the data can be compressed a significant amount, the transfer will be quicker.
Use tar to archive files and directories
The primary archiving utility on all Linux and Unix-like systems is the tar command. It will bundle a bunch of files or directories together and generate a single file, called an archive file or tar-file. By convention an archive file has .tar as the file name extension.
When you archive a directory with tar, it will by default include all files and sub-directories contained in it, and sub-sub-directories contained in those, and so on. So
tar --create --file project1.tar project1
will pack all the contents of directory project1/ into the file project1.tar. The original directory will be unchanged, so this may double the amount of disk space occupied.
The same command with different options is used to extract files from the archive on a new system, e.g.:
tar --extract --file project1.tar
If there's no directory with the original name, it will be created. If a directory of that name exists and contains files of the same names as the original directory, they will be overwritten.
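The create and extract steps above can be tried safely in a scratch directory. The following is a minimal sketch, assuming a tar that accepts the long options used on this page (GNU tar does); the file names inside project1/ and the restored/ directory are hypothetical and only stand in for your own data and for "a new system".

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small hypothetical project1/ with one sub-directory.
mkdir -p project1/src
echo "hello" > project1/README.txt
echo "data"  > project1/src/input.dat

# Pack the whole tree into one archive file; project1/ itself is left unchanged.
tar --create --file project1.tar project1

# List the contents of the archive without extracting anything.
tar --list --file project1.tar

# Extract into a separate directory that stands in for the new system.
mkdir restored
tar --extract --file project1.tar --directory restored

# Compare the extracted tree with the original.
diff -r project1 restored/project1 && echo "round trip OK"
```

Extracting into an empty directory, as done here with --directory, avoids overwriting files that happen to have the same names.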
How to compress and uncompress tar files
tar can compress an archive file at the same time it creates it. There are a number of compression methods to choose from. We recommend either xz or gzip, which can be used like so:
tar --create --xz --file project1.tar.xz project1
tar --extract --xz --file project1.tar.xz
tar --create --gzip --file project1.tar.gz project1
tar --extract --gzip --file project1.tar.gz
Typically, --xz will produce a smaller compressed file (a "better compression ratio") but takes longer and uses more RAM while working. --gzip does not typically compress as small, but may be used if you encounter difficulties due to insufficient memory or excessive run time during tar --create.
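To see the trade-off on your own data, you can create the same archive with and without compression and compare the file sizes. The sketch below uses hypothetical sample data (a file of consecutive numbers, which compresses well); real data may behave quite differently. It assumes gzip is installed and only tries xz if that command is found.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: a text file of consecutive numbers.
mkdir project1
seq 1 200000 > project1/numbers.txt

# Same content, archived without compression and with gzip.
tar --create --file project1.tar project1
tar --create --gzip --file project1.tar.gz project1

# xz is not installed everywhere, so only use it when it is available.
if command -v xz >/dev/null 2>&1; then
    tar --create --xz --file project1.tar.xz project1
fi

# Print the size in bytes of each archive that was produced.
wc -c project1.tar*
```

Prefixing each tar command with time is a simple way to compare run times as well.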
You can also run tar --create first without compression and then use the commands xz or gzip in a separate step, although there is rarely a reason to do so. Similarly you can run xz -d or gzip -d to decompress an archive file before running tar --extract, but again there is rarely a reason to do so.
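For completeness, the two-step route looks roughly like this. It is a sketch using gzip only, with a hypothetical one-file project1/ directory; xz and xz -d can be substituted in the same places.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir project1
echo "sample" > project1/file.txt

# Step 1: archive without compression.
tar --create --file project1.tar project1

# Step 2: compress the archive. By default gzip replaces project1.tar
# with project1.tar.gz rather than keeping both files.
gzip project1.tar

# Reverse direction: decompress first (restores project1.tar), then extract.
gzip -d project1.tar.gz
mkdir restored
tar --extract --file project1.tar --directory restored
cat restored/project1/file.txt
```

The result is the same as with tar --create --gzip and tar --extract --gzip, but the intermediate uncompressed archive temporarily needs its full size on disk.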
Never try to compress a file that is already compressed, not even with a different tool: it will rarely get any smaller, and the attempt only costs time.
These archiving utilities are invoked with options and arguments. For more details on any of them, type man <command> in your terminal.
The general syntax for tar, gzip, gunzip, bzip2 and bunzip2 is as follows:
tar [option(s)] [your_file.tar or your_archive_name.tar] [filename(s), directory or directories]
gzip [your_file or your_archive_name.tar]
gunzip [your_file.gz or your_archive_name.tar.gz]
bzip2 [your_file or your_archive_name.tar]
bunzip2 [your_file.bz2 or your_archive_name.tar.bz2]
Note that:
- gunzip is only used to uncompress files with the gz extension.
- bunzip2 is only used to uncompress files with the bz2 extension.
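As an illustration of the syntax above applied to a single file, here is a small sketch. The file name new.log.dat is borrowed from the example directory used later on this page and its content is made up; bzip2 is only used if it is installed.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical data file with 1000 lines.
seq 1 1000 > new.log.dat

# gzip replaces new.log.dat with new.log.dat.gz ...
gzip new.log.dat
ls

# ... and gunzip restores the original file from the .gz file.
gunzip new.log.dat.gz

# The bzip2 pair works the same way with the .bz2 extension.
if command -v bzip2 >/dev/null 2>&1; then
    bzip2 new.log.dat
    ls
    bunzip2 new.log.dat.bz2
fi

# The file is back under its original name with its original content.
wc -l new.log.dat
```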
These are the most common options for the tar command:
-c : {create a new archive.}
-v : {verbosely list the files that are processed.}
-f : {the next argument is the archive file name.}
-t : {list the contents of an archive file.}
-r : {append files to an existing archive.}
-A : {append one archive to the end of another.}
-x : {extract files from an archive.}
-z : {filter the archive through gzip.}
-C directory : {perform a chdir [change directory] operation to directory before carrying out the create, append or extract operation on the files that follow.}
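The short options listed above are usually combined into one group. The following sketch shows typical combinations, assuming GNU tar or a compatible implementation; the names data/, notes.txt and out/ are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p data/results
echo "a" > data/results/run1.txt
echo "b" > notes.txt

# -c create, -v list each file as it is processed, -f archive file name.
tar -cvf data.tar data

# -r append another file to the existing archive (uncompressed archives only).
tar -rvf data.tar notes.txt

# -t list the contents of the archive.
tar -tf data.tar

# -x extract; -C changes to the out/ directory before extracting.
mkdir out
tar -xf data.tar -C out

# -z filters the archive through gzip, both when creating and when extracting.
tar -czf data.tar.gz data
tar -xzf data.tar.gz -C out
```

Note that -f must be the last letter in the group, because the archive file name is expected to follow it directly.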
Common and useful commands to use to prepare your archives:
To illustrate the different commands, we use a directory that resembles a home directory: it contains files, sub-directories, and so on. We assume that you have already removed the data you do not need and that what remains is ready for migration. One more step remains before the transfer, which is to archive and compress your data. Below you will find the most common uses of the archiving and compression utilities with suitable options. As an example, we use a directory called Migration (substitute the name of your own directory).
On your terminal, change the directory to Migration (or the directory you want to work with) then:
- Use pwd {print working directory} to see the current working path.
- Use ls {list} to see the files and sub-directories in the current working path.
- Use du -sh * {disk usage} to see the size of each file and directory. This information will help you decide how to prepare your archives and which files to put together or to compress separately.
As shown in this example:
[user_name@localhost]$ pwd
/global/scratch/user_name/Migration
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat programs/ report/ results/ tests/ work/
[user_name@localhost]$ du -sh *
3,0K bin
876K documents
136K jobs
12K new.log.dat
68K programs
1,8M report
120K results
48K tests
46K work
This example shows that we are currently working in the directory called Migration and that it contains the following files and directories: bin, documents, jobs, new.log.dat, programs, report, results, tests, work. The size of each file or directory is given by the command du -sh. We will explain later why it is important to use this command to determine the size of your files or directories before starting compression.
In this example, we have used only a few directories and small files. In your case, you may have more directories, more files and larger files. The idea is the same: create your archives with tar, gzip or bzip2 from your terminal. You can recover them later with tar {with specific options}, gunzip and bunzip2. We will explain how these utilities work by giving the most commonly used commands and options.
Notes:
- Before starting compression, make sure you are not running out of space or quota, because the tar command uses free space to create the archive. In effect, you are adding data of the same size as the file or directory you are archiving. The original files stay unchanged unless you modify or remove them later.
- gzip and bzip2 also use some free space to create the compressed file. The new file is named your_file.gz if you use gzip or your_file.bz2 if you use bzip2; for a tar file, you get your_archive.tar.gz or your_archive.tar.bz2.
- The tar command can be applied to multiple files or directories in order to combine them into a single archive file.
- gzip and bzip2 are applied to a single file or a single archive file, not to a directory.
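The notes above can be put together in one short sketch: check the space first, archive several items with one tar command, then compress the resulting single file. The directory contents below are hypothetical stand-ins for the Migration example, and the free-space check only looks at the filesystem; any quota on your account has to be checked with whatever tool your site provides.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for the Migration directory.
mkdir -p Migration/report Migration/bin
seq 1 50000 > Migration/report/draft.txt
echo "echo hi" > Migration/bin/run.sh
seq 1 100 > Migration/new.log.dat
cd Migration

# Space needed for an uncompressed archive, in KB, versus free space in KB.
needed_kb=$(du -sk . | cut -f1)
free_kb=$(df -Pk . | awk 'NR==2 {print $4}')
echo "need about ${needed_kb} KB, ${free_kb} KB free"

# tar accepts several files and directories and puts them into one archive.
# The archive is written to the parent directory so that it is not created
# inside the directory being archived.
tar --create --file ../bundle.tar bin report new.log.dat

# gzip then works on that single archive file, producing bundle.tar.gz.
gzip ../bundle.tar
ls -l ../bundle.tar.gz
```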