Tar

From Alliance Doc
Revision as of 14:13, 1 December 2016 by Kerrache

This page is not finished yet


Archiving means creating one file that contains a number of smaller files within it. Archiving data can improve the efficiency of file storage, and of file transfers. It is faster for the secure copy protocol (scp), for example, to transfer one archive file of a reasonable size than thousands of small files of equal total size.

Compressing means encoding a file such that the same information is contained in fewer bytes of storage. The advantage for long-term data storage should be obvious. For data transfers, the time spent compressing the data can be balanced against the time saved moving fewer bytes as described in this discussion of data compression and transfer from the US National Center for Supercomputing Applications.

Use tar to archive files and directories

The primary archiving utility on all Linux and Unix-like systems is the tar command. It will bundle a bunch of files or directories together and generate a single file, called an archive file or tar-file. By convention an archive file has .tar as the file name extension.

When you archive a directory with tar, it will by default include all files and sub-directories contained in it, and sub-sub-directories contained in those, and so on. So

[name@server ~]$ tar --create --file project1.tar project1

will pack all the contents of directory project1/ into the file project1.tar. The original directory will be unchanged, so this may double the amount of disk space occupied!

You can extract files from the archive using the same command with a different option:

[name@server ~]$ tar --extract --file project1.tar

If there is no directory with the original name, it will be created. If a directory of that name exists and contains files of the same names as in the archive file, they will be overwritten.
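You can see this overwrite behavior for yourself with the minimal sketch below. It works in a throw-away temporary directory with a hypothetical file project1/notes.txt, and it assumes GNU tar. It also demonstrates the --skip-old-files option, which leaves files that already exist untouched. That option is present in recent GNU tar releases but may be missing on older systems.

```shell
#!/bin/sh
# Sketch assuming GNU tar: extraction overwrites existing files by default,
# while --skip-old-files keeps the copies already on disk.
set -e
work=$(mktemp -d)                      # scratch directory, nothing real is touched
cd "$work"

mkdir project1
echo "original" > project1/notes.txt   # hypothetical sample file
tar --create --file project1.tar project1

echo "edited" > project1/notes.txt     # change the file after archiving it
tar --extract --skip-old-files --file project1.tar
cat project1/notes.txt                 # still "edited": the existing file was skipped

tar --extract --file project1.tar      # default behavior
cat project1/notes.txt                 # "original": the file was overwritten from the archive
```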

How to compress and uncompress tar files

tar can compress an archive file at the same time it creates it. There are a number of compression methods to choose from. We recommend either xz or gzip, which can be used like so:

[name@server ~]$ tar --create --xz --file project1.tar.xz project1
[name@server ~]$ tar --extract --xz --file project1.tar.xz
[name@server ~]$ tar --create --gzip --file project1.tar.gz project1
[name@server ~]$ tar --extract --gzip --file project1.tar.gz

Typically, --xz will produce a smaller compressed file (a "better compression ratio") but takes longer and uses more RAM while working [1]. --gzip does not typically compress as small, but may be used if you encounter difficulties due to insufficient memory or excessive run time during tar --create.

You can also run tar --create first without compression and then use the commands xz or gzip in a separate step, although there is rarely a reason to do so. Similarly you can run xz -d or gzip -d to decompress an archive file before running tar --extract, but again there is rarely a reason to do so.
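If you do need the separate steps, they look roughly like the sketch below, which uses a hypothetical sample directory and file. The sketch uses gzip. The xz and xz -d commands work the same way: each replaces its input file with the compressed or decompressed version.

```shell
#!/bin/sh
# Sketch: archive first, compress separately, then reverse the two steps.
set -e
work=$(mktemp -d)
cd "$work"
mkdir project1
echo "some data" > project1/a.txt          # hypothetical sample content

tar --create --file project1.tar project1  # step 1: plain archive
gzip project1.tar                          # step 2: project1.tar becomes project1.tar.gz

gzip -d project1.tar.gz                    # undo step 2: back to project1.tar
rm -r project1                             # remove the original to show extraction restores it
tar --extract --file project1.tar          # undo step 1
ls project1
```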

Common tar options

These are the most common options for the tar command. There are two synonymous forms for each: a single-letter form prefixed with a single dash, and a whole-word form prefixed with a double dash:

  • -c or --create: Create a new archive.
  • -f or --file=: The next argument is the archive file name.
  • -x or --extract: Extract files from archive.
  • -t or --list: List the contents of an archive file.
  • -J or --xz: Compress or uncompress with xz.
  • -z or --gzip: Compress or uncompress with gzip.

Single-letter options can be combined with a single dash, so for example

[name@server ~]$ tar -cJf project1.tar.xz project1

is equivalent to

[name@server ~]$ tar --create --xz --file=project1.tar.xz project1

There are many more options for tar, and they may depend on the version you are using. You can get a complete list of the options available on your system with man tar or tar --help. Note in particular that some older systems might not support --xz compression.


How to tar a given directory?

Let us now work through a test example and create an archive called results.tar for the directory results. First, list the contents of the working directory:

[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  results/  tests/  work/

Then:

[user_name@localhost]$ tar -cvf results.tar results
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4

Using ls command we can see the tar file created:

[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  results/  results.tar  tests/  work/

In this example, we have invoked the tar command with the options c {for create}, v {for verbosity} and f {for file}. As a name for the archive, we have used results.tar. The name can be anything, but it is better to keep a name similar to that of the file or directory being archived. This makes it easier to recognize your data later without having to extract the archive to see what it contains.

If we want to put more than one directory in a tar file, for example an archive called full_results.tar for the directories results, report and documents, we can proceed as follows:

[user_name@localhost]$ tar -cvf full_results.tar results report documents/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
report/
report/report-2016.pdf
report/report-a.pdf
documents/
documents/1504.pdf
documents/ff.doc
[user_name@localhost]$ ls
bin/  documents/  full_results.tar  jobs/  new.log.dat  programs/  report/  results/  results.tar  tests/  work/

How to tar all the files or directories that start with a given letter, "r" for example?

In our working directory, we have two directories that start with r (report and results).

[user_name@localhost]$ tar -cvf archive.tar r*
report/
report/report-2016.pdf
report/report-a.pdf
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4

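In this example, we put the contents of the directories results and report into one single archive called archive.tar. Note that r* is expanded by the shell, so any other file in the current directory whose name starts with r (for example results.tar, if present) is included as well.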

How to see the content of a tar file?

From our previous example, let us look at the contents of the tar file results.tar, which holds all the files and sub-directories of the directory results. This can be done with the -t option. Combined with v, it also gives additional information about the files, such as permissions, owner, size and date.

[user_name@localhost]$ tar -tvf results.tar
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name   10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name   10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name       0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name   11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name   11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name       0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name   34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name   34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4

In this example, the tar command was invoked with the options t {for list}, v {for verbosity} and f {for file}. This command shows all the files that are in the tar file, with additional information about permissions, ownership, size and date.

If you are interested just in listing the files in the tar file, use the following options (tf instead of tvf):

[user_name@localhost]$ tar -tf results.tar
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4

If you are interested in the number of files in the tar file, you can combine one of the previous commands with a pipe { | } and wc -l {word count with the option -l to count only lines}. This counts the number of lines in the output of the command before the pipe symbol.

[user_name@localhost]$ tar -tvf results.tar | wc -l
9
[user_name@localhost]$ tar -tf results.tar | wc -l
9

From this example, we have a total of 9. This number includes all the files and sub-directories in the directory results, including the directory itself. The options in the previous commands can also be given separately or in their long form. For example:

  • The option -tvf is equivalent to -t -v -f
  • The option -v is equivalent to --verbose
  • The option -t is equivalent to --list
  • The option --file=results.tar is equivalent to -f results.tar

Note: The option -f or --file= must be followed directly by the archive file name. This is why f comes last when single-letter options are combined (as in -tvf).
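The total of 9 in the counting example mixes files and directories. The minimal sketch below counts them separately. It assumes GNU tar, whose -t listing ends directory names with / and whose -tv listing begins each regular file with - in the permission column. The small results tree is a hypothetical stand-in for the real one.

```shell
#!/bin/sh
# Sketch (GNU tar assumed): count regular files and directories separately.
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p results/Res-01 results/Res-02
touch results/log1.dat results/Res-01/log.15Feb16.4 results/Res-02/log.15Feb16.balance.b.1
tar -cf results.tar results

total=$(tar -tf results.tar | wc -l)            # every entry in the archive
dirs=$(tar -tf results.tar | grep -c '/$')      # entries ending in "/" are directories
files=$(tar -tvf results.tar | grep -c '^-')    # "-" in the permission column: regular file
echo "total=$total dirs=$dirs files=$files"
```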

How to search for a given file in the tar archive file without un-tarring the archive?

We have seen previously how to list the files in an archive. You can also search for a particular file by piping the list command into grep. For example, let us see if we can find the file log.15Feb16.4 (the path to this file is results/Res-01/log.15Feb16.4).

[user_name@localhost]$ tar -tf results.tar | grep -a log.15Feb16.4
results/Res-01/log.15Feb16.4
[user_name@localhost]$ tar -tvf results.tar | grep -a log.15Feb16.4
-rw-r--r-- name name   11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4

Now, let us try to find another file, called for example pbs_file (this file does not exist in our archive):

[user_name@localhost]$ tar -tf results.tar | grep -a pbs_file
[user_name@localhost]$ tar -tvf results.tar | grep -a pbs_file

As you can see, the output of the commands is empty, meaning that the file does not exist in the archive. If you want to list all the files in the archive whose names start with log, type on your terminal:

[user_name@localhost]$ tar -tf results.tar | grep -a "/log"
results/log1.dat
results/log5.dat
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4

Or add the v option for more details:

[user_name@localhost]$ tar -tvf results.tar | grep -a "/log"
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
-rw-r--r-- name name    11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4

Note: The output can also be piped to more (for example tar -tvf results.tar | more) to page through the list of files in an archive or a compressed archive.
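In a script, testing the exit status of grep is often more convenient than reading its output. The sketch below uses a hypothetical sample archive and grep -qxF, so the name is matched literally and as a whole line. Without -F, the dots in log.15Feb16.4 would be regular-expression wildcards. Without -x, a longer name containing the same text would also match.

```shell
#!/bin/sh
# Sketch: check whether an exact member path exists in an archive.
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p results/Res-01
touch results/Res-01/log.15Feb16.4          # hypothetical sample file
tar -cf results.tar results

member="results/Res-01/log.15Feb16.4"
# -F: treat the name literally, -x: match the whole line, -q: only set the exit status
if tar -tf results.tar | grep -qxF "$member"; then
    echo "$member is in results.tar"
else
    echo "$member is not in results.tar"
fi

if tar -tf results.tar | grep -qxF "pbs_file"; then
    found_pbs=yes
else
    found_pbs=no
fi
echo "pbs_file found: $found_pbs"
```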

How to append files to the end of an existing archive?

The r option can be used to add files to existing archives, without having to create new ones or extract the archive and run tar again to create the archive. Here is a quick example: let us add the file new.log.dat to the archive results.tar

[user_name@localhost]$ tar -rf results.tar new.log.dat

Here, the tar command added the file new.log.dat at the end of the archive results.tar.

To check, list the files in the tar file again:

[user_name@localhost]$ tar -tvf results.tar
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name        0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name    11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name        0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
-rw-r--r-- name name    10905 2016-11-20 11:16 new.log.dat

Note: Files cannot be added to compressed archives (gz or bzip2). Files can only be added to plain tar archives.

The -r option can also be used to append one or more directories to an existing tar file. Let us add report to results.tar from our previous example:

[user_name@localhost]$ tar -rf results.tar report/
[user_name@localhost]$ tar -tvf results.tar
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name        0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name    11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name        0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
-rw-r--r-- name name    10905 2016-11-20 11:16 new.log.dat
drwxrwxr-x name name        0 2016-11-20 11:02 report/
-rw-r--r-- name name   924729 2015-11-20 04:14 report/report-2016.pdf
-rw-r--r-- name name   924729 2015-11-20 04:14 report/report-a.pdf
[user_name@localhost]$ tar -tf results.tar
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
new.log.dat
report/
report/report-2016.pdf
report/report-a.pdf

How to concatenate two archive files?

Just as we can add a file to an archive, we can also add one archive to another. This can be done with the -A option. Let us add the archive report.tar (for the directory report) to the archive results.tar.

[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  report.tar  results/  results.tar  tests/  work/
[user_name@localhost]$ tar -tvf results.tar
drwxr-xr-x name name        0 2016-11-20 16:16 results/
-rw-r--r-- name name    10905 2016-11-20 16:16 results/log1.dat
-rw-r--r-- name name    10909 2016-11-20 16:16 results/log5.dat
drwxr-xr-x name name        0 2016-11-20 16:16 results/Res-01/
-rw-r--r-- name name    11682 2016-11-20 16:16 results/Res-01/log.15Feb16.4
drwxr-xr-x name name        0 2016-11-20 16:16 results/Res-02/
-rw-r--r-- name name    34111 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ tar -A -f results.tar report.tar
[user_name@localhost]$ tar -tvf results.tar
drwxr-xr-x name name        0 2016-11-20 16:16 results/
-rw-r--r-- name name    10905 2016-11-20 16:16 results/log1.dat
-rw-r--r-- name name    10909 2016-11-20 16:16 results/log5.dat
drwxr-xr-x name name        0 2016-11-20 16:16 results/Res-01/
-rw-r--r-- name name    11682 2016-11-20 16:16 results/Res-01/log.15Feb16.4
drwxr-xr-x name name        0 2016-11-20 16:16 results/Res-02/
-rw-r--r-- name name    34111 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.4
drwxrwxr-x name name        0 2016-11-20 11:02 report/
-rw-r--r-- name name   924729 2015-11-20 04:14 report/report-2016.pdf
-rw-r--r-- name name   924729 2015-11-20 04:14 report/report-a.pdf

In the above example, we have used the tar command with -A {for concatenate} (tar -A -f results.tar report.tar) to add the archive report.tar to the archive results.tar. You can confirm this by comparing the output of tar -tvf results.tar before and after the operation.

Note: The options -A, --catenate, --concatenate are equivalent.
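Here is a self-contained sketch of the concatenation. It assumes GNU tar and uses hypothetical sample files. Like -r, the -A option only works on uncompressed archives.

```shell
#!/bin/sh
# Sketch (GNU tar assumed): concatenate report.tar onto results.tar with -A.
set -e
work=$(mktemp -d)
cd "$work"
mkdir results report
touch results/log1.dat report/report-a.pdf   # hypothetical sample files
tar -cf results.tar results
tar -cf report.tar report

tar -A -f results.tar report.tar             # same as --concatenate / --catenate
tar -tf results.tar                          # members of both archives are now listed
```

Simply running cat results.tar report.tar > both.tar is not equivalent. Every tar archive ends with blocks of zeros that mark its end, so tar stops reading at the end of the first archive unless you pass --ignore-zeros (-i) when listing or extracting. The -A option handles that end marker for you.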

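The same operation can be written in equivalent forms, here applied to full_results.tar. Either of the first two commands appends report.tar to full_results.tar, and the third lists the result:

[user_name@localhost]$ tar -A -f full_results.tar report.tar
[user_name@localhost]$ tar --concatenate --file=full_results.tar report.tar
[user_name@localhost]$ tar --list --file=full_results.tar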

How to extract the whole archive?

To extract an archive, we use the x {for extract} option with f {for file}, and v {for verbosity} can also be added. Let us extract the whole archive results.tar. If we extract it in the current directory, we have to make sure that there is no directory with the same name. Otherwise, the extracted data go into that directory and files with the same names are overwritten. It is also possible to extract the archive into another directory with the option -C. For example, we create a directory new_results (mkdir new_results) and extract the data from the archive results.tar into it.

[user_name@localhost]$ tar -xvf results.tar -C new_results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
new.log.dat
report/
report/report-2016.pdf
report/report-a.pdf
[user_name@localhost]$ ls new_results/
new.log.dat  report/  results/
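The same steps appear below as a minimal, self-contained sketch with a hypothetical sample file. The mkdir -p command creates new_results only if it is missing. This avoids the error tar reports when the directory given to -C does not exist.

```shell
#!/bin/sh
# Sketch: extract an archive into a directory other than the current one.
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p results/Res-01
echo "x" > results/Res-01/log.15Feb16.4   # hypothetical sample file
tar -cf results.tar results

mkdir -p new_results                      # -C needs an existing directory
tar -xvf results.tar -C new_results/
ls new_results/results/Res-01
```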

How to compress your file (or files), or your tar archive?

From our previous example, we use gzip or bzip2 to compress the files: new.log.dat and results.tar.

  • Using gzip:
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar  tests/  work/
[user_name@localhost]$ gzip new.log.dat
[user_name@localhost]$ gzip results.tar
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat.gz  new_results/  programs/  report/  results/  results.tar.gz  tests/  work/
  • Using bzip2:
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar  tests/  work/
[user_name@localhost]$ bzip2 new.log.dat
[user_name@localhost]$ bzip2 results.tar
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat.bz2  new_results/  programs/  report/  results/  results.tar.bz2  tests/  work/

To compress the archive while creating it, give tar the z option for gzip or the j option for bzip2:

[user_name@localhost]$ tar -cvzf abc.tar.gz ./new/

Tar does not care about the file name extension, but a conventional extension helps you recognize the format. ".tar.gz" and ".tgz" are common extensions for files compressed with gzip, and ".tar.bz2" and ".tbz" are common for files compressed with bzip2.

[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  results/  tests/  work/
[user_name@localhost]$ tar -cvzf results.tar.gz results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  results/  results.tar.gz  tests/  work/
[user_name@localhost]$ tar -cvjf results.tar.bz2 results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  results/  results.tar.bz2  results.tar.gz  tests/  work/

Notes:

  • Another extension tgz can be used instead of tar.gz
  • Another extension tbz can be used instead of tar.bz2

How to exclude particular files or file types when creating a tar file?

From our previous example, let us create the archive results.tar for the directory results, but without the files that have ".dat" as extension. This can be done by adding the option --exclude='*.dat' (the quotes stop the shell from expanding the pattern itself):

[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  results/  tests/  work/
[user_name@localhost]$ ls results/
log1.dat  log5.dat  Res-01/  Res-02/
[user_name@localhost]$ tar -cvf results.tar --exclude='*.dat' results/
results/
results/Res-01/
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ tar -tvf results.tar
drwxr-xr-x name name        0 2016-11-20 16:16 results/
drwxr-xr-x name name        0 2016-11-20 16:16 results/Res-01/
-rw-r--r-- name name    11682 2016-11-20 16:16 results/Res-01/log.15Feb16.4
drwxr-xr-x name name        0 2016-11-20 16:16 results/Res-02/
-rw-r--r-- name name    34111 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.4
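Two small habits make --exclude more robust. First, quote the pattern so the shell passes *.dat to tar unchanged. Second, put the option before the directory name, since some GNU tar versions print a warning about exclusion options that follow the file names. Here is a minimal sketch with hypothetical files, assuming GNU tar:

```shell
#!/bin/sh
# Sketch (GNU tar assumed): leave *.dat files out of the archive.
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p results/Res-01
touch results/log1.dat results/log5.dat results/Res-01/log.15Feb16.4   # hypothetical files

# Quote the pattern and place --exclude before the directory to be archived.
tar -cvf results.tar --exclude='*.dat' results/
tar -tf results.tar
```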

How to uncompress gz files or bz2 files?

The general syntax for extracting with tar is:

[user_name@localhost]$ tar -xvf {tar-file} {file-to-be-extracted} -C {path-where-to-extract}

For files with the .gz extension, we use gunzip as follows:

[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat.gz  new_results/  programs/  report/  results/  results.tar.gz  tests/  work/
[user_name@localhost]$ gunzip new.log.dat.gz
[user_name@localhost]$ gunzip results.tar.gz
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar  tests/  work/

For files with the .bz2 extension, we use bunzip2 as follows:

[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat.bz2  new_results/  programs/  report/  results/  results.tar.bz2  tests/  work/
[user_name@localhost]$ bunzip2 new.log.dat.bz2
[user_name@localhost]$ bunzip2 results.tar.bz2
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar  tests/  work/

How to list the content of a compressed file (*.gz or *.bz2)?

As with the plain tar files we have seen previously, you can list the contents of a compressed archive without uncompressing it. Add the z option to tar for an archive compressed with gzip, or the j option for an archive compressed with bzip2.

For gz file:

[user_name@localhost]$ tar -tvzf results.tar.gz
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name        0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name    11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name        0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
-rw-r--r-- name name    10905 2016-11-20 11:16 new.log.dat
-rw-r--r-- name name    10905 2016-11-20 11:16 new.log.dat
drwxrwxr-x name name        0 2016-11-20 11:02 report/
-rw-r--r-- name name   924729 2015-11-20 04:14 report/report-2016.pdf
-rw-r--r-- name name   924729 2015-11-20 04:14 report/report-a.pdf

For bz2 file:

[user_name@localhost]$ tar -tvjf results.tar.bz2
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name        0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name    11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name        0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4

Notes:

  • Again, in these examples the option v is used to display all the details, but it is not required.
  • The two previous commands can also be combined with a pipe and wc, or a pipe and grep, as we have seen previously.

How to restore or extract a compressed archive file in another directory?

As in the case of a tar file, a compressed tar file can be extracted into another directory. Use -C to indicate the destination directory, and add the option z for .gz files or j for .bz2 files. We can use the same example as previously: extract the archive results.tar.gz (or results.tar.bz2) into the directory new_results. This can be done in one step or in two steps.

  • Extract the compressed archive file in one step:

With gz:

[user_name@localhost]$ tar -xvzf results.tar.gz -C new_results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ tar -xzf results.tar.gz -C new_results/
[user_name@localhost]$ ls new_results/
results/

With bz2 extension:

[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar.bz2  tests/  work/
[user_name@localhost]$ tar -xvjf results.tar.bz2 -C new_results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls new_results/
results/

Notes:

  • In the previous example, it is possible to start with the option -C {the destination directory}, however first make sure that the destination directory exists, since tar is not going to create the directory for you and will fail if it does not exist. The command is:
[user_name@localhost]$ tar -C new_results/ -xzf results.tar.gz
[user_name@localhost]$ tar -C new_results/ -xvjf results.tar.bz2

Notes:

  • If the option -C {destination directory} is not given, the files are extracted into the current working directory.
  • The option v is used for verbosity. In this case, it displays the files and directories as they are extracted into the new directory.
  • If we want to display more details (like the date, permissions ...), we can add a second v option as follows:
 tar -C new_results/ -xvvzf results.tar.gz
 tar -C new_results/ -xvvjf results.tar.bz2

Extract the compressed archive file in two steps:

Here we use the same command as previously but without the z option or the j option. First we use gunzip or bunzip2 to uncompress the file. Then we use tar -xvf to un-tar the archive as follow:

Let us suppose we have the compressed file results.tar.bz2:

[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar.bz2  tests/  work/
[user_name@localhost]$ bunzip2 results.tar.bz2
[user_name@localhost]$ tar -C ./new_results/ -xvvf results.tar
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name        0 2016-11-20 15:16 results/Res-01/
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name        0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls new_results/results/
log1.dat  log5.dat  Res-01/   Res-02/

For the gz file:

[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar.gz  tests/  work/
[user_name@localhost]$ gunzip results.tar.gz
[user_name@localhost]$ tar -C ./new_results/ -xvvf results.tar
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name        0 2016-11-20 15:16 results/Res-01/
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name        0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls new_results/results/
log1.dat  log5.dat  Res-01/  Res-02/

How to restore or extract one file from an archive or a compressed archive file in another directory?

Let us consider again the same example as previously: the archive results.tar for the directory results. First we list all the files in it, then we extract one file into the directory new_results:

[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar  tests/  work/
[user_name@localhost]$ tar -tvf results.tar
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name        0 2016-11-20 15:16 results/Res-01/
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name        0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls new_results/                                                                        
[user_name@localhost]$ tar -C ./new_results/ --extract --file=results.tar results/Res-01/log.15Feb16.4
[user_name@localhost]$ ls new_results/results/Res-01/log.15Feb16.4
new_results/results/Res-01/log.15Feb16.4

In this example, we have extracted the file results/Res-01/log.15Feb16.4 without extracting the whole archive, by giving its path after the archive name. The command recreates the directory structure from the archive (results/Res-01/) inside the destination directory.

Notes:

  • The option -C {destination directory} is optional. Without it, the file is extracted relative to the current directory. If results/Res-01/ already exists there, the existing log.15Feb16.4 is overwritten; if not, the directories are created.
  • This works for a file or a directory, but we need to give the exact path as it appears in the archive listing.
  • The same command can be used to extract multiple files by adding their full paths, as in the previous example:
[user_name@localhost]$ tar -C ./new_results/ --extract --file=results.tar "results/Res-01/log.15Feb16.4" "file2" "file3"
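Here is a self-contained sketch of extracting selected members, in this case from a gzip-compressed archive. It uses hypothetical sample files and assumes GNU tar. The member names must be written exactly as tar -tzf lists them. Only the directories leading to those members are created under new_results.

```shell
#!/bin/sh
# Sketch (GNU tar assumed): extract two named members into another directory.
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p results/Res-01 results/Res-02
touch results/log1.dat results/Res-01/log.15Feb16.4 results/Res-02/log.15Feb16.balance.b.1
tar -czf results.tar.gz results                  # gzip-compressed archive

mkdir -p new_results
# Names must match the paths shown by: tar -tzf results.tar.gz
tar -C new_results -xzf results.tar.gz \
    results/Res-01/log.15Feb16.4 results/log1.dat
find new_results -type f | sort
```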

The same command can also be used to extract a file from a compressed tar file.

From a gz file:

[user_name@localhost]$ tar -C ./new_results/ --extract -z --file=results.tar.gz results/Res-01/log.15Feb16.4
[user_name@localhost]$ ls new_results/results/Res-01/log.15Feb16.4
new_results/results/Res-01/log.15Feb16.4

From a bz2 file:

[user_name@localhost]$ tar -C ./new_results/ --extract -j --file=results.tar.bz2 results/Res-01/log.15Feb16.4
[user_name@localhost]$ ls new_results/results/Res-01/log.15Feb16.4
new_results/results/Res-01/log.15Feb16.4

How to extract multiple files using wildcards?

[user_name@localhost]$ tar -C ./new_results/ -xvf results.tar --wildcards "results/*.dat"
[user_name@localhost]$ ls new_results/results/
log1.dat  log5.dat

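With the above command, we have extracted the files that are in the directory results/ and have the extension .dat. With GNU tar, * in such a pattern also matches /, so .dat files in sub-directories of results/ would be extracted too.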

Note: The command is also valid with invoking j or z options for compressed archives as we have seen previously.

From our previous example, we can also extract all the files whose names start with log.
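Here is a minimal sketch of that, assuming GNU tar. With --wildcards, the pattern is compared with the whole member path and * may also match /. As a result, '*/log*' selects every member whose file name starts with log, in any sub-directory. The sample tree below is a hypothetical cut-down copy of results, with an extra notes.txt that should not be extracted.

```shell
#!/bin/sh
# Sketch (GNU tar assumed): extract every member whose file name starts with "log".
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p results/Res-01 results/Res-02
touch results/log1.dat results/notes.txt \
      results/Res-01/log.15Feb16.4 results/Res-02/log.15Feb16.balance.b.1   # hypothetical files
tar -cf results.tar results

mkdir -p new_results
# Quote the pattern so the shell does not expand it; tar matches it against member paths.
tar -C new_results -xvf results.tar --wildcards '*/log*'
```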

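How to handle symbolic links with the tar command?

By default, tar stores symbolic links as links, so they are preserved in the archive. If you want the archive to contain the files that the links point to instead, add the option h (--dereference):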

[user_name@localhost]$ tar -cvhf results.tar results/
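To see how tar treats symbolic links, the following sketch (assuming GNU tar; all names are hypothetical) archives a directory containing a link to a file outside it twice: once without and once with the h option. In the first listing, the entry for link.dat is a link: its line starts with l and shows the target. In the second listing, it is a regular file: its line starts with -.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical data: results/link.dat points to a file outside results/.
mkdir -p data results
echo "data" > data/real.dat
ln -s ../data/real.dat results/link.dat

# Default: the link itself is stored (type "l" in the listing).
tar -cf links.tar results/
tar -tvf links.tar

# With h (--dereference): a copy of data/real.dat is stored as results/link.dat.
tar -chf copies.tar results/
tar -tvf copies.tar
```

The second form is useful when the archive will be moved to a system where the link targets do not exist.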

How to add files to compressed archives (tar.gz/tar.bz2)?

As mentioned above, files cannot be added to compressed archives. Instead, uncompress the archive with gunzip or bunzip2 as seen previously, add the file or files to the resulting tar file with the r option, and then compress it again with gzip or bzip2.
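The uncompress, append and recompress steps can be sketched as follows for a gzip-compressed archive. The sketch runs in a scratch directory with hypothetical file names and assumes GNU tar and gzip.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical starting point: a compressed archive and a new file to add.
mkdir -p results
echo "old" > results/old.log
tar -czf results.tar.gz results
echo "new" > new.log.dat

gunzip results.tar.gz            # results.tar.gz -> results.tar
tar -rf results.tar new.log.dat  # append to the plain tar file
gzip results.tar                 # results.tar -> results.tar.gz

# The recompressed archive now lists both the old and the new file.
tar -tzf results.tar.gz
```

For a .tar.bz2 archive, replace gunzip and gzip with bunzip2 and bzip2.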

How to check the size of a file, directory or archive?

From your terminal, you can use the command du -sh [your_file ...] to see the size:

[user_name@localhost]$ du -sh results.tar work tests
112K results.tar
58K  work
48K  tests

Knowing the size of your files or directories helps you decide how to divide them among different archives. A large archive file can also be split into smaller parts with the split command.

Syntax: split -b <Size-in-MB> <tar-file-name>.<extension> "prefix-name"

Example: split -b 100MB results.tar small-res

The -b option sets the size of each part, and prefix-name is the prefix used for the names of the parts.

The above command splits results.tar into parts of 100 MB each (the last part may be smaller). The parts are written to the current working directory and named small-resaa, small-resab, small-resac, small-resad, and so on. To recover the original file, use the cat command as follows:

cat small-res* > your_archive_name.tar

With split, you can divide a large file into smaller parts of the size you choose (-b size) and transfer the parts one by one. Once all the parts are transferred, use cat to rebuild the original file or archive. To use numeric suffixes (00, 01, ...) instead of letters, add the -d option to the split command above.
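The split and reassemble cycle can be tested on a small scale with the following sketch. It assumes GNU coreutils split and cmp, and uses a hypothetical 3,000,000-byte stand-in for results.tar filled with random bytes. The MB suffix in GNU split means 1,000,000 bytes, so the file splits into exactly three parts.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical 3,000,000-byte stand-in for a real archive.
head -c 3000000 /dev/urandom > results.tar

# Split into 1 MB (1,000,000-byte) parts: small-resaa, small-resab, small-resac.
split -b 1MB results.tar small-res
ls small-res*

# Reassemble: the glob sorts the suffixes alphabetically, which is the split order.
cat small-res* > rebuilt.tar

# cmp exits with status 0 only when the two files are byte-for-byte identical.
cmp results.tar rebuilt.tar && echo "identical"

# Same split with numeric suffixes: small-num00, small-num01, small-num02.
split -d -b 1MB results.tar small-num
```

For the 100 MB parts of the example above, use -b 100MB instead; only the part size changes.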

Reminder of the most used commands

  • Use pwd {print working directory} to see the current working path.
  • Use ls {list} command to see the files and the sub-directories.
  • Use du -sh {disk usage} to see the size of the files, directories and sub-directories.
  • gzip and bzip2 also need some free disk space while they create the compressed file. The compressed file replaces the original: your_file becomes your_file.gz with gzip or your_file.bz2 with bzip2, and a tar file your_archive.tar becomes your_archive.tar.gz or your_archive.tar.bz2.

The tar command can be applied to multiple files or directories to bundle them into a single archive file.

  • tar a directory: $ tar -cvf results.tar results
  • tar more than one directory: $ tar -cvf full_results.tar results report documents
  • tar all the files or directories that start with a given letter, "r" for example: $ tar -cvf archive.tar r*
  • List the content of a tar file including the details: $ tar -tvf results.tar
  • List the content of a tar file without details: $ tar -tf results.tar
  • Count the number of entries in the tar file: $ tar -tvf results.tar | wc -l or $ tar -tf results.tar | wc -l
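  • Search for a given file in the tar archive file without un-tarring the archive: $ tar -tf results.tar | grep -a file_name_you_search or $ tar -tvf results.tar | grep -a file_name_you_search
  • List only files whose names start or end with something. For files starting with log: $ tar -tf results.tar | grep -aE '(^|/)log[^/]*$'; for files ending with .dat: $ tar -tf results.tar | grep -a '\.dat$' or, with details, $ tar -tvf results.tar | grep -a '\.dat$'. Quote the pattern: unquoted, log* may be expanded by the shell, and as a grep regular expression it means "lo" followed by any number of g characters.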

  • Append one or more files (for example new.log.dat) to the end of a tar file (results.tar): $ tar -rf results.tar new.log.dat

Note: Files cannot be added to compressed archives (gz or bzip2). Files can only be added to plain tar archives.

  • Add a directory to a tar file: $ tar -rf results.tar report/
  • Append the contents of one archive to another (concatenate): $ tar -A -f results.tar report.tar
  • Extract the whole archive file: $ tar -xvf results.tar -C new_results/
  • Compress a file (or files), or a tar archive:

Using gzip: $ gzip new.log.dat; $ gzip results.tar

Using bzip2: $ bzip2 new.log.dat; $ bzip2 results.tar

  • Create a compressed archive directly with the z (gzip) or j (bzip2) option: $ tar -cvzf results.tar.gz results/; $ tar -cvjf results.tar.bz2 results/

or $ tar -cvzf results.tgz results/; $ tar -cvjf results.tbz results/

  • Exclude particular files or file types while creating a tar file (put --exclude before the files it applies to and quote the pattern): $ tar -cvf results.tar --exclude='*.dat' results/
  • Uncompress gz or bz2 files: for files with the .gz extension, use gunzip: $ gunzip new.log.dat.gz; $ gunzip results.tar.gz. For files with the .bz2 extension, use bunzip2: $ bunzip2 new.log.dat.bz2; $ bunzip2 results.tar.bz2. To extract a file from a compressed tar file without uncompressing it first: $ tar -xvf {tar-file} -C {path-where-to-extract} {file-to-be-extracted} (GNU tar detects gzip or bzip2 compression automatically when reading an archive file).
  • List the content of a compressed tar file (*.tar.gz or *.tar.bz2): For gz file: $ tar -tvzf results.tar.gz; For bz2 file: $ tar -tvjf results.tar.bz2

Note: Again, the v option in these examples displays details but is not required. The two previous commands can also be combined with a pipe and wc or grep, as seen previously.

  • Extract a compressed archive file in another directory: With gz: $ tar -xvzf results.tar.gz -C new_results/ or $ tar -C new_results/ -xvzf results.tar.gz; With bz2 extension: $ tar -xvjf results.tar.bz2 -C new_results/ or $ tar -C new_results/ -xvjf results.tar.bz2
  • Extract the compressed archive file in two steps: For the bz2 file: $ bunzip2 results.tar.bz2; $ tar -C ./new_results/ -xvvf results.tar; For the gz file: $ gunzip results.tar.gz; $ tar -C ./new_results/ -xvvf results.tar
  • Extract one file from an archive or a compressed archive file in another directory: $ tar -C ./new_results/ --extract --file=results.tar results/Res-01/log.15Feb16.4; $ tar -C ./new_results/ --extract --file=results.tar "file1" "file2" "file3"
  • The same command can also be used to extract a file from a compressed tar file. From a gz file: $ tar -C ./new_results/ --extract -z --file=results.tar.gz results/Res-01/log.15Feb16.4; From a bz2 file: $ tar -C ./new_results/ --extract -j --file=results.tar.bz2 results/Res-01/log.15Feb16.4
  • Extract multiple files using wildcards: $ tar -C ./new_results/ -xvf results.tar --wildcards "results/*.dat"
  • Archive the files that symbolic links point to instead of the links themselves (by default, links are preserved as links): $ tar -cvhf results.tar results/
  • Add files to compressed archives (tar.gz/tar.bz2): Uncompress the archive; Add the file; Compress again.
  • Determine the size of the files: $ du -sh results.tar work tests
  • Split a file or a tar file: $ split -b <Size-in-MB> <tar-file-name>.<extension> "prefix-name"; $ split -b 100MB results.tar small-res

Retrieve the original file using cat: $ cat small-res* > your_archive_name.tar
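Where --exclude goes on the command line matters. In recent GNU tar releases, --exclude only applies to the file names that follow it. If it is placed after results/, tar prints a warning and the .dat files may still end up in the archive. The following sketch (assuming GNU tar; the files are hypothetical) puts the option first and checks the result.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical data: one .dat file to exclude and one .txt file to keep.
mkdir -p results
echo 1 > results/log1.dat
echo 2 > results/notes.txt

# --exclude comes before the directory it applies to; the pattern is quoted.
tar -cvf results.tar --exclude='*.dat' results/

# Only results/ and results/notes.txt should be listed.
tar -tf results.tar
```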