Tar
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
Archiving means creating one file that contains a number of smaller files within it. Archiving data can improve the efficiency of file storage, and of file transfers. It is faster for the secure copy protocol (scp), for example, to transfer one archive file of a reasonable size than thousands of small files of equal total size.
Compressing means encoding a file such that the same information is contained in fewer bytes of storage. The advantage for long-term data storage should be obvious. For data transfers, the time spent compressing the data can be balanced against the time saved moving fewer bytes as described in this discussion of data compression and transfer from the US National Center for Supercomputing Applications.
Use tar to archive files and directories
The primary archiving utility on all Linux and Unix-like systems is the tar command. It will bundle a bunch of files or directories together and generate a single file, called an archive file or tar-file. By convention an archive file has .tar as the file name extension.
When you archive a directory with tar, it will by default include all files and sub-directories contained in it, and sub-sub-directories contained in those, and so on. So
[name@server ~]$ tar --create --file project1.tar project1
will pack all the contents of directory project1/ into the file project1.tar. The original directory will be unchanged, so this may double the amount of disk space occupied!
You can extract files from the archive using the same command with a different option:
[name@server ~]$ tar --extract --file project1.tar
If there is no directory with the original name, it will be created. If a directory of that name exists and contains files of the same names as in the archive file, they will be overwritten.
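The create and extract steps above can be tried safely in a scratch directory. This is a minimal sketch, not a recommended workflow: the file names are hypothetical, and the -C option (described further down this page) is used only to extract into a separate directory so that nothing is overwritten.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small example tree (hypothetical file names).
mkdir -p project1/sub
echo "alpha" > project1/a.txt
echo "beta" > project1/sub/b.txt

# Archive the directory, then extract it into a separate directory.
tar --create --file project1.tar project1
mkdir restored
tar --extract --file project1.tar -C restored

# diff -r exits with status 0 only if both trees are identical.
diff -r project1 restored/project1 && echo "round trip OK"
```

Comparing the extracted tree with the original is a cheap way to gain confidence in an archive before deleting the original directory.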
How to compress and uncompress tar files
tar can compress an archive file at the same time it creates it. There are a number of compression methods to choose from. We recommend either xz or gzip, which can be used like so:
[name@server ~]$ tar --create --xz --file project1.tar.xz project1
[name@server ~]$ tar --extract --xz --file project1.tar.xz
[name@server ~]$ tar --create --gzip --file project1.tar.gz project1
[name@server ~]$ tar --extract --gzip --file project1.tar.gz
Typically, --xz will produce a smaller compressed file (a "better compression ratio") but takes longer and uses more RAM while working [1]. --gzip does not typically compress as small, but may be used if you encounter difficulties due to insufficient memory or excessive run time during tar --create.
You can also run tar --create first without compression and then use the commands xz or gzip in a separate step, although there is rarely a reason to do so. Similarly you can run xz -d or gzip -d to decompress an archive file before running tar --extract, but again there is rarely a reason to do so.
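The one-step and two-step approaches can be compared side by side. The sketch below uses gzip only and hypothetical file names; it assumes nothing beyond tar and gzip being installed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir project1

# Repetitive text compresses well, which makes the size difference visible.
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i of some repetitive log output"
    i=$((i + 1))
done > project1/log.txt

# One step: tar compresses while it writes the archive.
tar --create --gzip --file one-step.tar.gz project1

# Two steps: plain archive first, then gzip reads it and writes a new file.
tar --create --file two-step.tar project1
gzip -c two-step.tar > two-step.tar.gz

# Both compressed archives list the same contents.
list_one=$(tar --list --gzip --file one-step.tar.gz)
list_two=$(tar --list --gzip --file two-step.tar.gz)
[ "$list_one" = "$list_two" ] && echo "same contents"

# Sizes in bytes: the plain archive versus the compressed ones.
wc -c two-step.tar one-step.tar.gz two-step.tar.gz
```

The two-step form temporarily needs disk space for both the plain and the compressed archive, which is one reason the one-step form is usually preferred.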
Common tar options
These are the most common options for the tar command. There are two synonymous forms for each, a single-letter form prefixed with a single dash, and a whole-word form prefixed with a double dash:
- -c or --create: Create a new archive.
- -f or --file=: Following is the archive file name.
- -x or --extract: Extract files from archive.
- -t or --list: List the contents of an archive file.
- -J or --xz: Compress or uncompress with xz.
- -z or --gzip: Compress or uncompress with gzip.
Single-letter options can be combined with a single dash, so for example
[name@server ~]$ tar -cJf project1.tar.xz project1
is equivalent to
[name@server ~]$ tar --create --xz --file=project1.tar.xz project1
There are many more options for tar, and they may depend on the version you are using. You can get a complete list of the options available on your system with man tar or tar --help. Note in particular that some older systems might not support --xz compression.
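One way to cope with systems that lack xz support is to test for it and fall back to gzip. This is only a sketch under stated assumptions: it assumes that tar mentions --xz in its help text when the option is supported (true for GNU tar), and the variable and file names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir project1
echo "some data" > project1/data.txt

# Assumption: tar lists --xz in its help text when it supports the option,
# and the xz program itself must also be installed.
if tar --help 2>&1 | grep -q -e '--xz' && command -v xz >/dev/null 2>&1; then
    method=--xz
    archive=project1.tar.xz
else
    method=--gzip
    archive=project1.tar.gz
fi

# Create and list the archive with whichever method was selected.
tar --create "$method" --file "$archive" project1
echo "created $archive"
tar --list "$method" --file "$archive"
```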
How to tar a given directory?
Now, we can go back to our test example and try to create an archive called results.tar for the directory results; on your terminal type:
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat programs/ report/ results/ tests/ work/
Then:
[user_name@localhost]$ tar -cvf results.tar results
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
Using the ls command, we can see the tar file created:
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat programs/ report/ results/ results.tar tests/ work/
In this example, we have invoked the tar command with the options c {for create}, v {for verbosity} and f {for file}. As a name for the archive, we have used results.tar; the name can be anything, but it is better to keep it similar to the name of the file or directory being archived. That makes it easier to recognize your data later without having to extract the archive to see what it contains.
To put several directories into one tar file, for example an archive file called full_results.tar containing the directories results, report and documents, we can proceed as follows:
[user_name@localhost]$ tar -cvf full_results.tar results report documents/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
report/
report/report-2016.pdf
report/report-a.pdf
documents/
documents/1504.pdf
documents/ff.doc
[user_name@localhost]$ ls
bin/ documents/ full_results.tar jobs/ new.log.dat programs/ report/ results/ results.tar tests/ work/
How to tar all the files or directories that start with a given letter, "r" for example:
In our working directory, we have two directories that start with r (report, results).
[user_name@localhost]$ tar -cvf archive.tar r*
report/
report/report-2016.pdf
report/report-a.pdf
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
In this example, we put together the content of the directories results and report into one single archive called archive.tar.
How to see the content of a tar file?
From our previous example, let us consider the tar file results.tar, which contains all the files and sub-directories of the directory results, and see which files are in it. This can be achieved by invoking the -t option. Adding the v option also gives additional information about the files, such as permissions, date and owner.
[user_name@localhost]$ tar -tvf results.tar
drwxrwxr-x name name 0 2016-11-20 11:02 results/
-rw-r--r-- name name 10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name 10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name 0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name 11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name 11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name 0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name 34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name 34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
In this example, the tar command was invoked with the options t {for list}, v {for verbosity} and f {for file}. This command shows all the files that are in the tar file with additional information about the permissions, the date, the ownership, etc.
If you are interested just in listing the files in the tar file, use the following options (tf instead of tvf):
[user_name@localhost]$ tar -tf results.tar
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
If you are interested in the number of files in the tar file, you can combine one of the previous commands with a pipe { | } and wc -l {word count with the option -l to count only the number of lines}. This counts the number of lines in the output of the command before the pipe symbol.
[user_name@localhost]$ tar -tvf results.tar | wc -l
9
[user_name@localhost]$ tar -tf results.tar | wc -l
9
From this example, we have a total of 9. This number includes all the files and sub-directories that are in the directory results, including this directory itself. The options in the previous commands can also be given separately. For example:
- The option -tvf is equivalent to -t -v -f
- The option -v is equivalent to --verbose
- The option -t is equivalent to --list
- The option --file=results.tar is equivalent to -f results.tar
Note: The option -f or --file= always comes immediately before the name of the tar file.
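Because wc -l counts directories as well as files, it can be useful to count them separately. The following sketch relies on tar listing directory entries with a trailing slash; the small tree it builds is hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p results/Res-01 results/Res-02
echo 1 > results/log1.dat
echo 2 > results/Res-01/log.15Feb16.1
echo 3 > results/Res-02/log.15Feb16.balance.b.1
tar -cf results.tar results

# Every entry: directories and files together.
total=$(tar -tf results.tar | wc -l)
# Directory entries are listed with a trailing slash.
dirs=$(tar -tf results.tar | grep -c '/$')
# Everything else is a file (or a link).
files=$(tar -tf results.tar | grep -vc '/$')

echo "entries: $total, directories: $dirs, files: $files"
```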
How to search for a given file in the tar archive file without un-tarring the archive?
We have seen previously how to list the files in the archive. It is also possible to look for a particular file by combining the list command with a pipe and the grep command. For example, let us see if we can find the file log.15Feb16.4 (the path to this file is results/Res-01/log.15Feb16.4).
[user_name@localhost]$ tar -tf results.tar | grep -a log.15Feb16.4
results/Res-01/log.15Feb16.4
[user_name@localhost]$ tar -tvf results.tar | grep -a log.15Feb16.4
-rw-r--r-- name name 11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
Now, we can try to see if we can find another file called, for example, pbs_file (this file does not exist in our archive):
[user_name@localhost]$ tar -tf results.tar | grep -a pbs_file
[user_name@localhost]$ tar -tvf results.tar | grep -a pbs_file
As you can see, the output of the commands is empty, meaning that the file does not exist in the archive. If you want to list all the files in the archive whose names start with log, for example, type on your terminal:
[user_name@localhost]$ tar -tf results.tar | grep -a '/log'
results/log1.dat
results/log5.dat
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
Or add the v option for more details:
[user_name@localhost]$ tar -tvf results.tar | grep -a '/log'
-rw-r--r-- name name 10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name 10909 2016-11-16 16:31 results/log5.dat
-rw-r--r-- name name 11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name 11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
-rw-r--r-- name name 34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name 34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
Note: The command more can be also invoked after the pipe symbol to list the files in the archive or the compressed file.
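When searching for an exact file name, note that grep treats its pattern as a regular expression, in which a dot matches any character. The sketch below, using a hypothetical look-alike file name, shows how the -F option of grep makes the match literal.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p results/Res-01
echo a > results/Res-01/log.15Feb16.4
echo b > results/Res-01/logX15Feb16.4   # hypothetical look-alike name
tar -cf results.tar results

# As a regular expression, "." matches any character, so both names match.
regex_hits=$(tar -tf results.tar | grep -c 'log.15Feb16.4')

# With -F the pattern is taken literally, so only the exact name matches.
literal_hits=$(tar -tf results.tar | grep -F -c 'log.15Feb16.4')

echo "regular expression: $regex_hits, literal: $literal_hits"
```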
How to append files to an existing archive or tar file?
The r option can be used to add files to an existing archive without having to create a new one, or to extract the archive and run tar again. Here is a quick example: let us add the file new.log.dat to the archive results.tar:
[user_name@localhost]$ tar -rf results.tar new.log.dat
Here, the tar command added the file new.log.dat at the end of the archive results.tar.
To check, use the previous options to list the files in the tar file:
[user_name@localhost]$ tar -tvf results.tar
drwxrwxr-x name name 0 2016-11-20 11:02 results/
-rw-r--r-- name name 10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name 10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name 0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name 11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name 11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name 0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name 34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name 34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
-rw-r--r-- name name 10905 2016-11-20 11:16 new.log.dat
Note: Files cannot be added to compressed archives (gz or bz2). Files can only be added to plain tar archives.
The -r option of the tar command can also be used to append a directory or directories to an existing tar file. Let us add report to results.tar from our previous example:
[user_name@localhost]$ tar -rf results.tar report/
[user_name@localhost]$ tar -tvf results.tar
drwxrwxr-x name name 0 2016-11-20 11:02 results/
-rw-r--r-- name name 10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name 10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name 0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name 11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name 11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name 0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name 34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name 34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
-rw-r--r-- name name 10905 2016-11-20 11:16 new.log.dat
drwxrwxr-x name name 0 2016-11-20 11:02 report/
-rw-r--r-- name name 924729 2015-11-20 04:14 report/report-2016.pdf
-rw-r--r-- name name 924729 2015-11-20 04:14 report/report-a.pdf
[user_name@localhost]$ tar -rf results.tar report/
[user_name@localhost]$ tar -tf results.tar
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
new.log.dat
new.log.dat
report/
report/report-2016.pdf
report/report-a.pdf
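Appending never replaces an existing entry: if the same file is appended twice, the archive holds two entries with the same name, as with new.log.dat in the listing above. A minimal sketch with hypothetical files shows how such duplicates can be spotted.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir results
echo first > results/log1.dat
echo extra > new.log.dat
tar -cf results.tar results

# Append the same file twice.
tar -rf results.tar new.log.dat
tar -rf results.tar new.log.dat

# The archive now holds two entries named new.log.dat.
tar -tf results.tar

# Print only the names that occur more than once.
duplicates=$(tar -tf results.tar | sort | uniq -d)
echo "duplicated entries: $duplicates"
```

Running a check like this before appending again helps to keep an archive from growing with repeated copies of the same file.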
How to add two archive files with concatenate option?
Just as we can add a file to an archive, it is possible to add an archive to another archive. This can be done by invoking the -A option. Let us add the archive report.tar (for the directory report) to the archive results.tar.
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat programs/ report/ report.tar results/ results.tar tests/ work/
[user_name@localhost]$ tar -tvf results.tar
drwxr-xr-x name name 0 2016-11-20 16:16 results/
-rw-r--r-- name name 10905 2016-11-20 16:16 results/log1.dat
-rw-r--r-- name name 10909 2016-11-20 16:16 results/log5.dat
drwxr-xr-x name name 0 2016-11-20 16:16 results/Res-01/
-rw-r--r-- name name 11682 2016-11-20 16:16 results/Res-01/log.15Feb16.4
drwxr-xr-x name name 0 2016-11-20 16:16 results/Res-02/
-rw-r--r-- name name 34111 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name 34117 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ tar -A -f results.tar report.tar
[user_name@localhost]$ tar -tvf results.tar
drwxr-xr-x name name 0 2016-11-20 16:16 results/
-rw-r--r-- name name 10905 2016-11-20 16:16 results/log1.dat
-rw-r--r-- name name 10909 2016-11-20 16:16 results/log5.dat
drwxr-xr-x name name 0 2016-11-20 16:16 results/Res-01/
-rw-r--r-- name name 11682 2016-11-20 16:16 results/Res-01/log.15Feb16.4
drwxr-xr-x name name 0 2016-11-20 16:16 results/Res-02/
-rw-r--r-- name name 34111 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name 34117 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.4
drwxrwxr-x name name 0 2016-11-20 11:02 report/
-rw-r--r-- name name 924729 2015-11-20 04:14 report/report-2016.pdf
-rw-r--r-- name name 924729 2015-11-20 04:14 report/report-a.pdf
In the above example, we have used the tar command with -A {for concatenate} (tar -A -f results.tar report.tar) to add the archive report.tar to the archive results.tar, as you can see by comparing the output of the command tar -tvf results.tar before and after the operation.
Note: The options -A, --catenate, --concatenate are equivalent.
The previous command can also be written with the long form of the file option, and the result can be checked with --list:
[user_name@localhost]$ tar -A -f full-results.tar report.tar
[user_name@localhost]$ tar -A --file=full-results.tar report.tar
[user_name@localhost]$ tar --list --file=full-results.tar
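Since -A modifies the first archive in place, one cautious approach is to concatenate onto a copy. The sketch below assumes GNU tar (other tar implementations may not provide -A) and uses hypothetical file names.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir results report
echo r > results/log1.dat
echo p > report/report-a.txt
tar -cf results.tar results
tar -cf report.tar report

# Work on a copy so the original archive stays untouched if something goes wrong.
cp results.tar full-results.tar
tar -A -f full-results.tar report.tar

# The copy now lists the entries of both archives.
tar -tf full-results.tar
```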
How to extract the whole archive?
To extract an archive, we use the x {for extract} option with f {for file}; v {for verbosity} can also be added. Let us extract the whole archive results.tar. If we extract it in the same directory, we have to make sure that there is no directory with this name; otherwise the extracted data go into that directory and overwrite files of the same names. It is also possible to extract the archive into another directory. For example, we create a directory new_results and extract the data from the archive results.tar into this directory:
[user_name@localhost]$ tar -xvf results.tar -C new_results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
new.log.dat
report/
report/report-2016.pdf
report/report-a.pdf
[user_name@localhost]$ ls new_results/
new.log.dat report/ results/
How to compress your file (or files), or your tar archive?
From our previous example, we use gzip or bzip2 to compress the files: new.log.dat and results.tar.
- Using gzip:
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat new_results/ programs/ report/ results/ results.tar tests/ work/
[user_name@localhost]$ gzip new.log.dat
[user_name@localhost]$ gzip results.tar
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat.gz new_results/ programs/ report/ results/ results.tar.gz tests/ work/
- Using bzip2:
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat new_results/ programs/ report/ results/ results.tar tests/ work/
[user_name@localhost]$ bzip2 new.log.dat
[user_name@localhost]$ bzip2 results.tar
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat.bz2 new_results/ programs/ report/ results/ results.tar.bz2 tests/ work/
To compress while creating the archive with tar, use the "z" or "j" option for gzip or bzip2 respectively.
[user_name@localhost]$ tar -cvzf abc.tar.gz ./new/
The extension of the file name does not really matter. ".tar.gz" and ".tgz" are common extensions for files compressed with gzip. ".tar.bz2" and ".tbz" are commonly used extensions for files compressed with bzip2.
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat programs/ report/ results/ tests/ work/
[user_name@localhost]$ tar -cvzf results.tar.gz results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat programs/ report/ results/ results.tar.gz tests/ work/
[user_name@localhost]$ tar -cvjf results.tar.bz2 results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat programs/ report/ results/ results.tar.bz2 results.tar.gz tests/ work/
Notes:
- Another extension tgz can be used instead of tar.gz
- Another extension tbz can be used instead of tar.bz2
How to exclude particular files or file types while creating a tar file?
From our previous example, let us create the archive results.tar for the directory results but without the files that have ".dat" as extension. This can be done by adding the option --exclude='*.dat' (quoted so that the shell passes the pattern to tar unchanged) before the directory name:
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat programs/ report/ results/ tests/ work/
[user_name@localhost]$ ls results/
log1.dat log5.dat Res-01/ Res-02/
[user_name@localhost]$ tar -cvf results.tar --exclude='*.dat' results/
results/
results/Res-01/
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ tar -tvf results.tar
drwxr-xr-x name name 0 2016-11-20 16:16 results/
drwxr-xr-x name name 0 2016-11-20 16:16 results/Res-01/
-rw-r--r-- name name 11682 2016-11-20 16:16 results/Res-01/log.15Feb16.4
drwxr-xr-x name name 0 2016-11-20 16:16 results/Res-02/
-rw-r--r-- name name 34111 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name 34117 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.4
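The effect of an exclusion can be checked by searching the listing of the new archive for the excluded extension. A small sketch with hypothetical files follows; quoting the pattern guarantees that tar, and not the shell, interprets the * character.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p results/Res-01
echo a > results/log1.dat
echo b > results/Res-01/log.15Feb16.4

# Place the quoted exclude pattern before the directory to archive.
tar -cf results.tar --exclude='*.dat' results

# Count the entries ending in .dat that made it into the archive.
dat_entries=$(tar -tf results.tar | grep -c '\.dat$')
echo ".dat entries in the archive: $dat_entries"
tar -tf results.tar
```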
How to uncompress gz files or bz2 files?
The general syntax is:
[user_name@localhost]$ tar -xvf {tar-file} {file-to-be-extracted} -C {path-where-to-extract}
For files with the .gz extension, we use gunzip as follows:
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat.gz new_results/ programs/ report/ results/ results.tar.gz tests/ work/
[user_name@localhost]$ gunzip new.log.dat.gz
[user_name@localhost]$ gunzip results.tar.gz
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat new_results/ programs/ report/ results/ results.tar tests/ work/
For files with the .bz2 extension, we use bunzip2 as follows:
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat.bz2 new_results/ programs/ report/ results/ results.tar.bz2 tests/ work/
[user_name@localhost]$ bunzip2 new.log.dat.bz2
[user_name@localhost]$ bunzip2 results.tar.bz2
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat new_results/ programs/ report/ results/ results.tar tests/ work/
How to list the content of a compressed file (*.gz or *.bz2)?
As in the case of a tar file we have seen previously, it is possible to combine the tar command with the z option to list the content of an archive compressed with gzip without uncompressing the file, or with the j option to do the same for an archive compressed with bzip2.
For gz file:
[user_name@localhost]$ tar -tvzf results.tar.gz
drwxrwxr-x name name 0 2016-11-20 11:02 results/
-rw-r--r-- name name 10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name 10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name 0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name 11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name 11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name 0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name 34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name 34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
-rw-r--r-- name name 10905 2016-11-20 11:16 new.log.dat
-rw-r--r-- name name 10905 2016-11-20 11:16 new.log.dat
drwxrwxr-x name name 0 2016-11-20 11:02 report/
-rw-r--r-- name name 924729 2015-11-20 04:14 report/report-2016.pdf
-rw-r--r-- name name 924729 2015-11-20 04:14 report/report-a.pdf
For bz2 file:
[user_name@localhost]$ tar -tvjf results.tar.bz2
drwxrwxr-x name name 0 2016-11-20 11:02 results/
-rw-r--r-- name name 10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name 10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name 0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name 11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name 11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name 0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name 34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name 34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
Notes:
- Again, in this example the option v is used to display all the details, but it is not required.
- The two previous commands can also be combined with a pipe and wc, or a pipe and grep, as we have seen previously.
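Recent versions of GNU tar also recognize the compression format on their own when reading an archive file, so the z or j option can often be left out when listing. The sketch below assumes such a tar and a hypothetical small archive; it compares the listing obtained with and without the z option.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir results
echo a > results/log1.dat
tar -czf results.tar.gz results

# Listing with the explicit z option.
with_flag=$(tar -tzf results.tar.gz)

# Listing without it: tar detects the gzip format by itself (assumption: GNU tar).
without_flag=$(tar -tf results.tar.gz)

[ "$with_flag" = "$without_flag" ] && echo "same listing"
```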
How to restore or extract a compressed archive file in another directory?
As in the case of a tar file, a compressed tar file can be extracted into another directory by using -C to indicate the destination directory and adding the option z for gz files or j for bz2 files. We can use the same example as previously: extract the archive results.tar.gz (or results.tar.bz2) into the directory new_results. It is possible to proceed in two steps or in one step.
- Extract the compressed archive file in one step:
With gz:
[user_name@localhost]$ tar -xvzf results.tar.gz -C new_results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ tar -xzf results.tar.gz -C new_results/
[user_name@localhost]$ ls new_results/
results/
With bz2 extension:
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat new_results/ programs/ report/ results/ results.tar.bz2 tests/ work/
[user_name@localhost]$ tar -xvjf results.tar.bz2 -C new_results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls new_results/
results/
Notes:
- In the previous example, it is possible to start with the option -C {the destination directory}, however first make sure that the destination directory exists, since tar is not going to create the directory for you and will fail if it does not exist. The command is:
[user_name@localhost]$ tar -C new_results/ -xzf results.tar.gz
[user_name@localhost]$ tar -C new_results/ -xvjf results.tar.bz2
Notes:
- If the option -C {destination directory} is not given, the files will be extracted into the current working directory.
- The option v is used for verbosity. In this case, it displays the files and directories as they are extracted to the new directory.
- If we want to display more details (like the date, permissions ...), we can add a second v option as follows:
[user_name@localhost]$ tar -C new_results/ -xvvzf results.tar.gz
[user_name@localhost]$ tar -C new_results/ -xvvjf results.tar.bz2
- Extract the compressed archive file in two steps:
Here we use the same command as previously but without the z option or the j option. First we use gunzip or bunzip2 to uncompress the file. Then we use tar -xvf to un-tar the archive as follows:
Let us suppose we have the compressed file results.tar.bz2:
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat new_results/ programs/ report/ results/ results.tar.bz2 tests/ work/
[user_name@localhost]$ bunzip2 results.tar.bz2
[user_name@localhost]$ tar -C ./new_results/ -xvvf results.tar
drwxrwxr-x name name 0 2016-11-20 11:02 results/
-rw-r--r-- name name 10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name 10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name 0 2016-11-20 15:16 results/Res-01/
-rw-r--r-- name name 11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name 0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name 34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name 34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls new_results/results/
log1.dat log5.dat Res-01/ Res-02/
For the gz file:
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat new_results/ programs/ report/ results/ results.tar.gz tests/ work/
[user_name@localhost]$ gunzip results.tar.gz
[user_name@localhost]$ tar -C ./new_results/ -xvvf results.tar
drwxrwxr-x name name 0 2016-11-20 11:02 results/
-rw-r--r-- name name 10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name 10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name 0 2016-11-20 15:16 results/Res-01/
-rw-r--r-- name name 11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name 0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name 34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name 34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls new_results/results/
log1.dat log5.dat Res-01/ Res-02/
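The two steps can also be joined with a pipe, which leaves the compressed file in place and avoids writing the uncompressed archive to disk. In this sketch, gunzip -c writes the decompressed data to standard output and tar reads the archive from standard input, indicated by "-" as the file name; the files are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir results new_results
echo a > results/log1.dat
tar -czf results.tar.gz results

# Decompress to standard output; tar reads the archive from standard input ("-").
gunzip -c results.tar.gz | tar -xf - -C new_results

# The data are extracted and the compressed archive is still there.
ls new_results/results
ls results.tar.gz
```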
How to restore or extract one file from an archive or a compressed archive file in another directory?
Let us consider again the same example as previously. First we create the archive results.tar for the directory results and list all the files in it. Then we will extract one file into the directory new_results:
[user_name@localhost]$ ls
bin/ documents/ jobs/ new.log.dat new_results/ programs/ report/ results/ results.tar tests/ work/
[user_name@localhost]$ tar -tvf results.tar
drwxrwxr-x name name 0 2016-11-20 11:02 results/
-rw-r--r-- name name 10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name 10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name 0 2016-11-20 15:16 results/Res-01/
-rw-r--r-- name name 11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name 0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name 34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name 34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls new_results/
[user_name@localhost]$ tar -C ./new_results/ --extract --file=results.tar results/Res-01/log.15Feb16.4
[user_name@localhost]$ ls new_results/results/Res-01/log.15Feb16.4
new_results/results/Res-01/log.15Feb16.4
In this example, we have extracted the file results/Res-01/log.15Feb16.4 from the archive without extracting the whole archive, by giving its path after the option --extract and the archive name. The command creates the same directories as in the archive but in the destination directory.
Notes:
- Without the option -C {destination directory}, the file is extracted relative to the current directory: into the directory results/ if it already exists there, and otherwise tar creates the same directories as in the archive.
- The same command extracts a file or a directory, but we need to give its path exactly as it is stored in the archive.
- The same command can be used to extract multiple files by adding the full path of each one, as in the previous example.
[user_name@localhost]$ tar -C ./new_results/ --extract --file=results.tar "results/Res-01/log.15Feb16.4" "file2" "file3"
The same command can also be used to extract a file from a compressed tar file.
From a gz file:
[user_name@localhost]$ tar -C ./new_results/ --extract -z --file=results.tar.gz results/Res-01/log.15Feb16.4
[user_name@localhost]$ ls new_results/results/Res-01/log.15Feb16.4
new_results/results/Res-01/log.15Feb16.4
From a bz2 file:
[user_name@localhost]$ tar -C ./new_results/ --extract -j --file=results.tar.bz2 results/Res-01/log.15Feb16.4
[user_name@localhost]$ ls new_results/results/Res-01/log.15Feb16.4
new_results/results/Res-01/log.15Feb16.4
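Since the path must be given exactly as it is stored in the archive, it can be convenient to look it up from the listing first. This is a minimal sketch with hypothetical files; the variable name member is an illustration only.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p results/Res-01 new_results
echo data > results/Res-01/log.15Feb16.4
echo other > results/log1.dat
tar -cf results.tar results

# Look up the full path stored in the archive (literal match, first hit only).
member=$(tar -tf results.tar | grep -F 'log.15Feb16.4' | head -n 1)
echo "extracting: $member"

# Extract only that member into the destination directory.
tar -C new_results -xf results.tar "$member"
ls new_results/results/Res-01
```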
How to extract multiple files using wildcards?
[user_name@localhost]$ tar -C ./new_results/ -xvf results.tar --wildcards "results/*.dat"
[user_name@localhost]$ ls new_results/results/
log1.dat log5.dat
With the above command, we have extracted the files with the extension .dat that are in the directory results/.
Note: The command is also valid with invoking j or z options for compressed archives as we have seen previously.
From our previous example, we can extract all the files whose names start with log:
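A minimal sketch of such an extraction, assuming GNU tar: the --no-anchored option (a GNU extension, like --wildcards) lets the pattern match at the start of any path component, so files named log* are found at every depth. The directory tree used here is hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p results/Res-01 new_results
echo a > results/log1.dat
echo b > results/Res-01/log.15Feb16.1
echo c > results/Res-01/notes.txt
tar -cf results.tar results

# Extract every member whose file name starts with "log", at any depth.
tar -C new_results -xf results.tar --wildcards --no-anchored 'log*'

# Show what was extracted.
find new_results -type f | sort
```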
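How to handle symbolic links using the tar command?
By default, tar stores a symbolic link in the archive as a link. If you have symbolic links in your directory and you want to archive the files they point to instead of the links themselves, add the option h (--dereference):
[user_name@localhost]$ tar -cvhf results.tar results/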
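The effect of the h option can be seen by archiving the same directory with and without it and comparing the verbose listings. The sketch below uses hypothetical file names; in a verbose listing, the first character of each line is "l" for a symbolic link and "-" for a regular file.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir results
echo "real data" > target.txt
ln -s ../target.txt results/link.txt

# Without h: the link itself is stored in the archive.
tar -cf links.tar results

# With h: the content of the file the link points to is stored instead.
tar -chf deref.tar results

# Compare how link.txt appears in the two archives.
tar -tvf links.tar | grep 'link.txt'
tar -tvf deref.tar | grep 'link.txt'
```

Archives created with h can be considerably larger, because every link is replaced by a full copy of its target.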
How to add files to compressed archives (tar.gz/tar.bz2)?
We have already mentioned that we cannot add files to compressed archives. To do so, we first need to uncompress the archive using gunzip or bunzip2 as seen previously. We obtain the tar file and add the file or files to this archive by invoking the r option. Then we can compress it again using gzip or bzip2.
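The three steps can be sketched as follows for a gzip archive, with hypothetical file names. Note that gunzip and gzip replace their input file, so only one copy of the archive exists at each step.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir results
echo a > results/log1.dat
echo b > new.log.dat
tar -czf results.tar.gz results

# Step 1: uncompress; results.tar.gz is replaced by results.tar.
gunzip results.tar.gz

# Step 2: append the new file to the plain tar archive.
tar -rf results.tar new.log.dat

# Step 3: compress again; results.tar is replaced by results.tar.gz.
gzip results.tar

# Check the result.
tar -tzf results.tar.gz
```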
How to check the size of a file, directory or archive?
From your terminal, you can use the command du -sh [your_file ...] to see the size:
[user_name@localhost]$ du -sh results.tar work tests
112K results.tar
58K work
48K tests
By knowing the size of your files or directories, you can decide how to distribute them among different archives. It is also possible to split a large file, including a tar file, into smaller parts using the split command.
Syntax: split -b <Size-in-MB> <tar-file-name>.<extension> "prefix-name"
split -b 100MB results.tar small-res
The option b sets the size of each part, and prefix-name is the prefix used for the names of the parts.
The above command splits the file results.tar into parts of 100 MB each in the current working directory. The parts are named small-resaa, small-resab, small-resac, small-resad, and so on. To recover the original file, we use the cat command as follows:
cat small-res* > your_archive_name.tar
Using the split command, you can divide a large file into smaller parts of the size you want (-b size), then transfer all the parts. Once all of them are transferred, use the cat command to recover your file or archive. If you want numeric suffixes instead of alphabetic ones, add the -d option to the split command above.
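The split and cat round trip can be tried safely on a small stand-in file. The sketch below assumes GNU split (for the KB suffix and the -d option) and uses random bytes in place of a real archive; the name recovered.tar is made up for the example.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for a large archive: 250000 random bytes.
head -c 250000 /dev/urandom > results.tar

# Split into parts of at most 100 kB each, with numeric suffixes (-d).
split -b 100KB -d results.tar small-res
ls small-res*

# Reassemble; the shell expands small-res* in sorted order.
cat small-res* > recovered.tar

# cmp returns 0 (and prints nothing) when the two files are identical.
cmp results.tar recovered.tar && echo "files are identical"
```

Comparing the reassembled file with the original, or comparing checksums after a transfer, is a cheap way to confirm that no part was lost or reordered.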
Reminder of the most used commands
- Use pwd {print working directory} to see the current working path.
- Use ls {list} command to see the files and the sub-directories.
- Use du -sh {disk usage} to see the size of the files, directories and sub-directories.
- gzip and bzip2 also need some free space to create the compressed file. The new file you get is your_file.gz if you use gzip or your_file.bz2 if you use bzip2; for a tar file, you get your_archive.tar.gz or your_archive.tar.bz2.
The tar command can be applied to multiple files or directories in order to bundle them into a single archive file.
- tar a directory: $ tar -cvf results.tar results
- tar more than one directory: $ tar -cvf full_results.tar results report documents
- tar, for example, all the files or directories that start with a given letter, "r" for example: $ tar -cvf archive.tar r*
- List the content of a tar file including the details: $ tar -tvf results.tar
- List the content of a tar file without details: $ tar -tf results.tar
- Count the number of entries in the tar file: $ tar -tvf results.tar | wc -l or $ tar -tf results.tar | wc -l
- Search for a given file in the tar archive file without un-tarring the archive: $ tar -tf results.tar | grep -a file_name_you_search
or $ tar -tvf results.tar | grep -a file_name_you_search
- List only the files whose names end or start with a given string. grep takes a regular expression, not a shell wildcard, so quote the pattern. For example, for file names starting with log inside a directory: $ tar -tf results.tar | grep -a '/log'
Or: $ tar -tvf results.tar | grep -a '/log'
- Append a file (for example new.log.dat) or several files to the end of a tar file (results.tar): $ tar -rf results.tar new.log.dat
Note: Files cannot be added to compressed archives (gz or bz2). Files can only be added to plain tar archives.
- Add a directory to a tar file: $ tar -rf results.tar report/
- Add one archive to another with concatenate: $ tar -A -f results.tar report.tar
- Extract the whole archive file: $ tar -xvf results.tar -C new_results/
- Compress a file (or files), or a tar archive:
Using gzip: $ gzip new.log.dat; $ gzip results.tar
Using bzip2: $ bzip2 new.log.dat; $ bzip2 results.tar
- Compress with the "z" or "j" option for gzip or bzip2 respectively: $ tar -cvzf results.tar.gz results/; $ tar -cvjf results.tar.bz2 results/
or $ tar -cvzf results.tgz results/; $ tar -cvjf results.tbz results/
- Exclude particular files or file types while creating a tar file (quote the pattern so that the shell does not expand it, and put the option before the directory): $ tar -cvf results.tar --exclude='*.dat' results/
- Uncompress gz or bz2 files: $ tar -xvf {tar-file} {file-to-be-extracted} -C {path-where-to-extract}. For files with the .gz extension, we use gunzip as follows: $ gunzip new.log.dat.gz; $ gunzip results.tar.gz; For files with the .bz2 extension, we use bunzip2 as follows: $ bunzip2 new.log.dat.bz2; $ bunzip2 results.tar.bz2
- List the content of a compressed file (*.gz or *.bz2): For gz file: $ tar -tvzf results.tar.gz; For bz2 file: $ tar -tvjf results.tar.bz2
Notes: Again, in this example the option v is used to display all the details, but it is not required. The two previous commands can also be combined with a pipe and wc, or a pipe and grep, as we have seen previously.
- Extract a compressed archive file in another directory: With gz: $ tar -xvzf results.tar.gz -C new_results/ or $ tar -C new_results/ -xvzf results.tar.gz; With bz2 extension: $ tar -xvjf results.tar.bz2 -C new_results/ or $ tar -C new_results/ -xvjf results.tar.bz2
- Extract the compressed archive file in two steps: For the bz2 file: $ bunzip2 results.tar.bz2; $ tar -C ./new_results/ -xvvf results.tar; For the gz file: $ gunzip results.tar.gz; $ tar -C ./new_results/ -xvvf results.tar
- Extract one file from an archive or a compressed archive file in another directory: $ tar -C ./new_results/ --extract --file=results.tar results/Res-01/log.15Feb16.4; $ tar -C ./new_results/ --extract --file=results.tar "file1" "file2" "file3"
- The same command can also be used to extract a file from a compressed tar file. From a gz file: $ tar -C ./new_results/ --extract -z --file=results.tar.gz results/Res-01/log.15Feb16.4; From a bz2 file: $ tar -C ./new_results/ --extract -j --file=results.tar.bz2 results/Res-01/log.15Feb16.4
- Extract multiple files using wildcards: $ tar -C ./new_results/ -xvf results.tar --wildcards "results/*.dat"
- Symbolic links are stored as links by default. To follow them and archive the files they point to instead, use the h option: $ tar -cvhf results.tar results/
- Add files to compressed archives (tar.gz/tar.bz2): Uncompress the archive; Add the file; Compress again.
- Determine the size of the files: $ du -sh results.tar work tests
- Split a file or a tar file: $ split -b <Size-in-MB> <tar-file-name>.<extension> "prefix-name"; $ split -b 100MB results.tar small-res
Retrieve the original file using cat: $ cat small-res* > your_archive_name.tar
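Several of the commands in this reminder can be chained into one short session. The following is a sketch on made-up sample data, run in a temporary directory; it assumes GNU tar and gzip, and the file names are only illustrative.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data.
mkdir -p results new_results
echo "a" > results/log1.dat
echo "b" > results/notes.txt
echo "c" > new.log.dat

tar -cf results.tar results               # create the archive
tar -rf results.tar new.log.dat           # append a file to the plain tar archive
tar -tf results.tar | wc -l               # count the entries
tar -tf results.tar | grep -a 'log'       # search without extracting

gzip results.tar                          # results.tar becomes results.tar.gz
tar -tzf results.tar.gz                   # list the compressed archive
tar -xzf results.tar.gz -C new_results/   # extract into another directory

ls -R new_results
```

Note that the append step comes before gzip: once the archive is compressed, the r option can no longer be used on it.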