Tar

=== This page is not finished yet ===
 
<languages />
 
<translate>
<!--T:1-->
[https://en.wikipedia.org/wiki/Archive_file Archiving] means creating one file that contains a number of smaller files within it. Archiving data can improve the efficiency of file storage, and of file transfers. It is faster for the secure copy protocol ([https://en.wikipedia.org/wiki/Secure_copy scp]), for example, to transfer one archive file of a reasonable size than thousands of small files of equal total size.
 
<!--T:2-->
[https://en.wikipedia.org/wiki/Data_compression Compressing] means encoding a file such that the same information is contained in fewer bytes of storage. The advantage for long-term data storage should be obvious. For data transfers, the time spent compressing the data can be balanced against the time saved moving fewer bytes as described in this discussion of [https://bluewaters.ncsa.illinois.edu/data-transfer-doc data compression and transfer] from the US National Center for Supercomputing Applications.
 
=== Use tar to archive files and directories === <!--T:3-->
The primary archiving utility on all Linux and Unix-like systems is the [https://www.gnu.org/software/tar/manual/tar.html tar] command. It will bundle a bunch of files or directories together and generate a single file, called an ''archive file'' or ''tar-file''. By convention an archive file has <code>.tar</code> as the file name extension.
When you archive a directory with <code>tar</code>, it will by default include all files and sub-directories contained in it, and sub-sub-directories contained in those, and so on. So
{{Command|tar --create --file project1.tar project1}}
will pack all the contents of directory <code>project1/</code> into the file <code>project1.tar</code>. The original directory will be unchanged, so this may double the amount of disk space occupied!
 
<!--T:4-->
You can extract files from the archive using the same command with a different option:
{{Command|tar --extract --file project1.tar}}
If there is no directory with the original name, it will be created. If a directory of that name exists and contains files of the same names as in the archive file, they will be overwritten.
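
The create and extract steps above can be tried safely with a minimal sketch like the following. It assumes GNU <code>tar</code> and works in a temporary directory; the file names inside <code>project1/</code> are placeholders invented for the illustration.

```shell
set -eu
workdir=$(mktemp -d)              # scratch area so no real data is touched
trap 'rm -rf "$workdir"' EXIT     # remove the scratch area when the shell exits
cd "$workdir"

# Build a small sample directory (placeholder names).
mkdir -p project1/data
echo "hello" > project1/readme.txt
echo "1 2 3" > project1/data/values.txt

tar --create --file project1.tar project1    # project1/ itself is left in place
tar --list --file project1.tar               # show what was stored

# Extract into a separate directory so the original is not overwritten.
mkdir restore
(cd restore && tar --extract --file ../project1.tar)

# diff prints nothing when the restored copy matches the original.
diff -r project1 restore/project1 && echo "round trip OK"
```

Extracting into a separate directory is a design choice for the sketch only: it makes it possible to compare the restored copy with the original.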
 
=== How to compress and uncompress tar files === <!--T:5-->
<code>tar</code> can compress an archive file at the same time it creates it. There are a number of compression methods to choose from. We recommend either '''<code>xz</code>''' or '''<code>gzip</code>''', which can be used like so:
{{Commands|tar --create --xz --file project1.tar.xz project1
|tar --extract --xz --file project1.tar.xz
|tar --create --gzip --file project1.tar.gz project1
|tar --extract --gzip --file project1.tar.gz}}
Typically, <code>--xz</code> will produce a smaller compressed file (a "better compression ratio") but takes longer and uses more RAM while working [http://catchchallenger.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO]. <code>--gzip</code> does not typically compress as small, but may be used if you encounter difficulties due to insufficient memory or excessive run time during <code>tar --create</code>.
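
To see the effect of the two methods on your own data, you can compare the archive sizes directly. The following is only a sketch: it generates repetitive sample text (which compresses unusually well), and it skips the <code>xz</code> step if the <code>xz</code> program is not installed.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Sample data: 2000 similar lines of text (placeholder content).
mkdir project1
i=0
while [ "$i" -lt 2000 ]; do
    echo "step $i energy -1.2345 temperature 300.0" >> project1/run.log
    i=$((i + 1))
done

tar --create --file project1.tar project1
tar --create --gzip --file project1.tar.gz project1
# xz may be missing on some older systems, so only try it if it is installed.
if command -v xz >/dev/null 2>&1; then
    tar --create --xz --file project1.tar.xz project1
fi

ls -l project1.tar*     # compare the sizes of the archives
```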
 
<!--T:6-->
You can also run <code>tar --create</code> first without compression and then use the commands <code>xz</code> or <code>gzip</code> in a separate step, although there is rarely a reason to do so. Similarly you can run <code>xz -d</code> or <code>gzip -d</code> to decompress an archive file before running <code>tar --extract</code>, but again there is rarely a reason to do so.
 
=== Common tar options === <!--T:7-->
These are the most common options for the <code>tar</code> command. Each has two synonymous forms: a single-letter form prefixed with a single dash, and a whole-word form prefixed with a double dash:
* <code>-c</code> or <code>--create</code>: Create a new archive.
* <code>-f</code> or <code>--file=</code>: Following is the archive file name.
* <code>-x</code> or <code>--extract</code>: Extract files from archive.
* <code>-t</code> or <code>--list</code>: List the contents of an archive file.
* <code>-J</code> or <code>--xz</code>: Compress or uncompress with <code>xz</code>.
* <code>-z</code> or <code>--gzip</code>: Compress or uncompress with <code>gzip</code>.
Single-letter options can be combined with a single dash, so for example
{{Command|tar -cJf project1.tar.xz project1}}
is equivalent to
{{Command|tar --create --xz --file{{=}}project1.tar.xz project1}}
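
One way to convince yourself that the two forms are equivalent is to build an archive with each and compare the listings. This sketch uses <code>gzip</code> instead of <code>xz</code> so that it also runs on systems without <code>xz</code>; the directory and file names are placeholders.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir project1
echo "sample" > project1/a.txt

tar -czf short.tar.gz project1                      # single-letter form
tar --create --gzip --file=long.tar.gz project1     # whole-word form

# Both archives list the same members.
tar -tzf short.tar.gz > short.list
tar --list --gzip --file=long.tar.gz > long.list
diff short.list long.list && echo "same contents"
```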
 
<!--T:8-->
There are many more options for <code>tar</code>, and they may depend on the version you are using. You can get a complete list of the options available on your system with <code>man tar</code> or <code>tar --help</code>. Note in particular that some older systems might not support <code>--xz</code> compression.
 
</translate>
 
<!--- Maybe there should be a section of examples here, but not specifically about migration.
Migration-specific advice should move to "General directives for migration"
 
== Common and useful commands to use to prepare your archives: ==
To illustrate the different commands and how to use archive utilities, we use a given directory that looks like a home directory or any other directory that contains files, sub-directories ... etc. Let us suppose that you have already cleaned and removed the data you do not need and your data is ready for migration. Before that, there is one more step which is to compress your data. In the following, you will find the most common use of archiving and compressing utilities with adequate options. As an example, we use one directory called here '''Migration''' (or whatever is the name of your directory) and see how we can apply the different archiving and compressing utilities.
On your terminal, change the directory to '''Migration''' (or the directory you want to work with) then: <br>
* Use pwd {present work directory} to see the current working path.
* Use ls {list} command to see the files and the sub-directories in the current working path.
* Use du -sh {disk usage} to see the size of the files, directories and sub-directories. This information will help you to see how to prepare your archives and which files to put together or to compress separately.
As shown in this example:
 
<source lang="console">
[user_name@localhost]$  pwd
/global/scratch/user_name/Migration
</source>
 
<source lang="console">
[user_name@localhost]$  ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  results/  tests/  work/
</source>
 
<source lang="console">
[user_name@localhost]$  du -sh *
3,0K bin
876K documents
136K jobs
12K  new.log.dat
68K  programs
1,8M report
120K results
48K  tests
46K  work
</source>
 
This example shows that we are currently working on the directory called '''Migration''' and it contains the following files and directories ('''bin''', '''documents''', '''jobs''', '''new.log.dat''', '''programs''', '''report''', '''results''', '''tests''', '''work'''). The size of each file or directory is given by the use of the command <code>du -sh</code>. We will explain later why it is important to use this command to determine the size of your files or directories before starting compression.
 
In this example, we have used few directories and small files. In your case, you may have more directories, more files and larger files. But the idea is the same: it consists of creating your archives using <code>tar</code>, <code>gzip</code> or <code>bzip2</code> from your terminal. You can recover them later with <code>tar</code> {with specific options}, <code>gunzip</code> and <code>bunzip2</code>. We will explain how these utilities work by giving the most commonly used commands and options.
 
'''Notes:'''
* Before starting compression, make sure you are not running out of space or quota, because the tar command uses free space to create the archive. In the end, it is as if you had added data of the same size as the file or directory you are archiving. When using tar, the original file stays unchanged unless you modify or remove it later.
* gzip and bzip2 also use some free space to create the compressed file, but in this case the original is replaced: you get '''your_file.gz''' if you use gzip or '''your_file.bz2''' if you use bzip2; if the input is a tar file, you get '''your_archive.tar.gz''' or '''your_archive.tar.bz2'''.
* The tar command can be applied to multiple files or directories in order to put them together into a single archive file.
* gzip and bzip2 are applied to a single file or a single archive file, but not to a directory.
--->
 
=== How to tar a given directory? ===
 
Now, we can go back to our test example and try to create an archive called '''results.tar''' for the directory '''results'''; on your terminal type:
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  results/  tests/  work/
</source>
 
Then:
 
<source lang="console">
[user_name@localhost]$ tar -cvf results.tar results
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
</source>
 
Using the <code>ls</code> command, we can see the tar file that was created:
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  results/  results.tar  tests/  work/
</source>
 
In this example, we have invoked the <code>tar</code> command with the options '''c''' {for create}, '''v''' {for verbosity} and '''f''' {for file}. As a name for the archive, we have used '''results.tar'''; the name can be anything, but it is better to keep it similar to the name of the file or directory being archived. This makes it easier to recognize your data later without having to extract the archive to see what it contains.
 
If we want to put several directories into one tar file, for example an archive file called '''full_results.tar''' for the directories '''results''', '''report''' and '''documents''', we can proceed as follows:
 
<source lang="console">
[user_name@localhost]$ tar -cvf full_results.tar results report documents/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
report/
report/report-2016.pdf
report/report-a.pdf
documents/
documents/1504.pdf
documents/ff.doc
</source>
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  full_results.tar  jobs/  new.log.dat  programs/  report/  results/  results.tar  tests/  work/
</source>
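=== How to tar all the files or directories that start with a given letter, for example "r"? ===

In our working directory, we have two directories that start with r (report, results). The shell expands <code>r*</code> to every matching name, so make sure that no other file starting with r, such as an earlier '''results.tar''', is present.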
 
<source lang="console">
[user_name@localhost]$ tar -cvf archive.tar r*
report/
report/report-2016.pdf
report/report-a.pdf
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
</source>
 
In this example, we put together the content of the directories '''results''' and '''report''' into one single archive called '''archive.tar'''.
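
Because the pattern is expanded by the shell and not by <code>tar</code>, it can be checked with <code>echo</code> before the archive is created. The sketch below uses placeholder files in a temporary directory and shows how an older archive whose name also starts with r would be picked up by <code>r*</code>.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir report results
echo "r" > report/report-a.txt
echo "l" > results/log1.dat
tar -cf results.tar results     # an older archive whose name also starts with r

# echo shows what the shell would hand to tar in place of r*.
echo r*

# Naming the directories explicitly keeps results.tar out of the new archive.
tar -cf archive.tar report results
tar -tf archive.tar
```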
 
=== How to see the content of a tar file? ===
 
From our previous example, let us consider the tar file '''results.tar''', which contains all the files and sub-directories of the directory '''results''', and see which files are in it. This can be achieved by invoking the '''-t''' option. Adding the '''v''' option also gives additional information about the files, such as permissions, date and owner.
 
<source lang="console">
[user_name@localhost]$ tar -tvf results.tar
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name  10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name  10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name      0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name  11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name  11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name      0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name  34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name  34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
</source>
 
In this example, the tar command was invoked with the options t {for list}, v {for verbosity} and f {for file}. It shows all the files in the tar file, with additional information about permissions, date and ownership.
 
If you are interested just in listing the files in the tar file, use the following options (tf instead of tvf):
 
<source lang="console">
[user_name@localhost]$ tar -tf results.tar
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
</source>
 
If you are interested in the number of files in the tar file, you can combine one of the previous commands with a pipe { | } and wc -l {word count, with the option -l to count only the number of lines}. This counts the number of lines in the output of the command before the pipe symbol.
 
<source lang="console">
[user_name@localhost]$ tar -tvf results.tar | wc -l
9
</source>
 
<source lang="console">
[user_name@localhost]$ tar -tf results.tar | wc -l
9
</source>
From this example, we have a total of 9. This number includes all the files and sub-directories in the directory results, as well as the directory itself.
The options in the previous commands can also be given separately or in their long form. For example:
* The option -tvf is equivalent to -t -v -f
* The option -v is equivalent to --verbose
* The option -t is equivalent to --list
* The option -f results.tar is equivalent to --file=results.tar

'''Note:''' The option -f (or --file=) must be immediately followed by the name of the tar file.
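
Since the total includes directories, it can be useful to count files and directories separately. The sketch below relies on the fact that GNU <code>tar</code> prints directory members with a trailing slash; the sample tree is a placeholder.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p results/Res-01
echo a > results/log1.dat
echo b > results/Res-01/log.15Feb16.1
tar -cf results.tar results

total=$(tar -tf results.tar | wc -l)            # every member, directories included
dirs=$(tar -tf results.tar | grep -c '/$')      # directory members end with a slash
files=$(tar -tf results.tar | grep -vc '/$')    # everything else
echo "total=$total dirs=$dirs files=$files"
```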
=== How to search for a given file in the tar archive file without un-tarring the archive? ===
 
We have seen previously how to list the files in the archive. It is also possible to look for a particular file by combining the list commands with a pipe and grep. For example, let us see if we can find the file '''log.15Feb16.4''' (the path to this file is '''results/Res-01/log.15Feb16.4''').
 
<source lang="console">
[user_name@localhost]$ tar -tf results.tar | grep -a log.15Feb16.4
results/Res-01/log.15Feb16.4
</source>
 
<source lang="console">
[user_name@localhost]$ tar -tvf results.tar | grep -a log.15Feb16.4
-rw-r--r-- name name  11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
</source>
Now, let us see if we can find another file called, for example, pbs_file (this file does not exist in our archive):
 
<source lang="console">
[user_name@localhost]$ tar -tf results.tar | grep -a pbs_file
</source>
 
<source lang="console">
[user_name@localhost]$ tar -tvf results.tar | grep -a pbs_file
</source>
As you can see, the output of both commands is empty, meaning that the file does not exist in the archive. To list all the files in the archive whose names contain log, pass the quoted pattern to grep (an unquoted <code>log*</code> could be expanded by the shell before grep sees it):

<source lang="console">
[user_name@localhost]$ tar -tf results.tar | grep -a 'log'
results/log1.dat
results/log5.dat
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
</source>
Or add the v option for more details:
 
<source lang="console">
[user_name@localhost]$ tar -tvf results.tar | grep -a 'log'
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
-rw-r--r-- name name    11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
</source>
 
'''Note:''' The command more can also be placed after the pipe symbol to page through a long listing of an archive or a compressed file.
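
Two variations on the search are sketched here, on a placeholder archive built in a temporary directory. The first uses <code>grep -F</code> so that the dots in the name are matched literally; the second asks <code>tar</code> itself for one member by giving its full path as it appears in the listing.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p results/Res-01
echo a > results/log1.dat
echo b > results/Res-01/log.15Feb16.4
echo c > results/Res-01/notes.txt
tar -cf results.tar results

# -F makes grep treat the dots literally instead of as "any character".
tar -tf results.tar | grep -F 'log.15Feb16.4'

# tar can also list a single member when given its exact path in the archive.
tar -tf results.tar results/Res-01/log.15Feb16.4
```

If the member is not in the archive, the second command reports an error instead of printing nothing.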
 
=== How to append files to an existing archive or tar file? ===
 
The r option can be used to add files to existing archives, without having to create new ones or extract the archive and run tar again to create the archive. Here is a quick example: let us add the file '''new.log.dat''' to the archive '''results.tar'''
 
<source lang="console">
[user_name@localhost]$ tar -rf results.tar new.log.dat
</source>
 
Here, the tar command added the file '''new.log.dat''' at the end of the archive '''results.tar'''.
 
To check, use the previous options to list the files in the tar file:
 
<source lang="console">
[user_name@localhost]$ tar -tvf results.tar
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name        0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name    11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name        0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
-rw-r--r-- name name    10905 2016-11-20 11:16 new.log.dat
</source>
'''Note:''' Files cannot be added to compressed archives (gz or bz2). Files can only be added to plain tar archives.
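
If the archive is already compressed, one possible workaround is to uncompress it, append, and compress it again. The following sketch assumes GNU <code>tar</code> and <code>gzip</code>, and uses placeholder files in a temporary directory.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir results
echo a > results/log1.dat
echo b > new.log.dat
tar -czf results.tar.gz results

# Appending directly to the compressed archive is refused by tar.
if ! tar -rzf results.tar.gz new.log.dat 2>/dev/null; then
    echo "cannot append to a compressed archive"
fi

gunzip results.tar.gz               # leaves results.tar
tar -rf results.tar new.log.dat     # append to the plain archive
gzip results.tar                    # back to results.tar.gz
tar -tzf results.tar.gz
```

Keep in mind that the uncompressed archive temporarily needs its full size in disk space.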
 
The -r option of the tar command can also be used to append one or more directories to an existing tar file. Let us add report to results.tar from our previous example:
 
<source lang="console">
[user_name@localhost]$ tar -rf results.tar report/
</source>
 
<source lang="console">
[user_name@localhost]$ tar -tvf results.tar
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name        0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name    11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name        0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
-rw-r--r-- name name    10905 2016-11-20 11:16 new.log.dat
drwxrwxr-x name name        0 2016-11-20 11:02 report/
-rw-r--r-- name name  924729 2015-11-20 04:14 report/report-2016.pdf
-rw-r--r-- name name  924729 2015-11-20 04:14 report/report-a.pdf
</source>
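
Note that -r never replaces an existing member: appending an item that is already in the archive stores a second copy, so the same name can appear more than once in the listing.

<source lang="console">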
[user_name@localhost]$ tar -rf results.tar report/
</source>
 
<source lang="console">
[user_name@localhost]$ tar -tf results.tar
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
new.log.dat
new.log.dat
report/
report/report-2016.pdf
report/report-a.pdf
</source>
 
=== How to combine two archive files with the concatenate option? ===

Just as a file can be added to an archive, an archive can be added to another archive. This is done with the -A option. Let us add the archive report.tar (for the directory report) to the archive results.tar.
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  report.tar  results/  results.tar  tests/  work/
</source>
 
<source lang="console">
[user_name@localhost]$ tar -tvf results.tar
drwxr-xr-x name name        0 2016-11-20 16:16 results/
-rw-r--r-- name name    10905 2016-11-20 16:16 results/log1.dat
-rw-r--r-- name name    10909 2016-11-20 16:16 results/log5.dat
drwxr-xr-x name name        0 2016-11-20 16:16 results/Res-01/
-rw-r--r-- name name    11682 2016-11-20 16:16 results/Res-01/log.15Feb16.4
drwxr-xr-x name name        0 2016-11-20 16:16 results/Res-02/
-rw-r--r-- name name    34111 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.4
</source>
 
<source lang="console">
[user_name@localhost]$ tar -A -f results.tar report.tar
</source>
 
<source lang="console">
[user_name@localhost]$ tar -tvf results.tar
drwxr-xr-x name name        0 2016-11-20 16:16 results/
-rw-r--r-- name name    10905 2016-11-20 16:16 results/log1.dat
-rw-r--r-- name name    10909 2016-11-20 16:16 results/log5.dat
drwxr-xr-x name name        0 2016-11-20 16:16 results/Res-01/
-rw-r--r-- name name    11682 2016-11-20 16:16 results/Res-01/log.15Feb16.4
drwxr-xr-x name name        0 2016-11-20 16:16 results/Res-02/
-rw-r--r-- name name    34111 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.4
drwxrwxr-x name name        0 2016-11-20 11:02 report/
-rw-r--r-- name name  924729 2015-11-20 04:14 report/report-2016.pdf
-rw-r--r-- name name  924729 2015-11-20 04:14 report/report-a.pdf
</source>
In the above example, we have used the command tar with -A {for concatenate} (tar -A -f results.tar report.tar) to add the contents of the archive report.tar to the archive results.tar, as you can see by comparing the output of the command (tar -tvf results.tar) before and after the operation.
 
'''Note:''' The options -A, --catenate, --concatenate are equivalent.
 
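The following commands show the same operation on another archive, with -f in its short and long form, followed by a listing to check the result: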
 
<source lang="console">
[user_name@localhost]$ tar -A -f full-results.tar report.tar
</source>
 
<source lang="console">
[user_name@localhost]$ tar -A --file=full-results.tar report.tar
</source>
 
<source lang="console">
[user_name@localhost]$ tar --list --file=full-results.tar
</source>
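
The whole concatenation can be reproduced on throwaway data with a sketch like this one. It assumes GNU <code>tar</code> (the <code>-A</code> option is not available in every tar implementation) and uses placeholder files in a temporary directory.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir results report
echo a > results/log1.dat
echo b > report/report-a.txt
tar -cf results.tar results
tar -cf report.tar report

# Same as: tar -A -f results.tar report.tar
tar --concatenate --file=results.tar report.tar
tar --list --file=results.tar     # members of both archives
ls report.tar                     # the second archive is left untouched
```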
=== How to extract the whole archive? ===
 
To extract an archive, we use the x option {for extract} with f {for file}; v {for verbosity} can also be added. Let us extract the whole archive results.tar. If we extract it in the directory where it was created and a directory named results already exists there, the extracted files are written into that directory and overwrite existing files of the same name. It is also possible to extract the archive into another directory with the -C option. For example, we create a directory new_results and extract the data from the archive results.tar into this directory.
 
<source lang="console">
[user_name@localhost]$ tar -xvf results.tar -C new_results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
new.log.dat
report/
report/report-2016.pdf
report/report-a.pdf
[user_name@localhost]$ ls new_results/
new.log.dat  report/  results/
</source>
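
The destination directory has to exist before <code>tar</code> is called with <code>-C</code>. A minimal sketch of the full sequence, using placeholder files in a temporary directory:

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p results/Res-01
echo a > results/log1.dat
echo b > results/Res-01/log.15Feb16.1
tar -cf results.tar results

# tar does not create the directory given to -C, so make it first.
mkdir -p new_results
tar -xf results.tar -C new_results
ls -R new_results
```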
 
=== How to compress your file (or files), or your tar archive? ===
 
From our previous example, we use gzip or bzip2 to compress the files: '''new.log.dat''' and '''results.tar'''.
 
* Using gzip:
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar  tests/  work/
[user_name@localhost]$ gzip new.log.dat
[user_name@localhost]$ gzip results.tar
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat.gz  new_results/  programs/  report/  results/  results.tar.gz  tests/  work/
</source>
* Using bzip2:
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar  tests/  work/
[user_name@localhost]$ bzip2 new.log.dat
[user_name@localhost]$ bzip2 results.tar
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat.bz2  new_results/  programs/  report/  results/  results.tar.bz2  tests/  work/
</source>
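
As the listings above show, <code>gzip</code> and <code>bzip2</code> replace the original file with the compressed one. If you want to keep the original, one option is to write the compressed data to standard output with <code>-c</code> and redirect it to a new file. A sketch with placeholder file names:

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

echo "some log data" > new.log.dat
cp new.log.dat keep.log.dat

gzip new.log.dat                          # new.log.dat is replaced by new.log.dat.gz
gzip -c keep.log.dat > keep.log.dat.gz    # -c writes to standard output, so the original stays
ls
```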
To compress the archive while creating it with tar, add the "z" option for gzip or the "j" option for bzip2.
 
<source lang="console">
[user_name@localhost]$ tar -cvzf abc.tar.gz ./new/
</source>
 
The extension of the file name does not really matter to tar, but following the conventions makes archives easier to recognize. ".tar.gz" and ".tgz" are common extensions for archives compressed with gzip; ".tar.bz2" and ".tbz" are commonly used for archives compressed with bzip2.
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  results/  tests/  work/
[user_name@localhost]$ tar -cvzf results.tar.gz results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
</source>
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  results/  results.tar.gz  tests/  work/
[user_name@localhost]$ tar -cvjf results.tar.bz2 results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
</source>
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  results/  results.tar.bz2  results.tar.gz  tests/  work/
</source>
'''Notes:'''
* Another extension tgz can be used instead of tar.gz
* Another extension tbz can be used instead of tar.bz2
 
=== How to exclude particular files or type while creating tar file? ===
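From our previous example, let us create the archive '''results.tar''' for the directory results but without the files that have "'''.dat'''" as extension. This can be done by adding the option --exclude='*.dat'. The quotes prevent the shell from expanding the pattern. The option is placed before the directory name because some versions of GNU tar treat --exclude as positional and warn that it has no effect when it comes after the directory.

<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  programs/  report/  results/  tests/  work/
[user_name@localhost]$ ls results/
log1.dat  log5.dat  Res-01/  Res-02/
[user_name@localhost]$ tar -cvf results.tar --exclude='*.dat' results/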
results/
results/Res-01/
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ tar -tvf results.tar
drwxr-xr-x name name        0 2016-11-20 16:16 results/
drwxr-xr-x name name        0 2016-11-20 16:16 results/Res-01/
-rw-r--r-- name name    11682 2016-11-20 16:16 results/Res-01/log.15Feb16.4
drwxr-xr-x name name        0 2016-11-20 16:16 results/Res-02/
-rw-r--r-- name name    34111 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-20 16:16 results/Res-02/log.15Feb16.balance.b.4
</source>
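
The effect of the exclusion pattern can be verified on a small test tree before it is used on real data. The sketch below assumes GNU <code>tar</code>; the file <code>extra.dat</code> is a hypothetical addition that shows the pattern also applies inside sub-directories.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p results/Res-01
echo a > results/log1.dat
echo b > results/Res-01/log.15Feb16.4
echo c > results/Res-01/extra.dat      # hypothetical file, only for this test

# Quoting keeps the shell from expanding *.dat; tar matches the pattern
# against every member, including those in sub-directories.
tar -cf results.tar --exclude='*.dat' results
tar -tf results.tar
```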
 
=== How to uncompress gz files or bz2 files? ===
 
The general syntax for extracting a file from a tar archive into a given directory is shown below. The -C option is given before the name of the file to extract, because it applies to the names that follow it:

<source lang="console">
[user_name@localhost]$ tar -xvf {tar-file} -C {path-where-to-extract} {file-to-be-extracted}
</source>
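
As an illustration of this syntax, the following sketch extracts a single member into a separate directory. It assumes GNU <code>tar</code>; the directory name <code>single</code> and the sample files are placeholders.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p results/Res-01
echo a > results/log1.dat
echo b > results/Res-01/log.15Feb16.4
tar -cf results.tar results

mkdir single
# The member name must be written exactly as it appears in the output of tar -tf.
tar -xvf results.tar -C single results/Res-01/log.15Feb16.4
find single -type f
```

Only the requested file is extracted, together with the directories leading to it.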
 
For files with the .gz extension, we use gunzip as follows:
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat.gz  new_results/  programs/  report/  results/  results.tar.gz  tests/  work/
[user_name@localhost]$ gunzip new.log.dat.gz
[user_name@localhost]$ gunzip results.tar.gz
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar  tests/  work/
</source>
For files with the .bz2 extension, we use bunzip2 as follows:
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat.bz2  new_results/  programs/  report/  results/  results.tar.bz2  tests/  work/
[user_name@localhost]$ bunzip2 new.log.dat.bz2
[user_name@localhost]$ bunzip2 results.tar.bz2
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar  tests/  work/
</source>
=== How to list the content of a compressed file (*.gz or *.bz2)? ===
 
As with the plain tar file we have seen previously, the content of a compressed archive can be listed without uncompressing it first: combine the tar command with the z option for an archive compressed with gzip, or with the j option for an archive compressed with bzip2.
 
For gz file:
 
<source lang="console">
[user_name@localhost]$ tar -tvzf results.tar.gz
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name        0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name    11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name        0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
-rw-r--r-- name name    10905 2016-11-20 11:16 new.log.dat
-rw-r--r-- name name    10905 2016-11-20 11:16 new.log.dat
drwxrwxr-x name name        0 2016-11-20 11:02 report/
-rw-r--r-- name name  924729 2015-11-20 04:14 report/report-2016.pdf
-rw-r--r-- name name  924729 2015-11-20 04:14 report/report-a.pdf
</source>
For bz2 file:
 
<source lang="console">
[user_name@localhost]$ tar -tvjf results.tar.bz2
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name        0 2016-11-16 19:36 results/Res-01/
-rw-r--r-- name name    11672 2016-11-16 15:10 results/Res-01/log.15Feb16.1
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name        0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
</source>
 
'''Notes:'''
* Again, the option v is used in this example to display all details, but it is not required.
* The two previous commands can also be combined with a pipe and wc, or a pipe and grep, as we have seen previously.
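
A short sketch of these combinations on a placeholder archive follows. It also illustrates that recent versions of GNU <code>tar</code> recognize the compression on their own when reading an archive file, so the z option can be left out for listing; this is an assumption about your version of <code>tar</code>, and older or non-GNU versions may require the explicit option.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir results
echo a > results/log1.dat
echo b > results/log5.dat
tar -czf results.tar.gz results

tar -tzf results.tar.gz | wc -l           # explicit z option, as in the text above
tar -tf results.tar.gz | grep -a log5     # GNU tar detects gzip on its own when reading
```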
 
=== How to restore or extract a compressed archive file in another directory? ===
 
As in the case of a plain tar file, a compressed tar file can be extracted into another directory by using -C to indicate the destination directory and adding the option z for gz files, or j for bz2 files. We can use the same example as previously: extract the archive results.tar.gz (or results.tar.bz2) into the directory new_results.
It is possible to proceed in one step or in two steps.

* Extract the compressed archive file in one step:
 
With gz:
 
<source lang="console">
[user_name@localhost]$ tar -xvzf results.tar.gz -C new_results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ tar -xzf results.tar.gz -C new_results/
[user_name@localhost]$ ls new_results/
results/
</source>
 
With bz2:
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar.bz2  tests/  work/
[user_name@localhost]$ tar -xvjf results.tar.bz2 -C new_results/
results/
results/log1.dat
results/log5.dat
results/Res-01/
results/Res-01/log.15Feb16.1
results/Res-01/log.15Feb16.4
results/Res-02/
results/Res-02/log.15Feb16.balance.b.1
results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls new_results/
results/
</source>
'''Notes:'''
* In the previous examples, the option -C {destination directory} can also be placed first. In either case, first make sure that the destination directory exists: tar does not create it for you and fails if it is missing. The commands are:
 
<source lang="console">
[user_name@localhost]$ tar -C new_results/ -xzf results.tar.gz
</source>
 
<source lang="console">
[user_name@localhost]$ tar -C new_results/ -xvjf results.tar.bz2
</source>
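
Because tar does not create the destination directory, a script can create it just before extracting. The following is only a minimal sketch, assuming GNU tar and gzip are available; the scratch directory and the sample file are hypothetical and exist only for this illustration.

<source lang="bash">
set -e
# Work in a throwaway directory so that no real data is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a tiny sample archive (hypothetical content).
mkdir -p results
echo "sample" > results/log1.dat
tar -czf results.tar.gz results

# mkdir -p creates new_results/ if it is missing and does nothing if it exists.
mkdir -p new_results
tar -xzf results.tar.gz -C new_results/

ls new_results/results/
</source>

Using mkdir -p rather than plain mkdir keeps the script safe to run a second time.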
 
'''Notes:'''
* If the option -C {destination directory} is not given, the files are extracted into the current working directory.
* The option v is used for verbosity. In this case, it displays the files and directories as they are extracted to the new directory.
* To display more details (such as the date and the permissions), add a second v option as follows:
  tar -C new_results/ -xvvzf results.tar.gz
  tar -C new_results/ -xvvjf results.tar.bz2

* Extract the compressed archive file in two steps:
 
Here we use the same command as previously but without the z or j option. First we use gunzip or bunzip2 to uncompress the file, which replaces the compressed file with the plain tar file. Then we use tar -xvf to un-tar the archive as follows:

Suppose we have the compressed file results.tar.bz2:
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar.bz2  tests/  work/
[user_name@localhost]$ bunzip2 results.tar.bz2
[user_name@localhost]$ tar -C ./new_results/ -xvvf results.tar
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name        0 2016-11-20 15:16 results/Res-01/
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name        0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls new_results/results/
log1.dat  log5.dat  Res-01/  Res-02/
</source>
For the gz file:
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar.gz  tests/  work/
[user_name@localhost]$ gunzip results.tar.gz
[user_name@localhost]$ tar -C ./new_results/ -xvvf results.tar
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name        0 2016-11-20 15:16 results/Res-01/
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name        0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4
[user_name@localhost]$ ls new_results/results/
log1.dat  log5.dat  Res-01/  Res-02/
</source>
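
Since gunzip and bunzip2 replace the compressed file with the uncompressed one, the two-step procedure leaves no compressed copy behind. If the compressed copy should be kept, one possible variant is to write the uncompressed data to a new file with the -c option. This is a sketch under assumptions (GNU tar and gzip; the sample files and the scratch directory are made up for the illustration), not the only way to do it.

<source lang="bash">
set -e
# Throwaway directory with a small sample archive (hypothetical content).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p results/Res-01 new_results
echo "a" > results/log1.dat
echo "b" > results/Res-01/log.15Feb16.4
tar -czf results.tar.gz results
rm -r results

# Step 1: uncompress. With -c the data goes to standard output,
# so results.tar.gz itself is left untouched.
gunzip -c results.tar.gz > results.tar

# Step 2: un-tar the plain archive into the destination directory.
tar -C ./new_results/ -xf results.tar

ls new_results/results/
</source>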
 
=== How to restore or extract one file from an archive or a compressed archive file in another directory? ===
 
Let us consider again the same example as previously. First we create the archive '''results.tar''' for the directory results and list all the files in it. Then we extract one file into the directory new_results:
 
<source lang="console">
[user_name@localhost]$ ls
bin/  documents/  jobs/  new.log.dat  new_results/  programs/  report/  results/  results.tar  tests/  work/
</source>
 
<source lang="console">
[user_name@localhost]$ tar -tvf results.tar
drwxrwxr-x name name        0 2016-11-20 11:02 results/
-rw-r--r-- name name    10905 2016-11-16 16:31 results/log1.dat
-rw-r--r-- name name    10909 2016-11-16 16:31 results/log5.dat
drwxrwxr-x name name        0 2016-11-20 15:16 results/Res-01/
-rw-r--r-- name name    11682 2016-11-16 15:10 results/Res-01/log.15Feb16.4
drwxrwxr-x name name        0 2016-11-16 19:37 results/Res-02/
-rw-r--r-- name name    34111 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.1
-rw-r--r-- name name    34117 2016-11-16 15:10 results/Res-02/log.15Feb16.balance.b.4 
</source>
 
<source lang="console">
[user_name@localhost]$ ls new_results/                                                                       
[user_name@localhost]$ tar -C ./new_results/ --extract --file=results.tar results/Res-01/log.15Feb16.4
[user_name@localhost]$ ls new_results/results/Res-01/log.15Feb16.4
new_results/results/Res-01/log.15Feb16.4
</source>
In this example, we have extracted the file results/Res-01/log.15Feb16.4 from the archive by using the option --extract, without extracting the whole archive. The command recreates the directories stored in the archive inside the destination directory.
'''Notes:'''
* Use -C {destination directory} with this command. Otherwise tar extracts the file relative to the current working directory: into the original directory (here results/Res-01/) if it exists, where by default it overwrites a file of the same name, or into a newly created directory of that name if it does not.
* The command can extract a file or a directory, but the path must be given exactly as it is stored in the archive.
* The same command can be used to extract multiple files by adding the full path of each one, as in the example below.
 
<source lang="console">
[user_name@localhost]$ tar -C ./new_results/ --extract --file=results.tar "results/Res-01/log.15Feb16.4" "file2" "file3"
</source>
The same command can also be used to extract a file from a compressed tar file.
From a gz file:
 
<source lang="console">
[user_name@localhost]$ tar -C ./new_results/ --extract -z --file=results.tar.gz results/Res-01/log.15Feb16.4
[user_name@localhost]$ ls new_results/results/Res-01/log.15Feb16.4
new_results/results/Res-01/log.15Feb16.4
</source>
From a bz2 file:
 
<source lang="console">
[user_name@localhost]$ tar -C ./new_results/ --extract -j --file=results.tar.bz2 results/Res-01/log.15Feb16.4
[user_name@localhost]$ ls new_results/results/Res-01/log.15Feb16.4
new_results/results/Res-01/log.15Feb16.4
</source>
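
Since the path must match the name stored in the archive exactly, it can help to look the name up in the listing first and pass it on to the extract command. The following is a minimal sketch assuming GNU tar, gzip and grep; the variable name member and the sample files are hypothetical.

<source lang="bash">
set -e
# Throwaway directory with a small sample archive (hypothetical content).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p results/Res-01 new_results
echo "one" > results/log1.dat
echo "four" > results/Res-01/log.15Feb16.4
tar -czf results.tar.gz results

# Look up the member name exactly as it is stored in the archive.
member=$(tar -tzf results.tar.gz | grep -a 'log\.15Feb16\.4$')
echo "$member"

# Extract only that member into the destination directory.
tar -C ./new_results/ --extract -z --file=results.tar.gz "$member"

find new_results -type f
</source>

Only the requested file is extracted; the other members of the archive are left alone.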
=== How to extract multiple files using wildcards? ===
 
<source lang="console">
[user_name@localhost]$ tar -C ./new_results/ -xvf results.tar --wildcards "results/*.dat"
[user_name@localhost]$ ls new_results/results/
log1.dat  log5.dat
</source>
With the above command, we have extracted the files in the directory results/ that have the extension .dat.

'''Note:''' The command also works with the j or z option for compressed archives, as we have seen previously.

From our previous example, we can also extract all the files that start with log:
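
One possible way to do this, assuming GNU tar, is to combine --wildcards with --no-anchored so that the pattern may match a file name at any depth of the archive. This is a sketch rather than the original author's command; the sample files, including notes.txt, are hypothetical.

<source lang="bash">
set -e
# Throwaway directory with a small sample archive (hypothetical content).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p results/Res-01 new_results
echo 1 > results/log1.dat
echo 2 > results/notes.txt
echo 3 > results/Res-01/log.15Feb16.4
tar -cf results.tar results

# --no-anchored lets the pattern match after any "/" in the member name,
# so log* selects files starting with "log" in every sub-directory.
# The pattern is quoted so that the shell does not expand it.
tar -C ./new_results/ -xf results.tar --wildcards --no-anchored 'log*'

find new_results -type f | sort
</source>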
=== How to preserve symbolic links using tar command? ===
 
By default, tar already preserves symbolic links: they are stored in the archive as links. If you instead want to archive the files that the links point to, add the option h (--dereference):
 
<source lang="console">
[user_name@localhost]$ tar -cvhf results.tar results/
</source>
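
The effect of the option h can be checked by listing two archives of the same directory, one created without it and one with it. This is a small sketch assuming GNU tar; the file names target.dat and link.dat are made up for the illustration.

<source lang="bash">
set -e
# Throwaway directory with one regular file and one symbolic link to it.
workdir=$(mktemp -d)
cd "$workdir"
mkdir results
echo "data" > target.dat
ln -s ../target.dat results/link.dat

# Without h: the link itself is stored in the archive.
tar -cf links.tar results/
# With h: tar follows the link and stores the content of the target file.
tar -chf deref.tar results/

# In a verbose listing, a stored link appears as "name -> target".
tar -tvf links.tar
tar -tvf deref.tar
</source>

In the first listing link.dat appears as a link; in the second it appears as a regular file.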
=== How to add files to compressed archives (tar.gz/tar.bz2)? ===
We have already mentioned that we cannot add files to compressed archives. To do so, we first need to uncompress the archive using gunzip or bunzip2 as seen previously. This gives the plain tar file, to which we add the file or files by invoking the r option. Then we can compress the archive again using gzip or bzip2.
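
The three steps above can be sketched as follows for a gz archive. This assumes GNU tar and gzip; the sample files are hypothetical and are created in a scratch directory.

<source lang="bash">
set -e
# Throwaway directory with a small compressed archive and a file to add.
workdir=$(mktemp -d)
cd "$workdir"
mkdir results
echo 1 > results/log1.dat
echo "new" > new.log.dat
tar -czf results.tar.gz results

# 1. Uncompress: results.tar.gz is replaced by results.tar.
gunzip results.tar.gz
# 2. Append the new file to the plain tar archive.
tar -rf results.tar new.log.dat
# 3. Compress again: results.tar is replaced by results.tar.gz.
gzip results.tar

# The listing now includes new.log.dat.
tar -tzf results.tar.gz
</source>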
 
=== How to check the size of a file, directory or archive? ===
 
From your terminal, you can use the command du -sh [your_file ...] to see the size:
 
<source lang="console">
[user_name@localhost]$ du -sh results.tar work tests
112K results.tar
58K  work
48K  tests
</source>
 
Knowing the size of your files or directories helps you decide how to distribute them among different archives. It is also possible to split a large file, including a big tar file, into smaller parts using the split command.

Syntax:  split -b <size> <tar-file-name>.<extension> "prefix-name"
split -b 100MB results.tar small-res

The option b sets the size of each part, and prefix-name is the prefix used for the names of the parts. With GNU split, the suffix MB stands for units of 1000*1000 bytes, while M stands for 1024*1024 bytes.

The above command splits the file results.tar into parts of 100 MB each in the current working directory. The parts are named small-resaa, small-resab, small-resac, small-resad, and so on.
To recover the original file, we use the cat command as follows:

cat small-res* > your_archive_name.tar

With split, you can divide a large file into smaller parts by giving the size you want {-b size}, then transfer all the parts. Once all the parts are transferred, use the cat command to rebuild your file or your archive.
To number the parts with digits instead of letters, add the -d option to the split command above.
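
After rebuilding an archive from its parts, it is worth checking that the result is identical to the original. The following is a minimal sketch assuming GNU split and the standard cmp utility; the part size of 1M and the sample data are chosen only to keep the demonstration small.

<source lang="bash">
set -e
# Throwaway directory with a sample archive of about 2.5 MB (hypothetical content).
workdir=$(mktemp -d)
cd "$workdir"
mkdir results
head -c 2500000 /dev/zero > results/big.dat
tar -cf results.tar results

# Split the archive into parts of 1 MiB: small-resaa, small-resab, ...
split -b 1M results.tar small-res

# Rebuild the archive; the shell expands small-res* in alphabetical order.
cat small-res* > rebuilt.tar

# cmp prints nothing and returns 0 when the two files are identical.
cmp results.tar rebuilt.tar
ls
</source>

The rebuilt file is given a name that does not start with the prefix, so it is not picked up by the small-res* pattern.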
== Reminder of the most used commands ==
 
* Use pwd {present work directory} to see the current working path.
 
* Use ls {list} command to see the files and the sub-directories.
 
* Use du -sh {disk usage} to see the size of the files, directories and sub-directories.
 
* gzip and bzip2 also need some free space to create the compressed file. The new file is your_file.gz if you use gzip or your_file.bz2 if you use bzip2; for a tar file, you get your_archive.tar.gz or your_archive.tar.bz2.

* The tar command can be applied to multiple files or directories in order to bundle them into a single archive file.

* tar a directory: $ tar -cvf results.tar results
 
* tar more than one directory: $ tar -cvf full_results.tar results report documents
 
* tar all the files or directories that start with a given letter, for example "r": $ tar -cvf archive.tar r*
 
* List the content of a tar file including the details: $ tar -tvf results.tar
* List the content of a tar file without details: $ tar -tf results.tar
 
* Count the number of entries in the tar file: $ tar -tvf results.tar | wc -l or $ tar -tf results.tar | wc -l
 
* Search for a given file in the tar archive file without un-tarring the archive: $ tar -tf results.tar | grep -a file_name_you_search
or $ tar -tvf results.tar | grep -a file_name_you_search
 
* List only the files whose names match a pattern; for example, files starting with log: $ tar -tf results.tar | grep -a "/log"
Or: $ tar -tvf results.tar | grep -a "/log"
(grep uses regular expressions rather than shell wildcards, and the pattern is quoted so that the shell does not expand it.)

* Append one or more files (for example new.log.dat) to the end of a tar file (results.tar): $ tar -rf results.tar new.log.dat
 
'''Note:''' Files cannot be added to compressed archives (gz or bzip2). Files can only be added to plain tar archives.
 
* Add a directory to a tar file: $ tar -rf results.tar report/
 
* Add one archive to another with concatenate: $ tar -A -f results.tar report.tar
 
* Extract the whole archive file: $ tar -xvf results.tar -C new_results/
 
* Compress a file (or files), or a tar archive:
 
Using gzip: $ gzip new.log.dat; $ gzip results.tar
Using bzip2: $ bzip2 new.log.dat; $ bzip2 results.tar
 
* Compress with the "z" or "j" option for gzip or bzip2 respectively: $ tar -cvzf results.tar.gz results/; $ tar -cvjf results.tar.bz2 results/
or $ tar -cvzf results.tgz results/; $ tar -cvjf results.tbz results/

* Exclude particular files or file types while creating a tar file: $ tar -cvf results.tar --exclude="*.dat" results/ (the pattern is quoted so that the shell does not expand it, and the option is placed before the directory it applies to).

* Extract a given file from a tar file: $ tar -xvf {tar-file} {file-to-be-extracted} -C {path-where-to-extract}

* Uncompress gz or bz2 files: For files with the .gz extension, we use gunzip as follows: $ gunzip new.log.dat.gz; $ gunzip results.tar.gz; For files with the .bz2 extension, we use bunzip2 as follows: $ bunzip2 new.log.dat.bz2; $ bunzip2 results.tar.bz2
 
* List the content of a compressed file (*.gz or *.bz2): For gz file: $ tar -tvzf results.tar.gz; For bz2 file: $ tar -tvjf results.tar.bz2
 
'''Notes:''' Again, the option v in this example only displays the details; it is not required. The two previous commands can also be combined with a pipe to wc or grep, as we have seen previously.
 
* Extract a compressed archive file in another directory: With gz: $ tar -xvzf results.tar.gz -C new_results/ or $ tar -C new_results/ -xvzf results.tar.gz; With bz2 extension: $ tar -xvjf results.tar.bz2 -C new_results/ or $ tar -C new_results/ -xvjf results.tar.bz2
* Extract the compressed archive file in two steps: For the bz2 file: $ bunzip2 results.tar.bz2; $ tar -C ./new_results/ -xvvf results.tar; For the gz file: $ gunzip results.tar.gz; $ tar -C ./new_results/ -xvvf results.tar
 
* Extract one file from an archive or a compressed archive file in another directory: $ tar -C ./new_results/ --extract --file=results.tar results/Res-01/log.15Feb16.4; $ tar -C ./new_results/ --extract --file=results.tar "file1" "file2" "file3"
 
* The same command can also be used to extract a file from a compressed tar file. From a gz file: $ tar -C ./new_results/ --extract -z --file=results.tar.gz results/Res-01/log.15Feb16.4; From a bz2 file: $ tar -C ./new_results/ --extract -j --file=results.tar.bz2 results/Res-01/log.15Feb16.4
 
* Extract multiple files using wildcards: $ tar -C ./new_results/ -xvf results.tar --wildcards "results/*.dat"
 
* Symbolic links are stored as links by default. To archive the files they point to instead, add the option h: $ tar -cvhf results.tar results/
 
* Add files to compressed archives (tar.gz/tar.bz2): Uncompress the archive; Add the file; Compress again.
 
* Determine the size of the files: $ du -sh results.tar work tests
* Split a file or a tar file: $ split -b <size> <tar-file-name>.<extension> "prefix-name"; $ split -b 100MB results.tar small-res
* Retrieve the original file using cat: $ cat small-res* > your_archive_name.tar

Latest revision as of 19:24, 18 July 2019