Dar

From Alliance Doc
Revision as of 21:41, 10 May 2019 by Razoumov

The dar (short for Disk ARchiver) utility was written from the ground up as a modern replacement for the classic Unix tar tool. First released in 2002, dar is open source, is actively maintained, and can be compiled on any Unix-like system.

Like tar, dar supports full, differential, and incremental backups. Unlike tar, each dar archive includes a file index for fast file access and restore, which is especially useful for large archives. dar compresses data on a file-by-file basis, which makes archives more resilient against data corruption, and you can optionally tell it not to compress files that are already highly compressed, such as mp4 and gz. dar also supports strong encryption, can split archives at 1-byte resolution, supports extended file attributes, sparse files, and hard and symbolic (soft) links, and can detect data corruption in both headers and saved data and recover with minimal data loss, among many other useful features. The dar homepage has a detailed feature-by-feature comparison of tar and dar.
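As an illustration of file-by-file compression with exclusions, the sketch below wraps dar in a hypothetical bash function, dar_compress_selectively, that compresses each file with gzip (-zgzip) but stores files matching *.mp4 and *.gz uncompressed (-Z masks). The -c and -g options are explained under Basic archiving below; the function name and the particular masks are illustrative assumptions, so check the dar man page for the compression algorithms your build supports.

```shell
# Hypothetical helper: dar_compress_selectively basename directory
# Creates archive <basename> from <directory> (a relative path), compressing
# each file with gzip except files that are already compressed.
dar_compress_selectively() {
    if [ $# -ne 2 ]; then
        echo "Usage: dar_compress_selectively basename directory" >&2
        return 1
    fi
    dar -w -c "$1" -g "$2" \
        -zgzip \
        -Z "*.mp4" -Z "*.gz"    # -Z: store files matching these masks uncompressed
}
```

For example, dar_compress_selectively all test would archive the test subdirectory into all.1.dar.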

Where to find dar

Since dar builds with the standard configure/make toolchain, it is easy to install on Linux and macOS. On Compute Canada clusters, a slightly out-of-date version is available in /cvmfs:

[user_name@localhost]$ which dar
/cvmfs/soft.computecanada.ca/nix/var/nix/profiles/16.09/bin/dar
[user_name@localhost]$ dar --version
dar version 2.5.3, Copyright (C) 2002-2052 Denis Corbin
...
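To check whether the version in /cvmfs is recent enough for your needs, you can parse the output of dar --version. The sketch below assumes the output contains a line of the form "dar version X.Y.Z, ..." as shown above, and relies on sort -V (version sort, available in GNU coreutils); the function names dar_version and dar_at_least are hypothetical.

```shell
# Print the version number reported by dar, e.g. "2.5.3".
# Assumes a line like "dar version 2.5.3, Copyright ..." (leading spaces allowed).
dar_version() {
    dar --version 2>/dev/null \
        | sed -n 's/^[[:space:]]*dar version \([0-9.]*\).*/\1/p' \
        | head -n 1
}

# Return 0 if the installed dar is at least the given version, 1 otherwise.
dar_at_least() {
    local have
    have=$(dar_version)
    [ -n "$have" ] || return 1
    # sort -V orders version strings: if the minimum sorts first (or ties), dar is new enough
    [ "$(printf '%s\n%s\n' "$1" "$have" | sort -V | head -n 1)" = "$1" ]
}
```

For example, dar_at_least 2.6.0 || echo "consider building a newer dar" prints the message with the 2.5.3 build shown above.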

If you want a newer version, you can compile it from source (replace 2.6.3 with the latest version number):

[user_name@localhost]$ wget https://sourceforge.net/projects/dar/files/dar/2.6.3/dar-2.6.3.tar.gz
[user_name@localhost]$ tar xvfz dar-*.gz && /bin/rm -f dar-*.gz
[user_name@localhost]$ cd dar-*
[user_name@localhost]$ ./configure --prefix=$HOME/dar --disable-shared
[user_name@localhost]$ make
[user_name@localhost]$ make install-strip
[user_name@localhost]$ $HOME/dar/bin/dar --version
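After make install-strip, the new build lives in $HOME/dar/bin, but typing plain dar will still run the /cvmfs copy unless your build comes first in your PATH. A minimal sketch for your $HOME/.bashrc, assuming the --prefix=$HOME/dar used above:

```shell
# Put the locally built dar (configured with --prefix=$HOME/dar) at the front
# of PATH, without adding the same directory twice if .bashrc is re-sourced.
case ":$PATH:" in
    *":$HOME/dar/bin:"*) ;;                       # already on PATH: do nothing
    *) export PATH="$HOME/dar/bin:$PATH" ;;
esac
```

After sourcing it, which dar should report $HOME/dar/bin/dar.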

Using dar manually

Basic archiving and extracting

Suppose the current directory contains a subdirectory test. To pack it into an archive, run the following from the current directory:

[user_name@localhost]$ dar -w -c all -g test

This creates an archive file all.1.dar, where all is the base name and 1 is the slice number. You can split a single archive into multiple slices (see below). You can also include multiple directories and files in one archive, e.g.

[user_name@localhost]$ dar -w -c all -g testDir1 -g testDir2 -g file1 -g file2

Please note that all paths should be relative to the current directory.
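Because every path needs its own -g flag and must be relative, it can help to let a function build the option list. The sketch below is a hypothetical wrapper, dar_create, that adds one -g per argument and refuses absolute paths before dar ever runs; it uses only the -w, -c and -g options shown above.

```shell
# Hypothetical wrapper: dar_create basename path [path ...]
# Adds one -g option per path and refuses absolute paths, since dar expects
# paths relative to the current directory.
dar_create() {
    if [ $# -lt 2 ]; then
        echo "Usage: dar_create basename path [path ...]" >&2
        return 1
    fi
    local base=$1 p
    local args=()
    shift
    for p in "$@"; do
        case $p in
            /*) echo "dar_create: '$p' is absolute; use a relative path" >&2
                return 1 ;;
        esac
        args+=(-g "$p")
    done
    dar -w -c "$base" "${args[@]}"
}
```

With this, dar_create all testDir1 testDir2 file1 file2 runs the same command as the multi-directory example above.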

To list the archive's contents, use only the base name:

[user_name@localhost]$ dar -l all

To extract a single file into the subdirectory restore, use the base name and the file path:

[user_name@localhost]$ dar -R restore/ -O -w -x all -v -g test/filename

The -O flag tells dar to ignore file ownership. Ownership would be a problem if you are restoring someone else's files and you are not root; even when you restore your own files, dar warns that you are not running as root and asks you to confirm. The -O flag disables this warning. The -w flag suppresses the warning dar would otherwise give before overwriting files, e.g. if restore/test already exists.

To extract an entire directory, type:

[user_name@localhost]$ dar -R restore/ -O -w -x all -v -g test

As when creating an archive, you can pass multiple directories and files by using multiple -g flags. Note that dar does not accept Unix wildcards after -g.
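Extraction can be wrapped the same way. The sketch below, a hypothetical function named dar_extract, creates the destination directory if it does not exist and passes one -g per path, using the -R, -O, -w, -x and -v flags from the commands above. Paths must match the archive exactly, since wildcards are not expanded.

```shell
# Hypothetical wrapper: dar_extract basename destination path [path ...]
# Extracts the given paths (exactly as stored in the archive, no wildcards)
# into <destination>, ignoring ownership (-O) and overwrite warnings (-w).
dar_extract() {
    if [ $# -lt 3 ]; then
        echo "Usage: dar_extract basename destination path [path ...]" >&2
        return 1
    fi
    local base=$1 dest=$2 p
    local args=()
    shift 2
    for p in "$@"; do
        args+=(-g "$p")
    done
    mkdir -p "$dest" || return 1      # make sure the destination exists
    dar -R "$dest" -O -w -x "$base" -v "${args[@]}"
}
```

For example, dar_extract all restore/ test/filename is equivalent to the single-file extraction above.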

Incremental backups

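You can create differential and incremental backups with dar by passing the base name of the reference archive with -A. A differential backup always uses the full backup as its reference, while an incremental backup uses the most recent backup. For example, let's say on Monday you create a full backup named monday: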

[user_name@localhost]$ dar -w -c monday -g test

On Tuesday you modify some of the files and then include only these files in a new incremental backup named tuesday, using the monday archive as the reference:

[user_name@localhost]$ dar -w -A monday -c tuesday -g test

On Wednesday you modify more files, and at the end of the day you create a new backup named wednesday, now using the tuesday archive as the reference:

[user_name@localhost]$ dar -w -A tuesday -c wednesday -g test
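Since -A takes only the base name of the reference archive, a mistyped reference is easy to miss. A minimal sketch of a hypothetical wrapper, dar_incremental, that checks for the first slice of the reference archive before calling dar with the same -w, -A, -c and -g options as above:

```shell
# Hypothetical wrapper: dar_incremental reference newBasename path [path ...]
# Creates an archive containing only files changed since <reference>.
dar_incremental() {
    if [ $# -lt 3 ]; then
        echo "Usage: dar_incremental reference newBasename path [path ...]" >&2
        return 1
    fi
    local ref=$1 new=$2 p
    local args=()
    shift 2
    if [ ! -f "$ref.1.dar" ]; then            # first slice of the reference archive
        echo "dar_incremental: reference archive $ref.1.dar not found" >&2
        return 1
    fi
    for p in "$@"; do
        args+=(-g "$p")
    done
    dar -w -A "$ref" -c "$new" "${args[@]}"
}
```

For example, dar_incremental tuesday wednesday test reproduces the Wednesday command.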

Now you have three files:

[user_name@localhost]$ ls *.dar
monday.1.dar     tuesday.1.dar    wednesday.1.dar

The file wednesday.1.dar contains only the files that you modified on Wednesday, not the files from Monday or Tuesday. Therefore, the command

[user_name@localhost]$ dar -R restore -O -x wednesday

will only restore files that were modified on Wednesday. To restore everything, you have to go through all backups in chronological order:

[user_name@localhost]$ dar -R restore -O -w -x monday      # restore the full backup
[user_name@localhost]$ dar -R restore -O -w -x tuesday     # restore the first incremental backup
[user_name@localhost]$ dar -R restore -O -w -x wednesday   # restore the second incremental backup
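The three commands above follow a pattern that is easy to script: restore the full backup first, then each incremental backup in order, stopping at the first failure. A sketch, where dar_restore_chain is a hypothetical name and the archives must be passed oldest first:

```shell
# Hypothetical helper: dar_restore_chain destination full [incremental ...]
# Restores the full backup, then every incremental backup in the order given.
dar_restore_chain() {
    if [ $# -lt 2 ]; then
        echo "Usage: dar_restore_chain destination full [incremental ...]" >&2
        return 1
    fi
    local dest=$1 archive
    shift
    for archive in "$@"; do
        echo "restoring from $archive"
        dar -R "$dest" -O -w -x "$archive" || return 1   # stop if any layer fails
    done
}
```

For the example above: dar_restore_chain restore monday tuesday wednesday.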

Limiting the size of each slice

To limit the maximum size of each slice, use the -s flag followed by a size in bytes, optionally with one of the suffixes k, M, G or T. For example, for a 1340 MB archive, the command

[user_name@localhost]$ dar -s 100M -w -c monday -g test

will create 14 slices named monday.{1..14}.dar. To extract from all of these, use their base name:

[user_name@localhost]$ dar -O -x monday
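The slice count is the archive size divided by the slice size, rounded up: 1340 / 100 = 13.4, hence 14 slices. A small sketch of that arithmetic, plus a listing of slices in numeric order (a plain glob would list monday.10.dar before monday.2.dar). Both helper names are hypothetical; slices_needed ignores dar's small per-slice header overhead and assumes both sizes are integers in the same unit, and list_slices relies on sort -V from GNU coreutils.

```shell
# Hypothetical helper: number of slices for a given archive size and -s slice size.
# Both arguments must be integers in the same unit (e.g. MB); uses ceiling division.
slices_needed() {
    local size=$1 slice=$2
    echo $(( (size + slice - 1) / slice ))
}

# Hypothetical helper: list the slices of archive <basename> in numeric order.
list_slices() {
    local f
    for f in "$1".*.dar; do
        [ -e "$f" ] && printf '%s\n' "$f"    # skip the literal pattern if nothing matches
    done | sort -V
}
```

For example, slices_needed 1340 100 prints 14, matching the example above.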

Using dar via functions

Using dar is much easier if you do not have to memorize all the flags and the exact command-line syntax. Here we provide several bash functions that simplify backups. Note that these functions make the usual common-sense assumptions: you are below your quota (so you can write files) and you have the necessary read and write permissions. It is your responsibility to make sure that this is the case, and that dar archived or restored your files correctly before you delete the originals. In other words, test everything before making these functions part of your workflow.
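One way to check that dar archived your files correctly before you delete the originals is to test the archive's integrity with dar -t and then compare it with the files on disk with dar -d. The sketch below, a hypothetical function named dar_verify, must be run from the directory you archived from (as with -c, dar uses the current directory as the root unless you pass -R). It assumes dar exits with a non-zero status when the test or the comparison finds a problem; see the exit codes section of the dar man page for the details.

```shell
# Hypothetical helper: dar_verify basename
# 1) dar -t reads the whole archive and checks it for corruption;
# 2) dar -d compares the archived files with those on disk.
# Returns non-zero if either step reports a problem.
dar_verify() {
    if [ $# -ne 1 ]; then
        echo "Usage: dar_verify basename" >&2
        return 1
    fi
    dar -t "$1" || { echo "dar_verify: integrity test failed for $1" >&2; return 1; }
    dar -d "$1" || { echo "dar_verify: $1 differs from the files on disk" >&2; return 1; }
    echo "dar_verify: $1 looks good"
}
```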

Limiting the number of files in each slice with multidar

Paste the following function into your shell, or save its definition in your $HOME/.bashrc file and then load it with source ~/.bashrc:

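function multidar() {
    if [ $# -ne 2 ]; then
        echo Usage: multidar sourceDirectory maxNumberOfFilesPerArchive
        return 1
    fi
    local sourceDirectory=${1%/}    # drop a trailing slash, so "test/" and "test" both give test-aaa, ...
    local maxNumberOfFilesPerArchive=$2
    if ! command -v dar >/dev/null 2>&1; then
        echo please install dar
        return 1
    fi
    echo great, I found dar at $(command -v dar)
    find "$sourceDirectory" -type f > .fullList          # list every file to archive
    sed -i -e '/DS_Store/d' .fullList                    # skip macOS .DS_Store files
    sed -i -e 's/\/\//\//' .fullList                     # collapse a doubled slash
    split -a 3 -l "$maxNumberOfFilesPerArchive" .fullList .partial   # .partialaaa, .partialaab, ...
    for i in .partial*; do
        # ${i:8:3} is the 3-letter suffix that follows ".partial"
        echo archiving from $i to $sourceDirectory-${i:8:3}
        dar -w -c "$sourceDirectory-${i:8:3}" --include-from-file "$i"
        /bin/rm -f "$i"
    done
    /bin/rm -f .fullList*    # the wildcard also removes backup copies some sed versions leave behind
    ls -lh "$sourceDirectory"-*.dar
}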

Now, running the command without arguments will show you the syntax:

[user_name@localhost]$ multidar
Usage: multidar sourceDirectory maxNumberOfFilesPerArchive

Let's assume that we have 1000 files inside test. Running the command

[user_name@localhost]$ multidar test 300

will produce four archives, each with its own base name and no more than 300 files. To restore from these archives, use a bash loop:

[user_name@localhost]$ for f in test-aa{a..d}
                       do
                         dar -R restore/ -O -w -x $f
                       done
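The loop above hard-codes the four suffixes aaa to aad. If you do not know in advance how many archives multidar produced, you can loop over their first slices instead. A sketch, where multidar_restore is a hypothetical name, assuming the archives are in the current directory and use the three-letter suffixes that multidar's split -a 3 generates:

```shell
# Hypothetical helper: multidar_restore prefix destination
# Restores every archive whose first slice is named <prefix>-???.1.dar.
multidar_restore() {
    if [ $# -ne 2 ]; then
        echo "Usage: multidar_restore prefix destination" >&2
        return 1
    fi
    local first found=0
    for first in "$1"-???.1.dar; do
        [ -e "$first" ] || continue                           # glob matched nothing
        found=1
        dar -R "$2" -O -w -x "${first%.1.dar}" || return 1    # dar wants the base name
    done
    [ "$found" -eq 1 ] || { echo "multidar_restore: no $1-???.1.dar archives found" >&2; return 1; }
}
```

For the example above: multidar_restore test restore/.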

Backup

Let's define the following function:

function backup() {
    BREF='/home/username/tmp'
    BSRC='-g test'   # cannot use an absolute path
    BDEST=/home/username/tmp/backups
    BTAG=all
    FLAGS=(-s 5G -zbzip2 -asecu -w -X "*~" -X "*.o")   # bash array with some flags
    #FLAGS+=(-K aes:)   # add encryption
    if [ $# == 0 ]; then
        echo missing argument ... need to be one of: show 0 1 2 3 .. 98 99
    elif [ $1 == 'show' ]; then
        ls -lhtr $BDEST/"$BTAG"*
    elif [ $1 == '0' ]; then
        echo backing up $BSRC to $BDEST
        dar "${FLAGS[@]}" -c $BDEST/"$BTAG"0 -R $BREF $BSRC
        /bin/rm -rf $BDEST/"$BTAG"{1..100}.*.dar; ls -lhtr $BDEST/"$BTAG"*
    else
        level=$1
        if [ -n "$level" ] && [ "$level" -eq "$level" ] 2>/dev/null; then   # check if it is a number
            echo backing up $BSRC to $BDEST
            dar "${FLAGS[@]}" -A $BDEST/"$BTAG"$((level-1)) -c $BDEST/"$BTAG"$level -R $BREF $BSRC
            for i in $(seq $((level+1)) 100); do
                /bin/rm -rf $BDEST/"$BTAG"$i.*.dar
            done
            ls -lhtr $BDEST/"$BTAG"*
        else
            echo $level is not a number ...; return 1
        fi
    fi
}

You need to define the four variables at the top:

  • BREF stores the absolute path of the parent directory (containing all subdirectories and files to archive)
  • BSRC stores a relative (to BREF) list of subdirectories and files to archive; BSRC cannot be an absolute path
  • BDEST is the backup destination
  • BTAG will form the root of the backup basename

To create the full backup all0.*.dar, type

[user_name@localhost]$ backup 0

To create the first incremental backup all1.*.dar, type

[user_name@localhost]$ backup 1

To create the second incremental backup all2.*.dar, type

[user_name@localhost]$ backup 2

and so on. To see all backups, type

[user_name@localhost]$ backup show

If your backup exceeds 5 GB, more than one slice will be created.

If you accumulate too many incremental backups, you can always recreate a lower-numbered backup, e.g.

[user_name@localhost]$ backup 1

will overwrite the first incremental backup and will remove all higher-numbered backups.

Restore from backup

Let's define the function

function restore() {
    BSRC=/home/username/tmp/backups
    BTAG=all
    BDEST=/home/username/tmp/restore
    if [ $# == 0 ]; then
        echo Examples:
        echo '   'restore -l anyPattern
        echo '   'restore -x Pictures/1995
        echo '   'restore -x Documents/notes
        echo '   'restore -x Documents/notes/quantum.txt
        echo '   'restore -n 0 Documents/misc/someFile.txt
        echo 'Notes: (1)' restore -x/-n does not understand Unix wildcards, so specify the full directory or file name
        echo '       (2)' always specify one name per command
        echo '       (3)' restore will put the restored files into \$BDEST
    elif [ "$1" == '-l' ]; then
        echo Listing all versions
        for file in $BSRC/"$BTAG"{0..99}; do
            if [ -f $file.1.dar ]; then
                echo --- in $file:
                dar -l $file | grep -- "$2"
            fi
        done
    elif [ "$1" == '-x' ]; then
        echo Restoring from the earliest version:
        echo '  'important to go through all previous backups if restoring a directory or a sparsebundle
        echo '  'or if the most recent version of the file is stored in an earlier backup
        for file in $BSRC/"$BTAG"{0..99}; do
            if [ -f $file.1.dar ]; then
                echo --- from $file:
                dar -R $BDEST -O -w -x $file -v -g "$2"
            fi
        done
    elif [ "$1" == '-n' ]; then
        echo Be careful with restoring from a single layer: might not work as naively expected
        echo Restoring from version $2
        dar -R $BDEST -O -w -x $BSRC/"$BTAG"$2 -v -g "$3"
    else
        echo unrecognized option ...
    fi
}

Similar to the previous function, you need to define these variables:

  • BSRC is the backup directory
  • BTAG is the root of the backup basename
  • BDEST is the directory into which you are restoring

Search for a file test999 inside your backups with:

[user_name@localhost]$ restore -l test999

This will scan both the full backup and all incremental backups. To extract this file, you can specify the backup number and the full path of the file as it appears in the archive, e.g.

[user_name@localhost]$ restore -n 2 test/test999

However, this will not necessarily restore the file: it only does so if the file was modified between backups 1 and 2 and was therefore included in backup 2. To be sure to restore the file, you have two options. Either restore from the full backup and then from each incremental backup in chronological order:

[user_name@localhost]$ restore -n 0 test/test999
[user_name@localhost]$ restore -n 1 test/test999
[user_name@localhost]$ restore -n 2 test/test999
...

or use the -x flag:

[user_name@localhost]$ restore -x test/test999

This last command will automatically go through all backups in the right order. To restore the entire directory, simply type:

[user_name@localhost]$ restore -x test

Note that restore does not accept Unix wildcards.

Symmetric encryption

To encrypt your backup, uncomment the line

#FLAGS+=(-K aes:)   # add encryption

in the backup() function. dar will then ask for a separate password (with confirmation) for each new backup, as well as for the password of the reference (previous) backup. When restoring, you will have to provide the password for each backup.