Zfs

From Alliance Doc
Jump to navigation Jump to search
This site replaces the former Compute Canada documentation site, and is now being managed by the Digital Research Alliance of Canada.

Ce site remplace l'ancien site de documentation de Calcul Canada et est maintenant géré par l'Alliance de recherche numérique du Canada.


This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.




ZFS is a combined file system and logical volume manager designed by Sun Microsystems. ZFS can scale to very large file system sizes and supports compression.

ZFS greatly simplifies the process of increasing a filesystem size as required. The simpliest approach is to add new volumes to your VM and then add them to your ZFS filesystem to grow the size of your file system. This can be done while the filesystem is live and file IO is occuring on the filesystem.

Installing ZFS

Starting with the image Ubuntu-18.04-Bionic-x64-2018-09

Ensure your package list is up-to-date and also do upgrades of your installed packages. While it isn't strictly nessacary to upgrade your installed packages it is a good idea.

[name@server]$ sudo apt-get update 
[name@server]$ sudo apt-get dist-upgrade -y

Next install ZFS.

[name@server]$ sudo apt-get install zfsutils-linux

Starting with the image CentOS-7-x64-2018-09

[name@server]$ sudo yum install http://download.zfsonlinux.org/epel/zfs-release.el7_5.noarch.rpm
...
Total size: 2.9 k
Installed size: 2.9 k
Is this ok [y/d/N]: y
...

hmm... this is looking strangely more complicated, see for example [1]

Starting with the image Fedora-Cloud-Base-29-1.2

to be written!

Using ZFS

Creating a zpool

[name@server]$ sudo zpool create -f data /dev/vdb /dev/vdc

This will create a new mount point at /data backed by the volumes attached at /dev/vdb and /dev/vdc. The filesystem will have a size slightly smaller than the combined sizes of all attached volumes. As data is written to the /data mount point it will be dynamically stripped across all drives. This is good for permformance and allows you to make use of all your storage, however it doesn't provide any additional data replication (see here for information about creating mirrored zpools).

ZFS can compress data as it is written to the file system and uncompress it when it is read. To turn on and choose a compression algorithim for a zpool use the following command.

[name@server]$ sudo zfs set compression=lz4 data

This will use the lz4 compression algorithm on the zpool data. If your data is largely binary you might not see large reduction in storage use, however, if your data is something more compressible, such as ascii data, you may see a larger reduction in storage use. Using compression can also improve file IO performance because less data needs to read and written. However this can depend on the particular compression algorithm chosen and some particularly computationally intensive algrothims may actually reduce file IO performance. The lz4 algorithm was choosen because it is a resonable compromise between speed and amount of compression achieved, other compression algorithms may provide better compression or speed but likely not both.

To see a list of peroperties for your pool use the below command.

[name@server]$ sudo zfs get all data
NAME  PROPERTY              VALUE                  SOURCE
data  type                  filesystem             -
data  creation              Fri Mar  1 17:57 2019  -
data  used                  82.5K                  -
data  available             19.3G                  -
data  referenced            24K                    -
data  compressratio         1.00x                  -
data  mounted               yes                    -
data  quota                 none                   default
...

zpools can be further subdived into datasets. A dataset allows you to operate on smaller portions of your zpool independently. For example setting a different compression algorithims for the dataset. To create a dataset within the data zpool do the following.

[name@server]$ sudo zfs create data/www

Growing a zpool

Growing a zpool is a relatively simple task when compared to other file systems and logical volume mangers. It is a two step process 1) add a new volume to your VM and 2) add the new device to your zpool that's it. Below is the command to add a new device to the zpool data.

[name@server]$ sudo zpool add data /dev/vde

The zpool will now have access to the added storage contributed by the newly added device. This process usually takes less than a minute even for very large several TB sized zpools and volumes. Check pool status

[name@server]$ sudo zpool status data

Destroying a zpool

[name@server]$ sudo zpool destroy data

Notes

  • While in theory it should be possible to use ZFS with resizing volumes in OpenStack in practices this has not been straight forward and is better to be avoided if possible.
  • While there is no hard limit to how many volumes you can attach to your VM it is best to keep the number of volumes attached to a reasonable number. 19 attached volumes has been tested and shown to work well with the Queens version of OpenStack.
  • Having a pool of two or more volumes may provide improved IO performance.

See also