Who Needs Git When You Got ZFS?

This post is also available in Japanese.

I've been playing a little bit with ZFS, Oracle's (previously Sun's) next-generation file system. It was originally developed for Solaris, but since it's open source it has also been ported to Linux (considered stable for production use as of version 0.6.1) and Mac. While called a file system, ZFS is also a volume manager, so it takes over the job of partitioning your disks as well. Why is ZFS cool? It includes protection against data corruption, built-in support for RAID, snapshots and copy-on-write clones, and flexible and efficient ways of transferring data, e.g. for backups. To show what's possible and push the limits somewhat, I'll show how to implement various features of Git (or any version control system, for that matter) using ZFS. Of course, I'm not seriously suggesting you ditch a "proper" version control system, but it gives a good sense of what's possible at the file system level.

Installing ZFS is not hard: on Mac go to the OpenZFS On OS X site and install the package. On Ubuntu Linux:

$ sudo apt-add-repository ppa:zfs-native/stable
$ sudo apt-get update
$ sudo apt-get install ubuntu-zfs

Pools and file systems

Now you're able to create new ZFS storage pools and file systems. If you have a drive available you can use that, or, if you don't and just want to play around a little bit, you can create one or more files to represent the disks. For instance, to create a 10G file you can use dd:

$ dd if=/dev/zero of=/tmp/disk1.img bs=1024 count=10485760
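If you plan to try the RAID setups, you need more than one image file. A small helper can create several in one go. This is a sketch: the function name `make_disk_images` is made up, and it uses a 1 MiB block size (`bs=1048576`) so that `count` is simply the image size in MiB.

```shell
# make_disk_images DIR COUNT SIZE_MIB
# Hypothetical helper: creates DIR/disk1.img .. DIR/diskCOUNT.img, each
# SIZE_MIB mebibytes of zeroes, to act as file-backed ZFS "devices".
make_disk_images() {
  dir=$1
  count=$2
  size_mib=$3
  i=1
  while [ "$i" -le "$count" ]; do
    # bs=1048576 is 1 MiB, so count= is the image size in MiB
    dd if=/dev/zero of="$dir/disk$i.img" bs=1048576 count="$size_mib" 2>/dev/null || return 1
    i=$((i + 1))
  done
}
```

For example, `make_disk_images /tmp 2 10240` creates /tmp/disk1.img and /tmp/disk2.img of 10 GiB each, the same size as the dd command above.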

If you want to test out a RAID setup, create a second file the same way, e.g. /tmp/disk2.img. The next step is to create a storage pool; for this we'll use zpool create. If you have one or more disks available you can use their device names (e.g. /dev/sda or /dev/sdb) or, better yet, their IDs (/dev/disk/by-id/...). In our case we'll use absolute paths to our regular files.

We can create various types of pools. For instance, to create a mirror (similar to RAID 1):

$ sudo zpool create mypool mirror /tmp/disk1.img /tmp/disk2.img

This will create a pool named "mypool" that mirrors across the two "devices" and mount it under /mypool (on Linux, or /Volumes/mypool on Mac). To see how much space we have available, use zfs list:

$ sudo zfs list
NAME     USED   AVAIL  REFER  MOUNTPOINT
mypool  433Ki  9,78Gi  370Ki  /Volumes/mypool

Alternatively, we can pool up the space from all devices and treat it as one big drive. If you created mypool already, destroy it first:

$ sudo zpool destroy mypool

Then, to create the non-mirrored pool:

$ sudo zpool create mypool /tmp/disk1.img /tmp/disk2.img
$ sudo zfs list
NAME     USED   AVAIL  REFER  MOUNTPOINT
mypool  439Ki  19,6Gi  370Ki  /Volumes/mypool

Now we have a total of about 20G available, at the cost of redundancy: if either disk fails, the whole pool is lost.
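The difference between the two zfs list outputs is simple arithmetic: a mirror can only store as much as its smallest member, while a plain (striped) pool adds up all its devices. A rough sketch, assuming sizes in whole GiB and ignoring ZFS's own overhead (part of why zfs list reports slightly less than the raw size); the function names are made up:

```shell
# mirror_capacity SIZE...: usable space of a mirror = size of its smallest member
mirror_capacity() {
  min=$1
  for size in "$@"; do
    if [ "$size" -lt "$min" ]; then
      min=$size
    fi
  done
  echo "$min"
}

# stripe_capacity SIZE...: usable space of a non-redundant pool = sum of its members
stripe_capacity() {
  total=0
  for size in "$@"; do
    total=$((total + size))
  done
  echo "$total"
}
```

With our two 10G images, `mirror_capacity 10 10` prints 10 and `stripe_capacity 10 10` prints 20, in line with the 9,78Gi and 19,6Gi reported above.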

There's much more you can do with storage pools, like adding or replacing disks on the fly. But let's stick to this simple setup for now.

While we can now start writing files to the /Volumes/mypool or /mypool mount, this is not the recommended way of using ZFS. Instead, we will create separate file systems in the pool. For each of these file systems we can then set various properties, such as whether to enable encryption, compression or quotas. We can also take snapshots of each file system individually, share the file systems via Samba or NFS, or transfer file system snapshots to other pools, possibly on other servers.

So file systems are kind of the shit.

ZFS filesystems are managed using the zfs command line tool (as opposed to zpool, which is used for pools).

$ sudo zfs create mypool/test

This will create and mount a new filesystem under /mypool/test (or /Volumes/mypool/test on Mac). Incidentally, we can mount pools and file systems anywhere we like: pass the -m switch to zpool create, or -o mountpoint=... to zfs create, or, even more fun, change the mountpoint on the fly:

$ sudo zfs set mountpoint=/test mypool/test

which remounts the filesystem under /test. To see all properties of the filesystem, use zfs get all:

$ sudo zfs get all mypool/test
NAME         PROPERTY              VALUE                 SOURCE
mypool/test  type                  filesystem            -
mypool/test  creation              di aug 20 14:47 2013  -
mypool/test  used                  442Ki                 -
mypool/test  available             9,78Gi                -
mypool/test  referenced            442Ki                 -
mypool/test  compressratio         1.00x                 -
mypool/test  mounted               yes                   -
mypool/test  quota                 none                  default
mypool/test  reservation           none                  default
mypool/test  recordsize            128Ki                 default
mypool/test  mountpoint            /test                 local
mypool/test  checksum              on                    default
mypool/test  compression           off                   default
mypool/test  atime                 on                    default
mypool/test  devices               on                    default
mypool/test  exec                  on                    default
mypool/test  setuid                on                    default
mypool/test  readonly              off                   default
mypool/test  snapdir               hidden                default
mypool/test  canmount              on                    default
mypool/test  copies                1                     default
mypool/test  version               5                     -
mypool/test  utf8only              on                    -
mypool/test  normalization         formD                 -
mypool/test  casesensitivity       sensitive             -
mypool/test  refquota              none                  default
mypool/test  refreservation        none                  default
mypool/test  primarycache          all                   default
mypool/test  secondarycache        all                   default
mypool/test  usedbysnapshots       0                     -
mypool/test  usedbydataset         442Ki                 -
mypool/test  usedbychildren        0                     -
mypool/test  usedbyrefreservation  0                     -
mypool/test  logbias               latency               default
mypool/test  sync                  standard              default

There's a bunch of useful stuff here. For instance, let's enable compression:

$ sudo zfs set compression=on mypool/test

Anything we write to this filesystem from this point onwards will be compressed; data that is already there is left as-is until it is rewritten.
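When scripting against ZFS, the full zfs get all table is more than you need. zfs get also accepts -H (no header, tab-separated output) and -o value (print only the VALUE column), which makes it easy to read a single property. A sketch with made-up function names; prefix the zfs call with sudo if your setup requires it, as in the commands above:

```shell
# zfs_prop DATASET PROPERTY: print just the value of one ZFS property
zfs_prop() {
  # -H: scripted mode (no header, tab-separated); -o value: only the VALUE column
  zfs get -H -o value "$2" "$1"
}

# compression_report DATASET: one-line summary of the compression settings
compression_report() {
  printf '%s: compression=%s compressratio=%s\n' \
    "$1" "$(zfs_prop "$1" compression)" "$(zfs_prop "$1" compressratio)"
}
```

Running `compression_report mypool/test` right after enabling compression should show compression=on; expect compressratio to stay close to 1.00x until a fair amount of compressible data has been written.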

Who needs Git?

Using ZFS as a replacement for Git is probably not a good idea, but just to give you a sense of what ZFS supports at the file system level, let me go through a few typical Git-like operations: creating a repository, committing and tagging, rolling back, branching, and pushing and pulling.

Notably missing is merging, which, as far as I'm aware, ZFS has no direct support for.

Creating a repository

First, let's create a filesystem for our projects, with a nested filesystem for our project, which we'll call "zfsgit". Yes, you can nest filesystems as deep as you like. Then we'll chown the root of the filesystem to our current user so that we don't have to sudo for creating, editing and removing files.

$ sudo zfs create mypool/projects
$ sudo zfs create mypool/projects/zfsgit
$ sudo chown $(whoami) /Volumes/mypool/projects/zfsgit
$ cd /Volumes/mypool/projects/zfsgit

Alright, we now have the equivalent of a repository, or checkout thereof.
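The steps above can be wrapped into a single "init" helper. A sketch, assuming sudo access; the name zfsgit_init is hypothetical. It uses zfs create -p, which also creates any missing parent file systems (mypool/projects here), and it looks up the mountpoint with zfs get instead of hard-coding the Mac /Volumes path, so it should work the same on Linux:

```shell
# zfsgit_init DATASET
# Hypothetical "git init": creates DATASET (plus missing parents via -p),
# hands its mountpoint to the current user (by numeric uid) and prints it.
zfsgit_init() {
  sudo zfs create -p "$1" || return 1
  mnt=$(zfs get -H -o value mountpoint "$1") || return 1
  sudo chown "$(id -u)" "$mnt" || return 1
  echo "$mnt"
}
```

Usage: `cd "$(zfsgit_init mypool/projects/zfsgit)"`.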

Let's create a file and put some content in it:

$ echo "Hello" > file.txt

"Committing" and "Tagging"

To create a "commit" or "tag", i.e. something that is kept in our project's history and that we can revert to, we can use a ZFS snapshot. ZFS snapshots have to be explicitly named; let's call our first one "firstcommit". We do this by appending @ and the snapshot name to our filesystem name.

$ sudo zfs snapshot mypool/projects/zfsgit@firstcommit
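Since every commit is just a named snapshot, a tiny helper can supply a name when you don't feel like inventing one. A sketch (zfsgit_commit is a hypothetical name) that falls back to a UTC timestamp, which has the nice property that names sort chronologically:

```shell
# zfsgit_commit DATASET [NAME]
# Hypothetical "git commit": snapshots DATASET as DATASET@NAME and prints the
# full snapshot name. Without NAME it uses a UTC timestamp like 20130820-144700.
zfsgit_commit() {
  name=${2:-$(date -u +%Y%m%d-%H%M%S)}
  sudo zfs snapshot "$1@$name" || return 1
  echo "$1@$name"
}
```

Two timestamped commits within the same second would get the same name; zfs snapshot then refuses to create the second one and the helper returns an error.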

Now, let's change our file slightly:

$ echo "world" >> file.txt

Let's see what changed:

$ sudo zfs diff mypool/projects/zfsgit@firstcommit
M	/Volumes/mypool/projects/zfsgit/file.txt

Sadly we don't get to see a textual diff, but at least it tells us which file changed. We can now create a new commit:

$ sudo zfs snapshot mypool/projects/zfsgit@secondcommit
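Even though zfs diff only lists file names, the old contents are still reachable: each snapshot is exposed read-only under the .zfs/snapshot/NAME directory at the root of the file system (hidden from ls because snapdir is hidden by default, but you can still cd into it, on Linux and Solaris at least). Combining that with zfs diff gives a rough textual diff. A sketch with a hypothetical function name, assuming the paths printed by zfs diff start with the mountpoint you pass in and contain no special characters (zfs diff escapes those):

```shell
# zfsgit_diff DATASET SNAPSHOT MOUNTPOINT
# Hypothetical "git diff": for each regular file that zfs diff reports as
# modified (M), print a unified diff of the snapshot copy vs. the live file.
zfsgit_diff() {
  dataset=$1
  snap=$2
  mnt=$3
  tab=$(printf '\t')
  sudo zfs diff "$dataset@$snap" | while IFS=$tab read -r change path; do
    # Only modified regular files; skip directories and added/removed/renamed entries
    [ "$change" = M ] || continue
    [ -f "$path" ] || continue
    rel=${path#"$mnt"/}
    diff -u "$mnt/.zfs/snapshot/$snap/$rel" "$path"
  done
}
```

Running `zfsgit_diff mypool/projects/zfsgit firstcommit /Volumes/mypool/projects/zfsgit` should show the added "world" line as `+world`.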

To list our current snapshots:

$ sudo zfs list -t snapshot
NAME                                   USED   AVAIL   REFER  MOUNTPOINT
mypool/projects/zfsgit@firstcommit    146Ki       -   370Ki  -
mypool/projects/zfsgit@secondcommit       0       -   386Ki  -
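A bare-bones "git log" just needs the snapshot names of one file system in creation order. A sketch with a hypothetical function name: -d 1 limits the listing to this file system's own snapshots (not those of nested file systems), -s creation sorts oldest first, and sed strips the dataset@ prefix:

```shell
# zfsgit_log DATASET
# Hypothetical "git log": prints the snapshot names of DATASET, oldest first.
zfsgit_log() {
  # -H: no header; -o name: name column only; -s creation: sort by creation time;
  # -d 1: only DATASET's own snapshots, not those of child file systems
  sudo zfs list -H -t snapshot -o name -s creation -d 1 "$1" | sed 's/^[^@]*@//'
}
```

For our repository, `zfsgit_log mypool/projects/zfsgit` should print firstcommit followed by secondcommit.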

Now, let's make another change:

$ echo "ladies..." >> file.txt

That was a bad idea; let's roll back to our previous snapshot:

$ sudo zfs rollback mypool/projects/zfsgit@secondcommit
$ cat file.txt
Hello
world

And we have our previous version back.
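Note that zfs rollback only rolls back to the most recent snapshot by default; going back further requires -r, which destroys all later snapshots, much like a git reset --hard that also throws away commits. A sketch of a helper (hypothetical name) that discards uncommitted changes by rolling back to whatever the newest snapshot is, so you don't have to remember its name:

```shell
# zfsgit_reset DATASET
# Hypothetical "git reset --hard": rolls DATASET back to its most recent snapshot.
zfsgit_reset() {
  # Newest snapshot of DATASET itself (-d 1), sorted by creation time
  latest=$(sudo zfs list -H -t snapshot -o name -s creation -d 1 "$1" | tail -n 1)
  if [ -z "$latest" ]; then
    echo "zfsgit_reset: no snapshots for $1" >&2
    return 1
  fi
  sudo zfs rollback "$latest"
}
```

In our example, `zfsgit_reset mypool/projects/zfsgit` would run the same rollback to @secondcommit.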

Branching

Functionality similar to branching can be achieved using zfs clone, which allows you to clone a filesystem based on a particular snapshot:

$ sudo zfs clone mypool/projects/zfsgit@firstcommit mypool/projects/zfsgit_branch

This creates a new copy-on-write filesystem, mounted under /Volumes/mypool/projects/zfsgit_branch (or /mypool/projects/zfsgit_branch on Linux). This is a very lightweight operation: no data is copied, and initially barely any extra disk space is consumed.
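A sketch of a "branch" helper that follows the DATASET_BRANCH naming used above (both the function name and the naming scheme are just illustrations) and prints where the new branch got mounted:

```shell
# zfsgit_branch DATASET SNAPSHOT BRANCH
# Hypothetical "git branch": clones DATASET@SNAPSHOT into the file system
# DATASET_BRANCH and prints its mountpoint.
zfsgit_branch() {
  branch_ds="${1}_${3}"
  sudo zfs clone "$1@$2" "$branch_ds" || return 1
  zfs get -H -o value mountpoint "$branch_ds"
}
```

`zfsgit_branch mypool/projects/zfsgit firstcommit branch` creates the same mypool/projects/zfsgit_branch clone as above. As with any clone, the origin snapshot (@firstcommit here) cannot be destroyed while the branch exists.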

Pushing and pulling repositories

You can send filesystems, even incrementally, to other storage pools, both local and remote. To demonstrate, let's say we created another storage pool called "mypool2" locally. We can now "push" any snapshot to the other storage pool as follows (as root):

$ zfs send mypool/projects/zfsgit@firstcommit | zfs receive mypool2/zfsgit

As you can imagine, this works just as well over SSH, for instance:

$ zfs send mypool/projects/zfsgit@firstcommit | ssh root@myserver zfs receive mypool/zfsgit

This pushes the entire filesystem as it looked at the time of the snapshot. Alternatively, if we already pushed a previous snapshot before, we can push just the difference between the previous snapshot and the current one using the -i option:

$ zfs send -i mypool/projects/zfsgit@firstcommit mypool/projects/zfsgit@secondcommit | zfs receive mypool2/zfsgit

This is useful for incrementally backing up large file systems. Of course, this is just using Unix pipes, so we can also write the result of zfs send to a file and upload it to S3, for instance:

$ zfs send mypool/projects/zfsgit@firstcommit > backup.dump
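Since a send stream is just bytes, it can be compressed on its way into the file and fed back into zfs receive later. A sketch with hypothetical helper names, using gzip (any compressor that reads stdin and writes stdout would do); run it as root like the commands above:

```shell
# zfsgit_save SNAPSHOT FILE: write a gzip-compressed send stream of SNAPSHOT to FILE
zfsgit_save() {
  zfs send "$1" | gzip -c > "$2"
}

# zfsgit_restore FILE DATASET: recreate DATASET from a stream saved by zfsgit_save
zfsgit_restore() {
  gunzip -c "$1" | zfs receive "$2"
}
```

For example, `zfsgit_save mypool/projects/zfsgit@firstcommit backup.dump.gz` and later `zfsgit_restore backup.dump.gz mypool2/zfsgit`. One caveat: a pipeline's exit status is that of its last command, so a failing zfs send goes unnoticed in zfsgit_save; in bash, `set -o pipefail` changes that.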

To pull a filesystem instead of pushing it, you'd do the reverse; over SSH, that could look something like this:

$ ssh root@myserver zfs send mypool/zfsgit@secondcommit | zfs receive mypool/zfsgit
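The push variants can be folded into one helper that picks a full or an incremental stream and optionally receives it on another machine over SSH. A sketch; the function name and argument order are made up, and it assumes it runs as root on both ends, as above:

```shell
# zfsgit_push SNAPSHOT DEST [BASE_SNAPSHOT] [HOST]
# Hypothetical "git push": sends SNAPSHOT to the file system DEST, in full or,
# if BASE_SNAPSHOT is given, incrementally (zfs send -i). With HOST, the
# stream is received on that machine over SSH.
zfsgit_push() {
  snap=$1
  dest=$2
  base=${3:-}
  host=${4:-}
  # Build the zfs send arguments in the positional parameters
  if [ -n "$base" ]; then
    set -- send -i "$base" "$snap"
  else
    set -- send "$snap"
  fi
  if [ -n "$host" ]; then
    zfs "$@" | ssh "$host" zfs receive "$dest"
  else
    zfs "$@" | zfs receive "$dest"
  fi
}
```

For example, `zfsgit_push mypool/projects/zfsgit@secondcommit mypool/zfsgit mypool/projects/zfsgit@firstcommit root@myserver` sends only the changes since @firstcommit. An incremental receive expects the destination to still be at the base snapshot; if it has been modified since, zfs receive refuses, and zfs receive -F would roll it back first.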

Should you use ZFS?

ZFS is pretty cool and pretty stable, at least on Solaris and Linux. I'm not sure of the stability on Mac at this time. Using ZFS as a root file system on Linux is still slightly problematic at this moment, but those issues will likely be resolved soon. I don't have extensive experience with its reliability and performance myself, but the Internets has good things to say.

However, ZFS is not the only game in town. Linux's Btrfs offers many similar features, although it is newer and less mature, so it may not be as stable yet. Either way, these file systems are a lot of fun to play with. To learn more about ZFS, I'd recommend reading through Oracle's ZFS Administration Guide, which is pretty readable, and much of it applies to Linux and Mac as well.

