ScottE's Blog

Did you know you can create your own Linux AWS EC2 AMI which runs 100% ZFS for all filesystems (/, /boot - everything)? You can, and it's not too hard as long as you are experienced with installing Linux without an installer. Here are the rough instructions for setting this up with a modern Debian-based system (I've tested with Debian and Ubuntu). As far as I know, this is the first published account of how to set this up. There aren't any prebuilt AMIs available that I know of, but I might just build one unless someone else beats me to it.

Why run ZFS for the root filesystem? Not only is ZFS a high-performing filesystem, but using native ZFS for everything makes storage management a cinch. For example, want to keep your root EBS volumes small? No problem - keep your AMI on a 1GB volume (yes, it's possible to be that small), and extend the ZFS pool dynamically at runtime by attaching additional EBS volumes as needed. ZFS handles this extremely well.

Why build your own AMI instead of using a prebuilt one? There are a couple of good reasons, but the primary one is that you get a minimal AMI with the least cruft and bloat possible. Many of the prebuilt cloud AMIs have a bunch of packages installed that you might not need or want. By building from scratch, your AMI contains just the things you want, not only lowering EBS costs, but potentially reducing security risks.

Note that we don't do anything special with ephemeral drives here - those are best kept in their own ZFS pool anyway, since mixing EBS and ephemeral drives will have some interesting performance consequences. You can use ephemerals on an instance, of course (in fact, it works great to stripe across all ephemeral drives, or you could use SSD ephemerals for ZFS L2ARC) - that's just not the purpose of this article.

These instructions will only create an AMI that will boot on an HVM instance type. Although it's easy enough to create a snapshot that can be registered separately as either an HVM or a PV AMI, all new AWS instance types support HVM. Because of this, I've decided to support only the newer instance types, hence HVM-only.

I've tested this with the upcoming Debian Stretch (testing as of this writing), as well as Ubuntu Yakkety. It should work with Ubuntu Xenial as well, but I wouldn't try anything earlier, since ZFS support is relatively new and maturing rapidly (the last time I tried Debian Jessie with 100% ZFS, I found that grub was too old to support booting into ZFS, although a separate EXT4 /boot works fine - this may have changed since then).

Again, these instructions assume you are pretty familiar with installing Debian via debootstrap, which means manually provisioning volumes, partitioning them, creating filesystems, bootstrapping, and chrooting in for final setup. If you don't know what all these things mean, you might find this a difficult undertaking. Unlike installing to your own hardware, there's very little instrumentation if things go wrong, and only a read-only console (if you are lucky - if networking does not initialize properly, you might not even get that). Expect this to take a few iterations and some frustration - this is a general guide, not step-by-step instructions!

If you've ever installed a Debian-based system from scratch, you'll note that most of these steps are no different than you'd do on physical hardware. There are only a few things that are AWS-specific; the vast majority is exactly how you'd install on bare metal.

Step 1 - Prepare Host Instance

Fire up a host instance to build out the AMI. This doesn't need to be the same distribution or version as the AMI to be built, but it has to be recent enough to have ZFS. Debian Jessie (with jessie-backports) or Ubuntu Xenial will work.

We'll use this instance (the host) to build out the target AMI, and if things don't go well we can come back to it and try again (so don't terminate it until you are ready, or have a working target AMI).

Once the host instance is up, provision a GP2 EBS volume via the AWS console and attach it to the host. We use a 10GB volume, but you could make this as small as 1GB if you really want to (be aware that GP2 doesn't perform well with small volumes).

We'll assume the newly provisioned volume is attached at /dev/xvdf. The actual device name might vary; use dmesg if you aren't sure.
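
For example, a quick check (the device names here are just what they typically look like):

$ lsblk
$ dmesg | tail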

Next, update /etc/apt/sources.list with the full sources list for your host distribution. For Debian, use main contrib non-free - and you'll need jessie-backports if the host is Jessie. For Ubuntu, use main restricted universe multiverse.
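
For example, on a Debian Jessie host, /etc/apt/sources.list might look something like this (the mirror URL is just an example):

deb http://deb.debian.org/debian jessie main contrib non-free
deb http://deb.debian.org/debian jessie-backports main contrib non-free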

Next, install ZFS and debootstrap:

$ apt update
$ apt install zfs-zed zfsutils-linux zfs-initramfs zfs-dkms debootstrap gdisk

Step 2 - Prepare Target Pools And Filesystems

Now it's time to set up ZFS on the new EBS volume. Assuming the target volume device is /dev/xvdf, we'll create a GPT partition table with a small GRUB boot partition and leave the rest of the disk for ZFS.

Be careful - many instructions out on the net for ZFS munge the sector geometry, or trick sgdisk into using an unnatural alignment. The following is correct (per AWS documentation) not only for EBS, but is the exact same geometry I use when installing Linux with a ZFS root on physical hardware.

$ sgdisk -Zg -n1:0:4095 -t1:EF02 -c1:GRUB -n2:0:0 -t2:BF01 -c2:ZFS /dev/xvdf

This will create a small partition labelled GRUB (type EF02, ending at sector 4095), and use the rest of the disk for a partition labelled ZFS (type BF01). The GRUB partition doesn't technically need to be that big, but this ensures everything is aligned properly.

It's worth noting that I never give ZFS a full disk; I always use partitions for ZFS pools. If you give ZFS the entire disk, it will create its own partition table, but waste 8MB in a Solaris partition that Linux has no use for.

OK, great - next up, let's create our ZFS pool and set up some filesystems. This will set the target up in /mnt. You can choose any mount point you want, just remember to use it consistently if you choose a different one.

I use the ZFS pool name rpool, but you can choose a different one; just be careful to substitute yours everywhere.

You may want different options - this will globally enable lz4 compression and disable atime for the pool. You may want to disable compression generally and only enable it for specific filesystems; the choice is up to you. We also allow overlay mounting on /var. This is an obscure but important bit - when the system initially boots, it will log to /var/log before the /var ZFS filesystem is mounted. Because the mount point is dirty, ZFS won't mount /var without the overlay flag set. Note that /dev/xvdf2 is the second GPT partition we created above.

$ zpool create -o ashift=12 -O compression=lz4 -O atime=off -m none -R /mnt rpool /dev/xvdf2
$ zfs create -o mountpoint=/ rpool/root
$ zfs create -o mountpoint=/home rpool/home
$ zfs create -o mountpoint=/tmp rpool/tmp
$ zfs create -o overlay=on -o mountpoint=/var rpool/var

You may wish to have different ZFS filesystems, of course. And note we don't set any quotas - we let all our filesystems share the entire storage pool.

At this point, the usual storage commands should show everything mounted up and ready for bootstrap (zpool status, zfs list, df, etc).
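
For example, a quick sanity check could look like this:

$ zpool status rpool
$ zfs list -r rpool
$ df -h /mnt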

Step 3 - Bootstrap The Target

Now we'll install our target distribution on the newly provisioned volume. There's not much to do in this step:

$ debootstrap --arch amd64 stretch /mnt

Or, for Ubuntu Yakkety:

$ debootstrap --arch amd64 yakkety /mnt

Note that we can do this cross-distribution. We can bootstrap an Ubuntu target from a Debian host, or a Debian target from an Ubuntu host.

Step 4 - Chroot Into Target

Next up we need to chroot into the target before doing final configuration.

$ mount --rbind /dev /mnt/dev
$ mount --rbind /proc /mnt/proc
$ mount --rbind /sys /mnt/sys
$ chroot /mnt

At this point, you should have a root shell into the target system.

Step 5 - Finalize Target Configuration

Now we'll do some final configuration. Some of the steps here differ between Debian and Ubuntu, but the general theme is the same.

Update /etc/apt/sources.list with the full sources list for your target distribution. For Debian, use main contrib non-free. For Ubuntu, use main restricted universe multiverse. Be sure you are setting up sources.list for the target distribution, not the host like we did before!
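
For example, a Debian Stretch target might use something like (again, the mirror URL is just an example):

deb http://deb.debian.org/debian stretch main contrib non-free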

Install packages, but be sure NOT to install grub when it asks - you'll have to acknowledge that this will result in a broken system (for now, anyway).

$ apt update

# Debian
$ apt install linux-image-amd64 linux-headers-amd64 grub-pc zfs-zed zfsutils-linux zfs-initramfs zfs-dkms cloud-init gdisk locales
$ ln -s /proc/mounts /etc/mtab

# Ubuntu
$ apt install linux-image-generic linux-headers-generic grub-pc zfs-zed zfsutils-linux zfs-initramfs zfs-dkms cloud-init gdisk

# All
$ dpkg-reconfigure locales # Choose en_US.UTF-8 or as appropriate
$ apt install --no-install-recommends openssh-server

Note the symlink to /etc/mtab on Debian - there was a bug in ZFS that relied on using /etc/mtab. We got that bug fixed in Ubuntu by Canonical, but as of a couple of months ago, Stretch didn't yet have the fix - it's probably fixed in Debian as well by now.

On Debian, I found I needed to modify GRUB_CMDLINE_LINUX in /etc/default/grub with the following. Note the escaped $:

GRUB_CMDLINE_LINUX="boot=zfs \$bootfs"

This additional step might go away (or already be resolved) with a newer version of ZFS and grub in Stretch. You could (should) probably add this to the grub.d configuration we add later, rather than here.

Verify grub and ZFS are happy. This is very important - if this step doesn't work, there's no point in continuing; the target will not boot.
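
One straightforward check (my suggestion - not necessarily the only way) is grub-probe, which should report zfs for the root filesystem:

$ grub-probe /
zfs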

This verifies that grub is able to probe filesystems and devices and has ZFS support. If this returns an error, the target system isn't going to boot.

Everything is good, so let's install grub:
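
Something like the following (assuming the target volume is still /dev/xvdf):

$ grub-install /dev/xvdf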

Note we give grub the entire EBS volume of xvdf, not just xvdf1. This is important (installing to just the GRUB partition will result in a non-booting system).

Again, if this fails, you'll need to diagnose why and potentially start over, as you won't have a bootable target system.

Now we need to add a configuration file for grub to set a few things. To do this, create a file in /etc/default/grub.d/50-aws-settings.cfg:

GRUB_RECORDFAIL_TIMEOUT=0
GRUB_TIMEOUT=0
GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0 ip=dhcp tsc=reliable net.ifnames=0"
GRUB_TERMINAL=console

This will configure grub to log as much as possible to the AWS console, get an IP address as early as possible, and force the TSC (time source) to be treated as reliable (an obscure boot parameter required for some AWS instance classes). net.ifnames=0 is set so Ethernet adapters are enumerated as ethX instead of ensXX.

Now, let's update grub:
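
On Debian and Ubuntu this is simply:

$ update-grub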

You might want to check /boot/grub/grub.cfg at this stage to see that the zfs module will be loaded and that it has the right boot line (vague advice, I know).

Finally, set the ZFS cache and reconfigure - these might be unnecessary, but since this works, I superstitiously don't skip it :-).

$ zpool set cachefile=/etc/zfs/zpool.cache rpool
$ dpkg-reconfigure zfs-dkms

Now, just a few sundry things left to do.

Update /etc/network/interfaces with:

auto eth0
iface eth0 inet dhcp

Again, note that we've altered the boot command line so network devices will be enumerated as ethX, instead of ensXX.

Don't drop this config into /etc/network/interfaces.d/eth0.cfg - cloud-init will blacklist that configuration.

Finally, you may wish to provision and configure a user (cloud-init will set up a debian or ubuntu user by default). You may want to give the root user a secure password and update /etc/ssh/sshd_config to allow PermitRootLogin, if this is appropriate for your environment and security policies.
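
For example, if allowing root login fits your policies, a rough sketch would be (the sed expression is just one way to flip the setting):

$ passwd root
$ sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config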

Step 6 - Quiesce Target Volume

Before creating an AMI, we need to exit the chroot, unmount everything, and export the pool - basically quiesce the target so the volume can be snapshotted.

Exit the chroot:
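
$ exit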

Now, you should be back in the host instance.

Unmount the bind mounts (we use the lazy option, otherwise unmounts can fail):

$ umount -l /mnt/dev
$ umount -l /mnt/proc
$ umount -l /mnt/sys

And finally, export the ZFS pool.
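
Assuming the pool name rpool from above:

$ zpool export rpool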

Now, zpool status, df, etc. should show that our target filesystems are unmounted, and /dev/xvdf is free to be safely cloned. If anything here fails (unmounting, exporting), the target will not be in a good state and won't boot.

Step 7 - Snapshot EBS And Create AMI

Now we are all set to create an AMI from our target EBS volume.

In the AWS console, take a snapshot of the target EBS volume - this should take a minute or two.

Next, also in the AWS console, select the snapshot and register a new AMI. Be sure to register it as HVM and set up ephemeral mappings as you wish. Don't mess with the kernel ID and other parameters.
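
If you prefer the CLI to the console, the rough equivalent with the aws tool looks something like this (the volume and snapshot IDs are placeholders, and you may want to tune the name and block device mapping):

$ aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "ZFS root target"
$ aws ec2 register-image --name "zfs-root-ami" --architecture x86_64 \
    --virtualization-type hvm --root-device-name /dev/xvda \
    --block-device-mappings "DeviceName=/dev/xvda,Ebs={SnapshotId=snap-0123456789abcdef0,VolumeType=gp2,DeleteOnTermination=true}"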

Step 8 - Launch And Add Storage

Once registered, launch your shiny new AMI on an instance and enjoy ZFS root filesystem goodness.

If your instance never comes up, take a look at the console logging available in the AWS console. This is the only real avenue you have to debug a failed launch, and it's very limited. If grub fails, the log might be empty. If networking fails, the log should have some details, but the instance will not be reachable.

A very useful debugging technique for AMIs is to terminate the instance, but not destroy the EBS volume - instead, attach the volume to another instance and import the ZFS pool there. This will allow you to look at logs so hopefully you can figure out why the boot failed.

If the instance doesn't come up, you can re-import the ZFS pool on the host used to stage the target and try to fix it (remember, above I suggested leaving the host and target EBS volume around so you can iterate). Do the bind mounts before you chroot, and don't forget to unmount everything and export the pool before taking another snapshot.
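
A rough sketch of that recovery loop on the host (pool name and mount point as above; -f is usually needed because the pool was last imported on another system):

$ zpool import -f -R /mnt rpool
$ mount --rbind /dev /mnt/dev
$ mount --rbind /proc /mnt/proc
$ mount --rbind /sys /mnt/sys
$ chroot /mnt
# ...fix things, then exit the chroot, unmount, and export again:
$ umount -l /mnt/dev /mnt/proc /mnt/sys
$ zpool export rpool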

Log in with the debian or ubuntu user if one was provisioned by the default cloud-init configuration - or however users are provisioned if you customized cloud-init. Or log in as root if you set the root password and modified the ssh configuration to allow root login.

Did it work? If so, great! If not, give it another try, paying careful attention to any errors, as well as scouring the output of dkms builds, etc. This isn't completely straightforward, and it took me a few tries to get things figured out.

Now, let's show the power of ZFS by adding 100GB, which will be available across the entire rpool, without having to fracture filesystems, mount new storage to its own directory, or move files around to the new device.

Assuming we used a 10GB EBS volume for the AMI, our pool probably looks something like:

$ zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool   9.94G   784M  9.22G        -     0%     0%  1.00x  ONLINE  -

In the AWS console, create a new 100GB GP2 EBS volume and attach it to your running instance.

Assuming the volume is attached as /dev/xvdf, let's extend rpool onto this new volume:

$ sgdisk -Z -n1:0:0 -t1:BF01 -c1:ZFS /dev/xvdf
$ zpool add rpool /dev/xvdf1

This partitions the volume with a new GPT table, using everything for ZFS (again, I don't like giving ZFS the raw volume, as it will waste a bit of space when it partitions the volume for Solaris compatibility). Finally, we extend rpool onto the new volume.

That's it! Now we see:

$ zpool list -v
NAME     SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool     109G   784M   109G         -     0%     0%  1.00x  ONLINE  -
  xvda1  9.94G   735M  9.22G         -     7%     7%
  xvdf1  99.5G  48.7M  99.5G         -     0%     0%

We've added 100GB of storage completely transparently, and unlike creating a traditional EXT or XFS volume we don't have to mount it into a new directory - with ZFS the storage is just there, available to all our ZFS filesystems.

Thanks For Reading

Hope that helps anyone else looking to run ZFS exclusively in AWS. While not as easy as taking an off-the-shelf prebuilt AMI, you end up with an AMI that has only a minimal Debian or Ubuntu install - you know exactly what went into it, and the process for doing so.

If you run into any issues trying this, you can contact me indirectly by commenting on this blog entry, or try ##aws on Freenode.
