Btrfs/System Root Guide

Converting to a btrfs Based System
This exercise is one example of re-basing a Gentoo installation's root filesystem onto btrfs. In this case, the existing system is an mdadm-based mirror set using two 2tb drives at /dev/sda and /dev/sdb. Two fresh 2tb drives have been added at /dev/sdc and /dev/sdd.

Existing Layout

 * Simple two way mdadm mirror (raid1)
 * 250mb /boot partition as ext3 with metadata=0.90
 * 75gb / partition as ext4 with metadata=0.90
 * 750gb /home partition as ext4 with metadata=1.2
 * 1tb+ /vm partition as ext4 with metadata=1.2

The use of the older metadata format for /boot and / partitions allows grub-0.97 to find and boot the system without needing to resort to an initial ram device.

New Layout

 * 250mb /boot partition as ext3 with metadata=0.90
 * 1.9tb+ btrfs partition with raid1 metadata and data

We are keeping the /boot as a simple software mirror in order to stay with grub-0.97 but will now need to use an initial ram filesystem. The kernel will otherwise panic when it attempts to mount btrfs filesystems that need to have a btrfs device scan done first. We will be following the writeup for Early Userspace Mounting to build that filesystem.

Btrfs has been built into the kernel (not a module) along with lzo compression/decompression as that will be used to optimize space utilization and read performance on the system volumes. Likewise, the raid modules used by mdadm are also builtins.

There are a number of places where lzo is part of a module name in the kernel .config.

It's unclear which kernel options need to be selected to make sure that the lzo module btrfs relies on is enabled. The likely suspect is CONFIG_HAVE_KERNEL_LZO, which comes into play for the compression of the kernel image itself:

The default selection of Gzip causes CONFIG_KERNEL_LZO to not be set as shown above. There doesn't appear to be a way to control the setting of CONFIG_HAVE_KERNEL_LZO short of editing .config directly.
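A quick way to see how the lzo-related symbols ended up in a given configuration is to grep the .config. The sketch below assumes the kernel tree is at /usr/src/linux, and the helper name show_lzo_config is made up for illustration. In mainline kernels, fs/btrfs/Kconfig appears to have CONFIG_BTRFS_FS select CONFIG_LZO_COMPRESS and CONFIG_LZO_DECOMPRESS on its own, so those two are the symbols worth looking for; confirm that against your own tree.

```shell
# Hypothetical helper: list the btrfs and LZO symbols found in a kernel .config.
# Prints enabled symbols (CONFIG_X=y or =m) as well as "# CONFIG_X is not set" lines.
show_lzo_config() {
    config=${1:-/usr/src/linux/.config}   # assumed default location
    grep -E 'CONFIG_(BTRFS_FS|[A-Z0-9_]*LZO[A-Z0-9_]*)[= ]' "$config"
}

# Usage: show_lzo_config            (or: show_lzo_config /path/to/.config)
```

If CONFIG_LZO_COMPRESS and CONFIG_LZO_DECOMPRESS both show up as =y alongside CONFIG_BTRFS_FS=y, the compressor is built in regardless of which algorithm compresses the kernel image.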

Partitioning
cfdisk was used to partition /dev/sdc by hand, with a 250mb partition followed by one partition for the remainder. sfdisk can then be used to apply the same scheme to /dev/sdd.
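The sfdisk step can be sketched as a dump of the first drive's partition table piped into the second drive. The function name is hypothetical; the device names are the ones used in this exercise. Run it as root and double-check the target, since the second sfdisk overwrites that drive's partition table.

```shell
# Hypothetical helper: copy the partition table of one drive onto another.
clone_partition_table() {
    src=$1
    dst=$2
    # "sfdisk -d" dumps the table in a format that sfdisk reads back on stdin.
    sfdisk -d "$src" | sfdisk "$dst"
}

# Usage: clone_partition_table /dev/sdc /dev/sdd
```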

Setting up boot
The warning about /boot partition alignment might have been avoided with some more care with cfdisk.
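Creating the new /boot mirror might look roughly like the following. This is a sketch under assumptions: the 250mb partitions are taken to be /dev/sdc1 and /dev/sdd1, the array is created as /dev/md5 (the name it carries later in this guide), and the function name is invented for illustration.

```shell
# Hypothetical helper: build a two-way raid1 with the old 0.90 superblock
# (so grub-0.97 can read it) and put an ext3 filesystem on it.
make_boot_mirror() {
    md=$1
    part_a=$2
    part_b=$3
    mdadm --create "$md" --level=1 --raid-devices=2 --metadata=0.90 \
        "$part_a" "$part_b" &&
    mkfs.ext3 "$md"
}

# Usage: make_boot_mirror /dev/md5 /dev/sdc1 /dev/sdd1
```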

/boot Transfer
Grub must be installed on each of the new mirrors by hand as shown. That way if the first drive fails, the second drive can be moved down to /dev/sda, and the grub mbr code will have already been set up to make it look like the first drive in the device chain.
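For grub-0.97, the per-drive install can be sketched by feeding the grub shell in batch mode. The function name is hypothetical, and the sketch assumes /boot is the first partition on each drive. The device line maps whichever drive is passed in to (hd0), which is what lets either drive boot on its own once it sits first in the chain.

```shell
# Hypothetical helper: write the grub-0.97 boot code to one drive's mbr,
# treating that drive as (hd0) no matter where it is currently attached.
install_grub_legacy() {
    disk=$1
    grub --batch <<EOF
device (hd0) $disk
root (hd0,0)
setup (hd0)
quit
EOF
}

# Usage, once per new drive:
#   install_grub_legacy /dev/sdc
#   install_grub_legacy /dev/sdd
```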

Transfer of /
We assume that a "hot" transfer of the system will be okay and thus do a remount of / to /mnt/rawroot to grab the basics without pulling in any additional baggage from /proc, udev and other mounts. If the system is running any database servers such as mysql, postgres or an ldap backend, it is better to turn those services off first before attempting this. The new btrfs filesystem will have the rootfs as a subvolume.
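The preparation described above could be sketched as two small helpers. Several things here are assumptions rather than the exact commands used in this exercise: the remount is taken to be a bind mount, the large partitions are taken to be /dev/sdc2 and /dev/sdd2, and the label SYSPOOL, the mountpoint /mnt/newpool and the subvolume name root are placeholders.

```shell
# Hypothetical helper: expose the bare root filesystem, without /proc, /sys,
# udev's /dev or any other filesystem mounted on top of it.
bind_root() {
    mkdir -p "$1" && mount --bind / "$1"
}

# Hypothetical helper: create a two-device btrfs with raid1 data and metadata,
# mount its top level and create a subvolume to hold the root filesystem.
make_btrfs_pool() {
    label=$1
    top=$2
    shift 2                      # the remaining arguments are the member devices
    mkfs.btrfs -L "$label" -m raid1 -d raid1 "$@" &&
    btrfs device scan &&
    mkdir -p "$top" &&
    mount -t btrfs "$1" "$top" &&
    btrfs subvolume create "$top/root"
}

# Usage:
#   bind_root /mnt/rawroot
#   make_btrfs_pool SYSPOOL /mnt/newpool /dev/sdc2 /dev/sdd2
```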

Other existing filesystems such as /home and /vm will become other subvolumes. We edit /etc/fstab on an interim basis to provide mountpoints for the new filesystem and its subvolumes. The compression and auto defragmentation features of btrfs may or may not be applicable for the underlying data. The lzo compressor has been turned off for /mnt/newdistfiles since it will be getting the contents of /usr/portage/distfiles where files are already compressed. The /mnt/newvm filesystem leaves out autodefrag as an option since it interferes with the performance of virtual machines and copy on write.
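The interim entries might look something like what the helper below prints. The label SYSPOOL, the subvolume names and the /mnt/newroot and /mnt/newhome mountpoints are placeholders; only /mnt/newdistfiles and /mnt/newvm are named in this guide. One caveat worth checking on your kernel: btrfs documents most mount options as applying to the whole filesystem, so per-subvolume differences like these may not all take effect independently.

```shell
# Hypothetical helper: print interim fstab entries for the new subvolumes.
# newvm omits autodefrag; newdistfiles omits compress=lzo.
interim_fstab() {
    label=$1
    cat <<EOF
LABEL=$label /mnt/newroot      btrfs defaults,compress=lzo,autodefrag,subvol=root 0 0
LABEL=$label /mnt/newhome      btrfs defaults,compress=lzo,autodefrag,subvol=home 0 0
LABEL=$label /mnt/newvm        btrfs defaults,compress=lzo,subvol=vm              0 0
LABEL=$label /mnt/newdistfiles btrfs defaults,autodefrag,subvol=distfiles         0 0
EOF
}

# Usage: interim_fstab SYSPOOL >> /etc/fstab
```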

We kick off the root fs transfer and come back after a cup of whatever. Roughly speaking, the existing 2tb drive set is using almost 1.9tb, with about 40gb available on the / and /home filesystems and about 90gb available on the old /vm. The copy of the root filesystem takes roughly 30 minutes. The other copies are left to run overnight.

In this particular install, /usr/portage/distfiles had been a softlink to a directory on the old /home filesystem. Had it been a physical directory under the root instead, splitting it off afterwards would mean moving it from the new root subvolume's usr/portage/distfiles to the newdistfiles subvolume, which is effectively a move between filesystems: a copy followed by a delete. It would be more efficient to make judicious use of the --exclude switch on tar when doing the initial copy to the new mirror set.
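A tar-to-tar copy with an optional exclusion can be sketched as follows. The function name is hypothetical, and the pattern in the usage line assumes the archive is built from "." inside the source directory, so member names start with "./".

```shell
# Hypothetical helper: copy a directory tree with a tar pipe, preserving
# permissions on extraction, optionally skipping paths that match one pattern.
copy_tree() {
    src=$1
    dst=$2
    skip=${3:-}
    if [ -n "$skip" ]; then
        tar -C "$src" --exclude="$skip" -cf - . | tar -C "$dst" -xpf -
    else
        tar -C "$src" -cf - . | tar -C "$dst" -xpf -
    fi
}

# Usage (run as root so ownership is preserved):
#   copy_tree /mnt/rawroot /mnt/newroot './usr/portage/distfiles/*'
```

With the pattern above the distfiles directory itself is still created on the target, but none of its contents are copied, which leaves a clean place to mount or link the separate subvolume.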

Edit Config on New Mirror
We edit the /etc/fstab on the new mirror set to reflect the way things should look when the new mirror set becomes the boot set. It's also probably cleaner to do a mountpoint for /distfiles as shown and then have /usr/portage/distfiles be a softlink to it.

We generate a new mdadm.conf file to include the new /dev/md5 /boot mirror but then edit it to rename that to /dev/md1. The other existing arrays are stubbed out, but the information is there in case the old mirror set is put back on again.
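The edit to the generated file can be sketched as a small sed filter. It assumes the scan output uses lines of the form "ARRAY /dev/mdN ...", which is worth verifying against what your mdadm version prints; the function name and the target path in the usage line are made up for illustration.

```shell
# Hypothetical helper: read "mdadm --detail --scan" output on stdin, comment
# out every array except /dev/md5, and rename /dev/md5 to /dev/md1.
rewrite_mdadm_conf() {
    sed -e '/^ARRAY \/dev\/md5 /!s/^ARRAY /#ARRAY /' \
        -e 's|^ARRAY /dev/md5 |ARRAY /dev/md1 |'
}

# Usage:
#   mdadm --detail --scan | rewrite_mdadm_conf > /mnt/newroot/etc/mdadm.conf
```

The old arrays stay in the file as comments, so they can be restored by removing the leading "#" if the old mirror set is ever reattached.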

Creating the Initial Ram Filesystem
We will have to use an initial ram filesystem and an embedded init to mount the mirror set. Following the wiki entry for Early Userspace Mounting, we create the following files in /usr/src/linux/initramfs

There's a bit more than minimally necessary there to mount and pivot the root, but it allows us to use the rescue shell in busybox to fsck /boot as necessary and to do btrfs scrub and balance on a cold filesystem if we feel it is necessary. The following minimalist fstab is the key reason for this initial ram filesystem. It allows btrfs to locate the root volume by label name and to enable compression and autodefrag on the initial mount.

Note - You will probably need to create the /usr/src/linux/initramfs directory before introducing the above files.
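To make the flow concrete, here is a minimal sketch of what such an init might do. It is not the script used in this exercise: the function names, the /mnt/root mountpoint and the SYSPOOL label in the comment are all assumptions, and it presumes busybox and the btrfs tool are present in the initramfs with devtmpfs enabled in the kernel. A real /init would begin with "#!/bin/busybox sh" and end by calling early_init.

```shell
# Drop to a busybox shell when something goes wrong, so the problem can be
# inspected (or a scrub or fsck run) before the real root is mounted.
rescue_shell() {
    echo "$1; dropping to a rescue shell" >&2
    exec /bin/busybox sh
}

early_init() {
    mount -t proc none /proc
    mount -t sysfs none /sys
    mount -t devtmpfs none /dev

    # Register every member of the multi-device btrfs before mounting it.
    btrfs device scan || rescue_shell "btrfs device scan failed"

    # The fstab inside the initramfs supplies device, type and options, e.g.
    #   LABEL=SYSPOOL /mnt/root btrfs defaults,compress=lzo,autodefrag,subvol=root 0 0
    mount /mnt/root || rescue_shell "could not mount the root subvolume"

    umount /dev /sys /proc
    exec switch_root /mnt/root /sbin/init
}
```

The ordering is the point of the sketch: the device scan has to happen before the mount, which is exactly the step the kernel cannot do on its own without an initial ram filesystem.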

The init script here was essentially stolen from the Early Userspace Mounting page and has been hacked up a bit. It could probably stand some more cleaning and maybe additional smarts. One might notice that the root and rootflags parameters in the grub.conf appearing below are superfluous because they will be ignored by the init script. However, it is useful to show at a glance what mount options are being used and how the mount happens. The actual initramfs creation is done as follows.

It would be a good idea to balance the new btrfs filesystem before booting into the new mirror set. It isn't crucial to do it now, but it should help performance on the initial boot. The balance can just as easily be done on the live volumes after the new mirror set is booted. Balancing everything on the new 2tb set will probably take a good part of a night, depending on the amount of space used.
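A balance followed by a look at the resulting allocation can be sketched as below. The function name and the mountpoint in the usage line are placeholders. Older btrfs-progs releases spell the first command "btrfs filesystem balance <path>", so check which form your version accepts.

```shell
# Hypothetical helper: rebalance a mounted btrfs, then report how data and
# metadata are allocated across the pool.
balance_pool() {
    btrfs balance start "$1" && btrfs filesystem df "$1"
}

# Usage: balance_pool /mnt/newpool
```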

Booting the new system
We take out the old mirror set after poweroff and put the drives into safe storage. The new mirror set is moved to become /dev/sda and /dev/sdb.

Once grub has booted into the kernel, the initial set of system messages is displayed for detecting disk drives, etc. The screen will clear and appear to hang for about 5 to 10 seconds as btrfs scans for system devices. If the system has a cdrom drive, ignore any warnings about media not being present in /dev/sr0. Then it will continue to boot after switching into the real root as noted in the Early Userspace Mounting entry.

If the new btrfs filesystem has not been balanced, performance may be a bit sluggish initially, but things will speed up. For this example exercise, the most significant thing to notice is the amount of space now available for use. It is also all part of the same pool shared by the subvolumed filesystems:

Most of the space savings probably came from compression of the kvm guest images that were in /vm, but /home and the system root also contribute significant savings.