Btrfs/System Root Guide

Converting to a btrfs Based System
This exercise is one example for re-basing a Gentoo installation's root filesystem to use btrfs. In this case, the existing system is an mdadm based mirror set using two 2TB drives located at and. Two fresh 2TB drives have been added at and.

Existing layout
Simple two way mdadm mirror (RAID1):


 * 250MB partition as ext3 with metadata=0.90
 * 75GB partition as ext4 with metadata=0.90
 * 750GB partition as ext4 with metadata=1.2
 * 1TB+ partition as ext4 with metadata=1.2

The use of the older metadata format for and  partitions allows grub-0.97 to find and boot the system without needing to resort to an initial ram device.

New layout

 * 250MB partition as ext3 with metadata=0.90
 * 1.9TB+ btrfs partition with RAID1 metadata and data

We are keeping the as a simple software mirror in order to stay with grub-0.97 but will now need to use an initial ram filesystem. The kernel will otherwise panic when it attempts to mount btrfs filesystems that need to have a btrfs device scan done first. We will be following the write up for Early Userspace Mounting to build that filesystem. See also Custom Initramfs for more details on various ways to prepare the filesystem.

Btrfs has been built into the kernel (not a module) along with lzo compression/decompression as that will be used to optimize space utilization and read performance on the system volumes. Likewise, the raid modules used by mdadm are also built-in.

There are a number of places where lzo is part of a module name in the kernel.

It is unclear which kernel options must be selected to make sure that the lzo support btrfs relies on is enabled. The likely suspect is  which comes into play for the compression of the kernel image itself:

The default selection of gzip causes CONFIG_KERNEL_LZO to not be set as shown above. There doesn't appear to be a way to control the setting of CONFIG_HAVE_KERNEL_LZO short of editing directly.
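One way to take the guesswork out of this is to look at the generated kernel configuration directly. The helper below is a minimal sketch, not part of the original write-up: the function name kconfig_state is made up for illustration, and the option names CONFIG_LZO_COMPRESS and CONFIG_LZO_DECOMPRESS are our assumption about what btrfs pulls in (to the best of our knowledge the btrfs Kconfig entry selects both in mainline kernels, but verify this against your own tree).

```shell
# Report whether each option is built in (y), modular (m) or not set.
# Usage: kconfig_state CONFIG_FILE OPTION...
kconfig_state() {
    cfg=$1
    shift
    for opt in "$@"; do
        # Enabled options look like "CONFIG_FOO=y"; unset ones only
        # appear as comments, so sed prints nothing for them.
        val=$(sed -n "s/^${opt}=//p" "$cfg")
        echo "${opt}: ${val:-not set}"
    done
}

# Example (the config path is an assumption, adjust to your kernel tree):
# kconfig_state /usr/src/linux/.config CONFIG_BTRFS_FS \
#     CONFIG_LZO_COMPRESS CONFIG_LZO_DECOMPRESS CONFIG_KERNEL_LZO
```

If CONFIG_BTRFS_FS reports y and the two LZO library options report y as well, lzo support for btrfs is built in regardless of which compressor is used for the kernel image.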

Partitioning
cfdisk was used to partition with the 250MB and remainder partitions by hand. sfdisk then can be used to apply the same scheme to.
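Copying a partition table with sfdisk is usually done by dumping the layout of the first drive and feeding the dump to the second. The wrapper below is a sketch under assumptions: clone_ptable and the DRY_RUN variable are hypothetical names, and /dev/sdX and /dev/sdY are placeholders for the two new drives. It prints the command by default so the device names can be checked before anything is written.

```shell
# Print (DRY_RUN=1, the default) or run the command that copies the
# partition table of SRC onto DST.
# Usage: clone_ptable SRC DST
clone_ptable() {
    src=$1
    dst=$2
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "sfdisk -d $src | sfdisk $dst"
    else
        # sfdisk -d dumps the table in a format sfdisk accepts as input.
        sfdisk -d "$src" | sfdisk "$dst"
    fi
}

# Example: review first, then run for real.
# clone_ptable /dev/sdX /dev/sdY
# DRY_RUN=0 clone_ptable /dev/sdX /dev/sdY
```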

Setting up boot
The warning about partition alignment might have been avoided with some more care with cfdisk.

transfer
Grub must be installed on each of the new mirrors by hand as shown. That way if the first drive fails, the second drive can be moved down to, and the GRUB MBR code will have already been set up to make it look like the first drive in the device chain.
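The per-drive installation can be scripted by generating the commands for the grub-0.97 shell. This is a sketch rather than the exact session used here: grub_setup_cmds is a made-up helper, /dev/sdX is a placeholder, and we assume the boot partition is the first partition on each drive. Mapping every drive to (hd0) in turn is what makes each one behave as the first BIOS drive when it ends up in that position.

```shell
# Emit grub-0.97 shell commands that install GRUB to the MBR of DISK,
# treating DISK as the first BIOS drive with /boot on its first partition.
# Usage: grub_setup_cmds DISK
grub_setup_cmds() {
    disk=$1
    printf 'device (hd0) %s\n' "$disk"
    printf 'root (hd0,0)\n'
    printf 'setup (hd0)\n'
    printf 'quit\n'
}

# Example: run once per new drive.
# grub_setup_cmds /dev/sdX | grub --batch
```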

transfer
We assume that a "hot" transfer of the system will be okay and thus do a remount of to  to grab the basics without pulling in any additional baggage from, udev and other mounts. If the system is running any database servers such as mysql, postgres or an ldap back-end, it is better to turn those services off first before attempting this. The new btrfs will have the rootfs as a subvolume.

Other existing filesystems such as and  will become other subvolumes. We edit on an interim basis to provide mountpoints for the new filesystem and its subvolumes. The compression and auto defragmentation features of btrfs may or may not be appropriate for the underlying data. The lzo compressor has been turned off for since it will be getting the contents of  where files are already compressed. The filesystem leaves out autodefrag as an option since it interferes with the performance of virtual machines and copy on write.
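Since each subvolume gets its own mount options, it can help to generate the interim entries rather than type them. The helper below is only an illustration: fstab_entry, the label NEWPOOL, the subvolume names and the mountpoints in the example are all hypothetical. The options subvol=, compress=lzo and autodefrag are standard btrfs mount options.

```shell
# Print one fstab line for a btrfs subvolume mounted by filesystem label.
# Usage: fstab_entry LABEL MOUNTPOINT SUBVOL [EXTRA_OPTIONS]
fstab_entry() {
    label=$1
    mnt=$2
    subvol=$3
    extra=${4:-}
    opts="subvol=${subvol}"
    if [ -n "$extra" ]; then
        opts="${opts},${extra}"
    fi
    printf 'LABEL=%s\t%s\tbtrfs\t%s\t0 0\n' "$label" "$mnt" "$opts"
}

# Example: compression and autodefrag for the root subvolume, neither for
# a subvolume that holds already-compressed files.
# fstab_entry NEWPOOL /mnt/newroot root compress=lzo,autodefrag
# fstab_entry NEWPOOL /mnt/newroot/archive archive
```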

We kick off the root fs transfer and come back after a cup of whatever. Roughly speaking, the existing 2TB drive set is using almost 1.9TB, with about 40GB available on the and  filesystems and about 90GB available on the old. The copy of the root filesystem takes roughly 30 minutes. The other copies are left to run overnight.

In this particular install, had been a softlink to a directory on the old  filesystem. Had it been a physical directory that was to be split off instead, the transfer from the new root subvolume's to the  subvolume would effectively be a move between filesystems, involving a copy and then a delete. It would be more efficient to make judicious use of the --exclude switch on tar when doing the initial copy to the new mirror set.
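A tar pipe with exclusions might look like the sketch below. It is not the exact command used for this transfer: copy_tree is a hypothetical helper and the paths in the example are placeholders. Each extra argument is handed to tar's --exclude switch, so a directory that is destined for its own subvolume can be skipped in the first pass and copied straight to its final place later.

```shell
# Copy the tree at SRC into DST through a tar pipe, skipping anything
# that matches the remaining arguments (tar --exclude patterns).
# Usage: copy_tree SRC DST [PATTERN...]
copy_tree() {
    src=$1
    dst=$2
    shift 2
    excludes=
    for pat in "$@"; do
        excludes="$excludes --exclude=$pat"
    done
    # $excludes is deliberately unquoted so that each --exclude becomes
    # its own word; patterns therefore must not contain whitespace.
    tar -C "$src" $excludes -cf - . | tar -C "$dst" -xpf -
}

# Example (placeholder paths): copy the root tree but leave out one
# directory that will be copied into its own subvolume afterwards.
# copy_tree /mnt/oldroot /mnt/newroot ./some/big/dir
```

Run it as root so that the extracting tar can restore ownership and permissions.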

Edit configuration on the new mirror
We edit the on the new mirror set to reflect the way things should look when the new mirror set becomes the boot set. It is also probably cleaner to do a mountpoint for as shown and then have  be a softlink to it.

We generate a new file to include the new   mirror but then edit it to rename that to. The other existing arrays are stubbed out, but the information is there in case the old mirror set is put back on again.

Creating the Initial Ram Filesystem
We will have to use an initial ram filesystem and an embedded init to mount the mirror set. Following the wiki entry for Early Userspace Mounting, we create the following files in.

There's a bit more than minimally necessary there to mount and pivot the root, but it allows us to use the rescue shell in busybox to fsck as necessary and to do btrfs scrub and balance on a cold filesystem if we feel it is necessary. The following minimalist is the key reason for this initial ram filesystem. It allows btrfs to locate the root volume by label name and to enable compression and autodefrag on the initial mount.
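To make the mount step concrete, here is a rough sketch of what the core of such an init can look like. It is not the init used on this system: run, early_mount, the DRY_RUN variable and the label NEWROOT in the example are all hypothetical, and we assume that the btrfs tool is present in the initramfs and that the busybox mount in use understands LABEL= (if it does not, findfs can resolve the label first). The important part is that btrfs device scan runs before the mount, so the kernel knows about every member of the mirror.

```shell
#!/bin/busybox sh

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Scan for btrfs member devices, mount the root volume by label with the
# given options, then hand control to the real init.
# Usage: early_mount LABEL MOUNT_OPTIONS
early_mount() {
    label=$1
    opts=$2
    run mount -t proc none /proc
    run mount -t sysfs none /sys
    run mount -t devtmpfs none /dev
    # Register all btrfs devices with the kernel before mounting.
    run btrfs device scan
    run mount -t btrfs -o "$opts" "LABEL=$label" /mnt/root || return 1
    run umount /proc
    run umount /sys
    run umount /dev
    # exec keeps PID 1, which switch_root requires.
    run exec switch_root /mnt/root /sbin/init
}

# Example, as the last line of init, falling back to a rescue shell:
# early_mount NEWROOT compress=lzo,autodefrag || exec sh
```

Setting DRY_RUN=1 prints the sequence instead of executing it, which is a convenient way to review the steps on a running system before packing the initramfs.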

The init script here was essentially stolen from the early usermount page and has been hacked up a bit. It could probably stand some more cleaning and maybe additional smarts. One might notice that the root and rootflags parameters in the appearing below are superfluous because they will be ignored by the init script. However it is useful to show what mount options are being used and how the mount happens at a glance. The actual initramfs creation is done as follows.

It would be a good idea to balance the new btrfs filesystems before booting into the new mirror set. It is not crucial to do it now, but it will improve performance on the initial boot. The balance can just as easily be done on the live volumes after the new mirror set is booted. Balancing everything on the new 2TB set will probably take a good part of a night, depending on the amount of space used.

Booting the new system
We take out the old mirror set after poweroff and put them into safe storage. The new mirror set is moved to become and.

Once GRUB has booted into the kernel, the initial set of system messages display for detecting disk drives, etc. The screen will clear and appear to hang for about 5 to 10 seconds as btrfs scans for system devices. If the system has a cdrom drive, ignore any warnings about media not being present in. Then it will continue to boot after switching into the real root as noted in the early userspace mounting entry.

If the new btrfs filesystems have not been balanced, performance may be a bit sluggish initially, but things will speed up. For this example exercise, the most significant thing to notice is the amount of space now available for use. It is also all part of the same pool shared by the subvolumed filesystems:

Most of the space savings probably came from compression of the kvm guest images that were in, but and the system root also contribute significant savings.

Mounting rootfs fails when using btrfs RAID
In some cases, when booting a btrfs root volume that uses mirroring (btrfs RAID1) and lzo compression, the initramfs mount of the root volume fails with:

cannot mount /dev/sda3: "Invalid argument".

If you drop to the shell and look at the dmesg output, it shows that open_ctree failed.

I was able to correct the boot by adding the appropriate rootflags=device=/dev/sda3,device=/dev/sdb3 and rootfstype=btrfs parameters to the kernel command line.
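The parameter string follows a simple pattern, one device= entry per member of the btrfs mirror, so it can be generated for any set of devices. The helper name btrfs_root_params is made up for this sketch; the device names in the example are the ones from the report above.

```shell
# Build the kernel parameters for a multi-device btrfs root.
# Usage: btrfs_root_params DEVICE...
btrfs_root_params() {
    flags=
    for dev in "$@"; do
        # Join the entries with commas, without a leading comma.
        flags="${flags:+$flags,}device=$dev"
    done
    echo "rootfstype=btrfs rootflags=$flags"
}

# Example: append the result to the kernel line in the boot loader config.
# btrfs_root_params /dev/sda3 /dev/sdb3
```

Listing every member device explicitly tells the kernel which devices belong to the filesystem, so the mount does not depend on a device scan having completed first.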