User:Ali3nx/Installing Gentoo Linux EFISTUB On ZFS

Install Gentoo Linux on OpenZFS using EFIStub Boot
Author: Michael Crawford (ali3nx) Contact: mcrawford@eliteitminds.com

Preface
This guide will show you how to install Gentoo Linux on AMD64 with:

 * UEFI-GPT (EFI System Partition) - This will be on a FAT32 unencrypted partition as per UEFI Spec.
 * /, /home/username, on segregated ZFS datasets
 * /home, /usr, /var, /var/lib zfs dataset containers created for pool dataset structure
 * raid 1 or mirrored disk configuration
 * swap on regular partition
 * OpenZFS 2.0.6+
 * efistub boot without Grub
 * dracut initramfs (optionally genkernel)
 * systemd or openrc
 * Gentoo Stable (amd64)

Why efistub boot!? grub works for everyone!


 * UEFI motherboards have been the default on practically all modern computer hardware since around 2013, deprecating legacy bios.
 * The modernization and wide availability of UEFI motherboards has retired the mandatory requirement for software bootloaders such as grub.
 * When booted through UEFI, grub is itself started by the firmware as an EFI application and then loads Linux. With efistub the firmware can start the Linux kernel directly, so this extra layer is unnecessary to boot Linux.
 * Intel has publicly stated that legacy bios CSM compatibility support will be removed from new hardware manufactured after 2020, forcing use of true uefi boot modes.

Why not use grub with zfs!?


 * The zfsroot wiki guides from zfsonlinux and many distros all advise using the grub bootloader. That can work, but grub doesn't fully support the newest zfs pool feature flags. Using grub is an added risk and an added complication, both of which can be avoided entirely by using a uefi efistub configuration to boot your zfs root pool directly.
 * The risk of using grub with zfs arises from grub's lack of support for modern zfsonlinux pool features. The administrator must tread carefully to ensure that a global zpool upgrade is never run, otherwise the pool gains feature flags that grub cannot read and the zfsroot configuration becomes unbootable. Once that has happened it cannot be undone, and recovery would require some major surgery from a livecd.
 * Building a new system on a legacy configuration means accepting the additional ongoing maintenance needed to keep that legacy configuration working.
 * zfs rootfs dataset encryption is easier to configure with efistub boot.

Download the System Rescue CD + ZFS ISO
You will need to download System Rescue CD that includes ZFS from this github project.

LiveUSB Creation
We will be creating a UEFI Bootable USB since this guide will be showing you how to install Gentoo Linux on ZFS with UEFI Enabled.

For the following commands, we will assume that your USB is /dev/sdg.

Create the FAT32 Filesystem on the USB
We will now create the FAT32 filesystem on the USB. This needs to be FAT32 since this is the filesystem used in the UEFI Specification. The label we will use for this partition will be in the following format SYSRCDXYZ, where XYZ is the version number of the System Rescue CD you downloaded.

For example, if you are using System Rescue CD 6.1.3, the label will be SYSRCD613.
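The labelling rule above can be sketched as a small helper. The function name sysrcd_label is made up for illustration, and /dev/sdg1 assumes the USB stick carries a single partition. The format command is only printed so it can be checked before it is run by hand.

```shell
# Hypothetical helper: turn a System Rescue CD version such as 6.1.3
# into the FAT32 label format described above (SYSRCD613).
sysrcd_label() {
    printf 'SYSRCD%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

label="$(sysrcd_label 6.1.3)"

# Print the format command for review instead of running it.
# /dev/sdg1 is an assumption: the first partition on the USB device.
echo "mkfs.vfat -F 32 -n ${label} /dev/sdg1"
```

The label must match the ISO version exactly, so derive it from the version of the image you actually downloaded.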

Copy files over from ISO to USB
And that's it! You now have a Bootable UEFI USB.

Windows
On Windows, Etcher is the USB utility I recommend for writing the sysrescuecd+zfs iso. You can download Etcher here.


 * 1) Start Etcher
 * 2) Select your USB Device from the Device drop down.
 * 3) Select your ISO by clicking SELECT.
 * 4) Click START.

This should be all that's necessary to have a Bootable UEFI USB.

Assumptions

 * Only installing Gentoo on one drive called /dev/sda (or /dev/nvme0n1, etc)
 * nchevsky System Rescue CD + ZFS iso is being used.
 * dracut is being used as your initramfs.
 * gentoo-kernel-bin is being used as your kernel.

Boot your system into the zfs LiveUSB
Since this is highly computer dependent, you will need to figure out how to boot your USB on your system and get to the live environment. You may need to disable Secure Boot if that causes your USB to be rejected. Make sure your system BIOS/UEFI is set up to boot UEFI devices, rather than BIOS devices (Legacy).

Confirm that you booted in UEFI Mode
After you booted into the Live CD, make sure that you booted into UEFI mode by typing the following:

If the above directory is empty or doesn't exist, you are not in UEFI mode. Reboot and boot into UEFI mode.
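A minimal sketch of that check, assuming the standard sysfs location /sys/firmware/efi. The function name booted_in_uefi is made up for illustration and takes the directory as an optional argument so it can be tried against any path.

```shell
# Succeeds when the given directory exists and is not empty.
# /sys/firmware/efi is only populated when the kernel was booted via UEFI.
booted_in_uefi() {
    dir="${1:-/sys/firmware/efi}"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

if booted_in_uefi; then
    echo "UEFI mode"
else
    echo "not UEFI mode: reboot and select the UEFI boot entry"
fi
```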

Partition
We will now partition the drive and aim to create the following layout:

/dev/sda1  | 512 MB        |   EFI System Partition                | /efi
/dev/sda2  | 32768 MB      |   swap                                | swap
/dev/sda3  | Rest of Disk  |   ZFS                                 | /, /home/username ...

/dev/sdb1  | 512 MB        |   EFI System Partition                | /efi2
/dev/sdb2  | 32768 MB      |   swap                                | swap
/dev/sdb3  | Rest of Disk  |   ZFS                                 | /, /home/username ...

Open up your drive in GNU parted and tell it to use optimal geometry alignment:

Create GPT partition layout
This will delete all partitions and create a new GPT table.

Larger swap will accommodate hibernation should that be desired and using swap with zfs is highly advised. 32GB swap is used in the below example to accommodate many different hardware configurations.

Create and label your partitions
parted does not offer a zfs filesystem type, so btrfs is used temporarily. The filesystem label name is largely autodetected and becomes irrelevant after zpool creation.
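The layout can be sketched as a single non-interactive parted call. This is a dry run under stated assumptions: the helper parted_layout and the partition names esp, swap and rootfs are made up for illustration, the first partition is assumed to start at 1 MiB, and the command is only printed. Running the printed command destroys the existing partition table on that disk.

```shell
# Print the parted invocation for one disk (dry run: nothing is written).
# Sizes are in MiB and follow the layout table: 512 MiB ESP, 32768 MiB swap.
parted_layout() {
    disk="$1"
    esp_mib="${2:-512}"
    swap_mib="${3:-32768}"
    esp_end=$((1 + esp_mib))            # first partition starts at 1 MiB
    swap_end=$((esp_end + swap_mib))
    echo "parted -s -a optimal ${disk}" \
         "mklabel gpt" \
         "mkpart esp fat32 1MiB ${esp_end}MiB" \
         "mkpart swap linux-swap ${esp_end}MiB ${swap_end}MiB" \
         "mkpart rootfs btrfs ${swap_end}MiB 100%" \
         "set 1 boot on"
}

parted_layout /dev/sda
parted_layout /dev/sdb    # second disk, only needed for a mirror
```

Pass smaller values as the second and third arguments if 32 GB of swap is more than your hardware needs.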

Final View
If using mirror disk configuration

Exit the application

Determine disk/by-id identifier
Using traditional block device identifiers such as /dev/sda or /dev/nvme0n1 with zfs can work but can also be undesirable due to the possibility of a block device name changing. Something as simple as connecting a usb storage device can cause this to occur.

Should this ever happen, zfs pools are unaware that the change has occurred, which can render a zfs pool inoperable. For this reason it is more desirable to use device-specific disk identifiers, which include the disk serial number, with zfs. These identifiers also make it easier to identify a faulty disk in larger zfs pools.

To determine the device-specific ata disk identifier, type the following

Nvme storage devices would resemble this example

Generally using /dev/disk/by-id/ata-disk or /dev/disk/by-id/nvme-disk is more desirable because these names identify a specific disk. There may also be /dev/disk/by-id/wwn or /dev/disk/by-id/nvme-eui entries. Avoid those block device identifiers in the examples below if possible.
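One way to list only the preferred identifiers is to filter the directory listing. This is a sketch: the helper name preferred_ids is made up for illustration, and it assumes the ata- and nvme- name prefixes described above.

```shell
# Keep ata-* and nvme-* names, drop the nvme-eui.* aliases.
# wwn-* names are dropped implicitly because they match neither prefix.
preferred_ids() {
    grep -E '^(ata|nvme)-' | grep -v '^nvme-eui\.'
}

ls /dev/disk/by-id/ 2>/dev/null | preferred_ids \
    || echo "no ata- or nvme- identifiers found"
```

Names ending in -part1, -part2 and -part3 refer to the partitions created earlier; the -part3 names are the ones used for the zpool.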

Create your zpool
Create your zpool which will contain your drives and datasets:

xattrs and posixacl are enabled to provide support for modern filesystem security features. Relative atime updates which are a global default in ext4 are enabled as well.

xattrs is necessary for proper functionality of systemd-journald

It is beneficial and important to generate a valid zfs /etc/hostid file before creating the first zfs pool, so that the initramfs references a valid zfs hostid during initial system boot. If the hostid recorded in the zfs rpool and the hostid in the initramfs do not match, pool import can fail until a new hostid and zpool.cache file are regenerated from the initramfs rescue shell.

The command to remove any existing zfs hostid file and generate a new zfs hostid record is

To create the zfs root pool including a mirror configuration
Replace ata-disk1-part3 with nvme-disk1-part3 and ata-disk2-part3 with nvme-disk2-part3 if you have an nvme ssd disk.

To create the zfs root pool including a single disk
Replace ata-disk1-part3 with nvme-disk1-part3 if you have an nvme ssd disk.
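Both variants can be sketched with one helper that prints the zpool create command for review. Several things here are assumptions rather than the author's exact command: the helper name zpool_create_cmd, the altroot /mnt/gentoo (the usual Gentoo chroot directory), and mountpoint=none on the pool itself. The xattr, acltype and relatime properties follow the description above.

```shell
# Print a zpool create command for one disk or a two-disk mirror (dry run).
# Arguments are the /dev/disk/by-id names of the ZFS partitions.
zpool_create_cmd() {
    vdevs="$1"
    if [ "$#" -ge 2 ]; then
        vdevs="mirror $*"
    fi
    echo "zpool create" \
         "-O xattr=sa -O acltype=posixacl -O relatime=on" \
         "-O mountpoint=none -R /mnt/gentoo" \
         "rpool ${vdevs}"
}

# Mirror configuration
zpool_create_cmd /dev/disk/by-id/ata-disk1-part3 /dev/disk/by-id/ata-disk2-part3
# Single disk configuration
zpool_create_cmd /dev/disk/by-id/ata-disk1-part3
```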

Create your rootfs zfs datasets
Create the dataset container structure and dataset necessary for /.

Create /usr, /var, /var/lib and /home zfs dataset containers
Several unmounted dataset containers are needed to give the zfs pool its dataset structure. Creating these containers after the install is complete can be disruptive and involved, so it is best done before filesystem contents are written to disk to ensure the system will boot. The dataset containers for /usr and /var especially benefit from being created in advance. This structures the datasets within the pool for correct dataset segregation. The /var/lib dataset container is created to allow easy creation of /var/lib/foo datasets for system or network services at a later date if desired.

The rpool/home dataset container segregates user home directory datasets from the rootfs dataset. This keeps rootfs incremental snapshots small so that rootfs snapshots do not fill the available pool storage space.

Additional accommodation must be made when using systemd with zfs: the zfs /home dataset container must not be configured with a mountpoint. Otherwise systemd may create a new /home directory on system boot, and the user home directory datasets then fail to mount because of a pool import mountpoint conflict.

Creating the rpool/home dataset container with the canmount=off option and without a directory mountpoint makes this complication unlikely to occur.

Create user home directory dataset
Replace username with the desired user name
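The dataset structure described in the last three sections can be sketched as a list of zfs create commands. The helper name dataset_cmds is made up for illustration, and the dataset names rpool/ROOT/gentoo, rpool/usr, rpool/var and rpool/var/lib are inferred from the text rather than copied from the author's commands. The helper only prints the commands.

```shell
# Print the zfs create commands for the rootfs, the unmounted containers
# and one user home dataset (dry run: nothing is created).
dataset_cmds() {
    user="$1"
    cat <<EOF
zfs create -o canmount=off -o mountpoint=none rpool/ROOT
zfs create -o mountpoint=/ rpool/ROOT/gentoo
zfs create -o canmount=off rpool/usr
zfs create -o canmount=off rpool/var
zfs create -o canmount=off rpool/var/lib
zfs create -o canmount=off rpool/home
zfs create -o mountpoint=/home/${user} rpool/home/${user}
EOF
}

dataset_cmds username
```

Note that rpool/home is given canmount=off and no mountpoint of its own, matching the systemd accommodation above.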

Verify everything looks good
You can verify that all of these things worked by running the following:

Now we are ready to install Gentoo!

Set your date and time
We use ntpdate to set an accurate time, date and hardware clock. This mitigates clock skew, which can cause software compilation to malfunction.

Preparing to chroot
First let's mount our efi boot partition in our chroot directory:

We'll use the Oregon State University Gentoo Linux mirror. If you prefer, use a different regional mirror from the official Gentoo Linux mirror list.

Download the systemd amd64 stage3 system archive and extract it

Edit fstab
Using disk UUIDs to denote block device entries in fstab has become the more desirable default, because it ensures that an unexpected block device change never renders a filesystem unmountable as a result of fstab becoming inaccurate. Something as simple as connecting a usb storage device to a booted system has been known to cause this. The blkid command reveals the identifiers available for partitions created on gpt disk labels. Although partition names were created earlier, disk UUIDs are more specific.

Everything is on zfs so we don't need anything in here except for the boot and swap entries. fstab should resemble the following example. Substitute the UUIDs reported by your blkid command:
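A sketch of the two entries, generated from the UUIDs so they cannot be mistyped. The helper name fstab_lines and the mount options shown are assumptions, and the UUIDs in the example call are placeholders.

```shell
# Print fstab entries for the EFI System Partition and the swap partition.
# Real values can be read with: blkid -s UUID -o value /dev/sda1
fstab_lines() {
    esp_uuid="$1"
    swap_uuid="$2"
    printf 'UUID=%s\t/efi\tvfat\tdefaults\t0 2\n' "$esp_uuid"
    printf 'UUID=%s\tnone\tswap\tsw\t0 0\n' "$swap_uuid"
}

# Placeholder UUIDs, replace with the output of blkid.
fstab_lines XXXX-XXXX xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```

With a mirror, the second disk's EFI and swap partitions get their own entries in the same form, using /efi2 as the mount point from the layout table.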

Modify make.conf
Let's modify our /etc/portage/make.conf so we can start installing stuff with a good base (Change it to what you need):

Get the portage tree
Copy the default example portage config

Install required applications
Now install the initial apps:

Reviewing the current gentoo-sources Linux kernel version
Gentoo provides eselect to manage many core system environment variables including the active /usr/src/linux symlink.

The command result of eselect should match the active linux kernel symlink

Necessary kernel configuration features for custom kernel builders
efistub boot relies on a key Linux kernel configuration feature to function

zfs requires Zlib kernel support (module or builtin).
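For custom kernel builders, the options can be checked against the kernel .config before building. This is a sketch: the helper name check_kernel_options is made up, CONFIG_EFI_STUB is the kernel option that provides efistub boot, and CONFIG_ZLIB_INFLATE and CONFIG_ZLIB_DEFLATE are assumed here to be the Zlib options meant above.

```shell
# Report whether each named option is built in (=y) or a module (=m)
# in the given kernel .config file. Returns non-zero if any is missing.
check_kernel_options() {
    config="$1"
    shift
    missing=0
    for opt in "$@"; do
        if grep -Eq "^CONFIG_${opt}=(y|m)$" "$config"; then
            echo "ok      CONFIG_${opt}"
        else
            echo "MISSING CONFIG_${opt}"
            missing=1
        fi
    done
    return "$missing"
}

if [ -f /usr/src/linux/.config ]; then
    check_kernel_options /usr/src/linux/.config \
        EFI_STUB ZLIB_INFLATE ZLIB_DEFLATE || true
fi
```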

relies on the following menu options provided by

Invoke the Linux kernel configuration menu
The Linux kernel provides a console based configuration menu. Select the required configuration features in addition to necessary configuration features for your hardware.

Install zfs software and kernel module
The zfs software and kernel module must be installed after kernel configuration is complete

Install ZFS software

Generate and verify the zfs hostid file
This is necessary for genkernel initramfs generation and zfs pool import integrity verification

Installing the gentoo-sources kernel binary
Install the kernel

Generate and copy initramfs file to its correct location
A genkernel initramfs works well for most configurations and provides an alternative initramfs creation and management option where dracut has difficulty importing zfs pools at system boot. I've experienced this with some configurations using fast ssd storage pools. With that much performance available, dracut processed the initramfs so quickly that the zfs kernel module had not finished loading, which caused pool import to fail on system boot. Reproducing this behavior can be hit or miss if kernel module modprobe latency ever does occur during initramfs processing.

To slow down initramfs processing, a workaround was devised that includes the entire linux-firmware contents in a genkernel initramfs. This worked very well for many months. However, this purposefully bloated initramfs is very large when uncompressed and may not function with some common home pc motherboards. My server, as can be seen below, is an older model supermicro enterprise server motherboard and has no disagreements with being force fed a 600MB uncompressed initramfs image at system boot.

If you're able to use dracut and dracut works, do use dracut. If you prefer genkernel, you can omit the --firmware option to create a sensibly sized genkernel initramfs.

Recently my server has been using a dracut initramfs with a dual vdev ssd mirror pool and has experienced no pool import failures. When those do occur, switching to a different initramfs has resolved them.

Installing the bootloader onto your drive
We will need to configure the bootloader entry in uefi firmware to direct boot the linux kernel and initramfs.

The following command will install the uefi bootloader entry in uefi firmware referencing the kernel and initramfs located at /efi/efi/gentoo

Edit the Linux kernel version in the command to match the kernel version you installed.

efibootmgr will print the uefi firmware loader table contents upon success also revealing the updated boot order
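A sketch of such an entry, printed for review rather than written to firmware. The helper name efistub_entry_cmd, the file names vmlinuz-VERSION.efi and initramfs-VERSION.img, the entry label, and the dracut-style root=zfs: argument are assumptions; only the \efi\gentoo directory and the rpool/ROOT/gentoo dataset come from this guide. The version string in the example call is a placeholder.

```shell
# Print an efibootmgr command that registers the kernel on partition 1
# of the given disk as a UEFI boot entry (dry run).
efistub_entry_cmd() {
    disk="$1"
    kver="$2"
    # Paths are relative to the EFI System Partition and use backslashes.
    loader="\\efi\\gentoo\\vmlinuz-${kver}.efi"
    initrd="\\efi\\gentoo\\initramfs-${kver}.img"
    printf '%s\n' "efibootmgr --create --disk ${disk} --part 1 --label 'Gentoo ${kver}' --loader '${loader}' --unicode 'root=zfs:rpool/ROOT/gentoo initrd=${initrd}'"
}

# x.y.z-gentoo-dist is a placeholder for the installed kernel version.
efistub_entry_cmd /dev/sda x.y.z-gentoo-dist
```

With a mirror, a second entry pointing at the EFI partition on /dev/sdb lets the system boot if the first disk fails, provided the kernel and initramfs are copied there as well.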

Take a snapshot of your new system
Since we now have a working system, we will snapshot it in case we ever want to go back or recover files:

You can view the status of these snapshots using the zfs command
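As a sketch, the snapshot and listing commands for the two datasets used in this guide look like the following. The helper name snapshot_cmds and the snapshot tag install are made up for illustration, and the commands are only printed.

```shell
# Print snapshot commands for the rootfs and user home datasets,
# followed by the command that lists all snapshots (dry run).
snapshot_cmds() {
    tag="$1"
    for ds in rpool/ROOT/gentoo rpool/home/username; do
        echo "zfs snapshot ${ds}@${tag}"
    done
    echo "zfs list -t snapshot"
}

snapshot_cmds install
```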

ZFS dataset snapshot automation
There are two common options available for zfs snapshot automation.

 * sys-fs/zfs-auto-snapshot is available from gentoo's main repo
 * sys-fs/sanoid is a superior and more feature rich zfs snapshot manager that also provides syncoid

Sanoid is available from a gentoo overlay I maintain named sensible-overlay. Directions to configure the overlay are provided on the github page.

Configuring Sanoid
A simplified configuration for sanoid is provided below to configure /etc/sanoid/sanoid.conf to automate snapshots of rpool/ROOT/gentoo and rpool/home/username


# sanoid.conf file #

[rpool/ROOT/gentoo]
use_template = production

[rpool/home/username]
use_template = production

# templates below this line #

[template_production]
# store hourly snapshots 36h
hourly = 36

# store 30 days of daily snaps
daily = 30

# store back 6 months of monthly
monthly = 6

# store back 3 yearly (remove manually if too large)
yearly = 3

# create new snapshots
autosnap = yes

# clean old snapshot
autoprune = yes

Configuring zfs-auto-snapshot (optional)
Configure daily and weekly snapshot generation for rpool/ROOT/gentoo

Installing required cron daemon
Enable the system service and start cronie cron daemon as required for functionality of sys-fs/sanoid or zfs-auto-snapshot.

Limiting the ARC size
If you want to cap the ZFS ARC from growing past a certain point, you can put the number of bytes inside the /etc/modprobe.d/zfs.conf file, and then remake your initramfs. When the system starts up, and the module is loaded, these options will be passed to the zfs kernel module.

ARC cache memory usage will vary depending on zfs pool sizes. I've had a 50TB single vdev raidz2 pool consume 24GB of memory at system idle when unlimited. However, zfs will generally default to using 50% of available system memory for the ARC cache.

(Temporary) Change the ARC max for the running system to 4 GB

(Permanent) Save the 4 GB ARC cap as a loadable kernel parameter

Once we have the above file created, let's regenerate the initramfs. genkernel will automatically detect that this file exists and copy it into the initramfs. When you reboot your machine, the initramfs will load up the zfs kernel module with the parameters found in the file.
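The value is given in bytes, so a 4 GB cap has to be converted first. The sketch below does the arithmetic and prints both forms; the helper name arc_max_bytes is made up for illustration, while zfs_arc_max is the zfs module parameter for the ARC limit.

```shell
# Convert a size in GiB to bytes for the zfs_arc_max module parameter.
arc_max_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

arc_bytes="$(arc_max_bytes 4)"

# Temporary: write the value to the running module (as root).
echo "echo ${arc_bytes} > /sys/module/zfs/parameters/zfs_arc_max"

# Permanent: the line to place in /etc/modprobe.d/zfs.conf.
echo "options zfs zfs_arc_max=${arc_bytes}"
```

For 4 GiB the value is 4294967296.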

Limiting maximum trim I/Os active to each device. ( Optional )
Some hard disk controllers or ssd disks may exhibit disk controller resets when zpool trim is run, because either the disk controller or the disk cannot process multiple concurrent commands issued by the disk controller driver.

A known workaround is to reduce zfs_vdev_trim_max_active from its default value of 2 to 1 using a zfs driver parameter in the /etc/modprobe.d/zfs.conf file, and then remake your initramfs. When the system starts up, and the module is loaded, these options will be passed to the zfs kernel module.

I've had this behavior or symptom occur using an LSI 9305-16i HBA controller which relies on the mpt3sas kernel driver with Samsung 860 evo ssd's.

There is an open bug on openzfs git discussing this issue.

If this symptom occurs while a sysadmin has zpool trim scheduled from a crontab, a zfs pool scrub may be required. At the very worst, pool desync or data corruption may occur. zfs has always detected the controller reset as an unrecoverable error affecting the pool or a disk within the pool, prompting the use of zpool replace, or zpool clear to clear the error state.

(Temporary) Change maximum trim I/Os active to each device.

(Permanent) Save the maximum trim I/Os active to each device as a loadable kernel parameter

Once we have the above file created, let's regenerate the initramfs. genkernel will automatically detect that this file exists and copy it into the initramfs. When you reboot your machine, the initramfs will load up the zfs kernel module with the parameters found in the file.
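Because /etc/modprobe.d/zfs.conf may already hold the ARC setting, the trim parameter is best added without disturbing other lines. The sketch below is one way to do that; the helper name set_zfs_option is made up, and it assumes one parameter per "options zfs" line. It is demonstrated on a scratch file rather than the real configuration.

```shell
# Set one "options zfs name=value" line in a modprobe config file,
# replacing any earlier setting of the same parameter.
set_zfs_option() {
    file="$1"
    name="$2"
    value="$3"
    touch "$file"
    tmp="$(mktemp)"
    grep -v "^options zfs ${name}=" "$file" > "$tmp" || true
    echo "options zfs ${name}=${value}" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstration on a scratch file; the real target would be
# /etc/modprobe.d/zfs.conf.
demo="$(mktemp)"
set_zfs_option "$demo" zfs_vdev_trim_max_active 2
set_zfs_option "$demo" zfs_vdev_trim_max_active 1
cat "$demo"
rm -f "$demo"
```

The temporary form for a running system writes the same value to /sys/module/zfs/parameters/zfs_vdev_trim_max_active as root.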

Successful Installations

 * My custom gentoo zfs HTPC nas server
 * Gentoo HTPC zfs NAS Neofetch
 * TdDF Gentoo zfs nas server - Austin Texas USA. Installed remotely 12/2019. i9-9900k, 32GB DDR4, 7x10TB WD Red's raidz2, Adata SSD root mirror pool.

Credit and Thanks

 * Fearedbliss, Richard Yao and Georgy Yakovlev - zfs and Gentoo wouldn't be what they have become without their generous dedication and contributions.
 * Fallendusk for generously contributing to the gentoo reddit community.
 * Everyone that helped me learn in 17 years using gentoo. I promise to pay it forward.
 * Kerframil for the Low latency coffee! Go Kerf :)