Initramfs - make your own
Make Yourself an Initramfs
Why
There are several tools to build an initramfs. Dracut and genkernel come to mind. They generate a working initramfs, but users have little idea what is inside it. The result is usually, though not necessarily, tied to the kernel and gcc versions too.
The only reason to follow this guide is that you want to be in control.
What
The initramfs is a root filesystem in a file. It can contain whatever is required to boot the system and anything extra too.
Originally the root filesystem in a file was called the initrd. The difference is in the internal structure, which is not a concern at the level of this page. The two terms are frequently used interchangeably.
Overview
This page describes how to build an initramfs that is free of kernel modules. That means it is not tied to kernel versions; it's a once in the lifetime of the equipment thing, rather like firmware. The author's April 2009 initramfs still works.
The worked example covers root in LVM on top of mdadm raid. LUKS could be added or any of the bits illustrated that are not required can be removed.
It assumes that everything the kernel needs to mount root is configured as built in. Kernel modules could be included in the initramfs but that is not described.
To enable the initramfs to be reassembled, should the embedded init script need to change, the binaries to build the initramfs will be built separately. This has the advantage that the initramfs can be built with its own USE flags.
Unless user space tools are needed to mount root, no initramfs is required at all.
The kernel must include everything required to mount root as built in.
Method
The kernel provides /usr/src/linux/usr/gen_init_cpio which assembles the initramfs from a list of files. Its input is a file listing the files to copy into the initramfs, together with their destinations inside the initramfs.
One of these files will be the init script, which will need to be written.
Initramfs Design
Don't skip this part. What does the initramfs need to do?
- It must include all the binaries to do whatever is needed to mount root.
- It must include an init script that controls what will be done.
Build Location
The author likes /root/initramfs and that is used in this example.
Eventually it will contain bins/, init and initramfs_list. bins/ is the location of the binaries used to assemble the initramfs, init is the init script to get started, and initramfs_list is the list of what goes where, which is fed to gen_init_cpio.
The example in this page is just that. It will need to be adjusted to suit the individual install.
Example List of Requirements
- Assemble and start mdadm RAID.
- Activate Logical Volumes
- Mount root and possibly other filesystems from inside their Logical Volumes
- A filesystem checker for non-root filesystems mounted in the initramfs
- Interactive Shell for debug
- Init script to control everything.
- Other things to suit the install at hand
Now to discover all the binaries required to meet the explicit requirements. They must be included in the initramfs_list.
Assemble and start mdadm RAID
This requires Multiple Device support built into the kernel and the mdadm user space tool.
That's /sbin/mdadm and all the libraries that it depends on.
user $ lddtree /sbin/mdadm
/sbin/mdadm (interpreter => /lib64/ld-linux-x86-64.so.2)
    libc.so.6 => /lib64/libc.so.6
Different USE settings will produce different lists
Activate Logical Volumes
This requires Multiple Device and Logical Volume Manager support built into the kernel and the lvm userspace tool.
user $ lddtree /sbin/lvm
/sbin/lvm (interpreter => /lib64/ld-linux-x86-64.so.2)
    libdevmapper-event.so.1.02 => /lib64/libdevmapper-event.so.1.02
    libdevmapper.so.1.02 => /lib64/libdevmapper.so.1.02
    libm.so.6 => /lib64/libm.so.6
    libreadline.so.8 => /lib64/libreadline.so.8
    libtinfow.so.6 => /lib64/libtinfow.so.6
    libblkid.so.1 => /lib64/libblkid.so.1
    libaio.so.1 => /lib64/libaio.so.1
    libc.so.6 => /lib64/libc.so.6
It has a bigger list of dependencies, also including /lib64/libc.so.6. Duplicates need only be provided once.
lvm2 takes USE=static, so a monolithic build can be used in the initramfs and a dynamic build used in the main install.
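Collecting the dependency lists by hand gets tedious once there are several binaries. The following is a minimal sketch of one way to turn a list of paths into entries for initramfs_list, listing each path only once. The function name paths_to_file_entries is made up for this example, and the 755 0 0 mode and ownership are simply the values used elsewhere on this page.

```shell
#!/bin/sh
# Turn absolute paths (one per line on stdin) into gen_init_cpio "file"
# entries. $1 is the build root that the files will be copied from.
# sort -u drops duplicates, so a library shared by several binaries
# is listed only once.
paths_to_file_entries() {
    build_root=$1
    sort -u | while read -r path ; do
        # skip blank lines
        [ -n "$path" ] || continue
        echo "file $path $build_root$path 755 0 0"
    done
}
```

Assuming the installed lddtree supports the -l (flat list) option, something like lddtree -l /sbin/lvm /bin/mount | paths_to_file_entries /root/initramfs/bins should produce a starting point that can be pasted into initramfs_list and then reviewed by hand.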
Mount root and possibly other filesystems
Most users will want to mount by filesystem UUID. Not all filesystems are on partitions, so PARTUUID cannot always be used.
Mount by filesystem UUID requires the user space mount command
root # lddtree /bin/mount
/bin/mount (interpreter => /lib64/ld-linux-x86-64.so.2)
    libmount.so.1 => /lib64/libmount.so.1
    libblkid.so.1 => /lib64/libblkid.so.1
    libc.so.6 => /lib64/libc.so.6
Interactive Shell for Debug
That will be busybox. Everyone uses busybox. It's small and has lots of utilities too.
user $ lddtree /bin/busybox
/bin/busybox (interpreter => None)
My example busybox is statically linked. That's a lifesaver when almost nothing works.
Design Implementation
Init script to control everything
This is the hard bit. It's all the commands that would need to be entered at a root shell, using the initramfs to get started.
If the initramfs only contained busybox, what would need to be entered to boot?
Some error handling is a good idea too, so that debug is possible.
Elements of the Init Script
It's a shell script so it must start with the shebang line. It's not a comment.
#!/bin/busybox sh
The error handler is a function which will be called when something goes wrong. It takes one parameter, which is a text string to be printed when it is invoked.
Comments are good for maintenance later.
rescue_shell() {
    echo "$@"
    echo "Something went wrong. Dropping you to a shell."
    # The symlinks are not required any longer
    # but it helps tab completion
    /bin/busybox --install -s
    exec /bin/sh
}
Parse the root filesystem out of the kernel command line and mount it.
# allow the use of UUIDs or filesystem labels
uuidlabel_root() {
    for cmd in $(/bin/cat /proc/cmdline) ; do
        case $cmd in
        root=*)
            # root=UUID=xxxx splits into fields: root, UUID, xxxx
            type=$(echo $cmd | /bin/cut -d= -f2)
            echo "Mounting rootfs"
            if [ "$type" = "LABEL" ] || [ "$type" = "UUID" ] ; then
                uuid=$(echo $cmd | /bin/cut -d= -f3)
                /bin/mount -o ro $(/bin/findfs "$type"="$uuid") /mnt/root
            else
                /bin/mount -o ro $(echo $cmd | /bin/cut -d= -f2) /mnt/root
            fi
            ;;
        esac
    done
}
We only do that once, so it need not be a function but it makes the main flow of the script easier to read.
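To make the field splitting easier to follow, here is a small stand-alone sketch that applies the same cut calls to a single root= argument and prints what it finds instead of mounting anything. The function name split_root_arg and the DEVICE tag are made up for this illustration.

```shell
#!/bin/sh
# Split one root= argument the same way uuidlabel_root does.
# Prints "<type> <value>" for root=UUID=... or root=LABEL=...,
# and "DEVICE <path>" for anything else, such as root=/dev/vg/root.
split_root_arg() {
    arg=$1
    # field 1 is "root", field 2 is the type or the device path
    type=$(echo "$arg" | cut -d= -f2)
    if [ "$type" = "LABEL" ] || [ "$type" = "UUID" ] ; then
        # field 3 is the UUID or label itself
        value=$(echo "$arg" | cut -d= -f3)
        echo "$type $value"
    else
        echo "DEVICE $type"
    fi
}
```

Because only field 3 is taken, a filesystem label that itself contains an = sign would be cut short. That is a limitation of this simple approach.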
When things are mounted inside the initramfs, it's good to be able to check the filesystems first. The localmount service cannot check mounted filesystems.
# We need this for things that are mounted before localmount runs
# like /usr and possibly /var
check_filesystem() {
    # most of code coming from /etc/init.d/fsck

    local fsck_opts= check_extra= RC_UNAME=$(uname -s)

    # FIXME : get_bootparam forcefsck
    if [ -e /forcefsck ]; then
        fsck_opts="$fsck_opts -f"
        check_extra="(check forced)"
    fi

    echo "Checking local filesystem $check_extra : $1"

    if [ "$RC_UNAME" = Linux ]; then
        fsck_opts="$fsck_opts -C0 -T"
    fi

    trap : INT QUIT

    # using our own fsck, not the builtin one from busybox
    /sbin/fsck -p $fsck_opts $1

    ret_val=$?
    case $ret_val in
        0)      return 0;;
        1)      echo "Filesystem repaired"; return 0;;
        2|3)    if [ "$RC_UNAME" = Linux ]; then
                    echo "Filesystem repaired, but reboot needed"
                    /sbin/reboot -f
                else
                    rescue_shell "Filesystem still has errors; manual fsck required"
                fi;;
        4)      if [ "$RC_UNAME" = Linux ]; then
                    rescue_shell "Filesystem errors left uncorrected, aborting"
                else
                    echo "Filesystem repaired, but reboot needed"
                    /sbin/reboot
                fi;;
        8)      echo "Operational error"; return 0;;
        16)     echo "Use or Syntax Error"; return 16;;
        32)     echo "fsck interrupted";;
        127)    echo "Shared Library Error"; sleep 20; return 0;;
        *)      echo $ret_val; echo "Some random fsck error - continuing anyway"; sleep 20; return 0;;
    esac

    # rescue_shell can't find tty so it's broken
    rescue_shell
}
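The fsck(8) man page documents the exit status as the sum of several bit values, so a status such as 5 (errors corrected plus errors left uncorrected) is possible. The case statement above matches whole values only, so combinations like that fall through to the catch-all branch. As a sketch of an alternative, the helper below decodes each bit separately. The function name describe_fsck_status is made up for this example. The bit meanings are taken from fsck(8).

```shell
#!/bin/sh
# Print one line for every bit that is set in an fsck exit status.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ] ; then
        echo "no errors"
        return 0
    fi
    # each entry is "<bit value>:<meaning from fsck(8)>"
    for pair in "1:errors corrected" \
                "2:system should be rebooted" \
                "4:errors left uncorrected" \
                "8:operational error" \
                "16:usage or syntax error" \
                "32:checking canceled by user request" \
                "128:shared-library error" ; do
        bit=${pair%%:*}
        text=${pair#*:}
        if [ $(( status & bit )) -ne 0 ] ; then
            echo "$text"
        fi
    done
}
```

Called as describe_fsck_status $ret_val, this could be used to print a more precise message before deciding whether to continue, reboot or drop to the rescue shell.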
With those functions in support, we can do what's needed.
Notice the comments to make hard coding things easier.
PATH="/sbin:/bin"

# start for real here

# temporarily mount proc,sys and dev
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t devtmpfs devtmpfs /dev

#mdadm arrays to assemble
#boot UUID : a25b05eb:3db18cbe:afb9312b:d1d97546
#host UUID : de8f2cbc:17ca3275:0b69db3c:b9f91a6b
#kvm UUID  : a3aab047:413ed52d:b15158fc:cdb637ef

# boot
/sbin/mdadm --assemble /dev/md0 --uuid=a25b05eb-3db18cbe-afb9312b-d1d97546 || echo "boot failed to assemble"
/sbin/mdadm --assemble /dev/md1 --uuid=de8f2cbc-17ca3275-0b69db3c-b9f91a6b || rescue_shell "The host RAID set failed to assemble"
/sbin/mdadm --assemble /dev/md2 --uuid=a3aab047:413ed52d:b15158fc:cdb637ef || echo "THE KVM space did not assemble"
Use /sbin/mdadm --assemble --run to start the raid set with missing members, if possible.
root in LVM on RAID on USB requires a sleep to allow USB HDD to be available before mdadm --assemble runs
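A fixed sleep either wastes time or is too short on a slow day. As a sketch of an alternative, the helper below polls for a path and gives up after a timeout. The function name wait_for_path is made up for this example, and which device node to wait for depends entirely on the install.

```shell
#!/bin/sh
# Wait until a path exists, polling once per second.
# $1 = path to wait for, $2 = maximum number of seconds to wait.
# Returns 0 as soon as the path exists, 1 if the timeout expires first.
wait_for_path() {
    path=$1
    remaining=$2
    while [ ! -e "$path" ] ; do
        if [ "$remaining" -le 0 ] ; then
            return 1
        fi
        sleep 1
        remaining=$(( remaining - 1 ))
    done
    return 0
}

# Possible use in the init script, before the mdadm --assemble lines.
# /dev/sda is only a placeholder for the USB disk on a real system.
# wait_for_path /dev/sda 10 || rescue_shell "USB disk did not appear"
```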
It's left as an exercise for the reader to parse the RAID UUID(s) out of the kernel command line.
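As a starting point for that exercise, here is one possible sketch. It assumes a made-up raid_uuid= parameter, which may appear more than once on the command line. Neither the kernel nor mdadm knows this name; it only has meaning to the init script that looks for it.

```shell
#!/bin/sh
# Print the value of every raid_uuid= argument in a kernel command line.
# raid_uuid= is a hypothetical parameter name invented for this sketch.
# $1 = the whole command line as one string.
raid_uuids_from_cmdline() {
    for arg in $1 ; do
        case $arg in
            raid_uuid=*) echo "${arg#raid_uuid=}" ;;
        esac
    done
}

# In the init script, after /proc is mounted, the values could be
# collected like this and then fed to the mdadm --assemble lines:
# uuids=$(raid_uuids_from_cmdline "$(cat /proc/cmdline)")
```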
If boot fails to assemble, it does not impact the boot process, as neither BIOS nor UEFI firmware is RAID aware. Indeed, the boot loader has to read /boot to load the initramfs before the initramfs can work out that /boot did not assemble. There is no need to call rescue_shell here.
Boot on RAID requires a RAID level and an on-disk RAID data layout that leave the filesystem untouched. RAID 1 with RAID metadata that lives at the end of the volume works.
Being lazy, start all the logical volumes and call the rescue shell if any one fails. In practice, only the one housing root is required. That's a local design decision.
# Then start LVM
vgchange -ay || rescue_shell "Some/All Volume Groups failed to start"
Now mount other filesystems if needed. Typically /usr and /var
# space separated list of mountpoints that ...
mountpoints="/usr" # /var"

# ... we want to find in /etc/fstab ...
/bin/ln -s /mnt/root/etc/fstab /etc/fstab

# loop through the list of mountpoints
for m in $mountpoints ; do
    #echo $m
    check_filesystem $m

    echo "Mounting $m"
    # mount the device and ...
    mount $m || rescue_shell "Error while mounting $m"

    # ... move the tree to its final location
    mount --move $m "/mnt/root"$m || rescue_shell "Error while moving $m"
done
Set noauto in /etc/fstab for filesystems mounted here
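It is easy to forget the noauto option when a mountpoint is added to the list later. The sketch below checks an fstab file for it. The function name fstab_has_noauto is made up for this example. It relies on awk, which busybox normally provides, although that depends on how busybox was configured.

```shell
#!/bin/sh
# Succeed if the entry for mountpoint $2 in fstab file $1 lists noauto
# among its mount options. Comment lines are ignored.
fstab_has_noauto() {
    awk -v mp="$2" '
        $1 !~ /^#/ && $2 == mp {
            # field 4 holds the comma separated mount options
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noauto") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Possible use on the running system:
# fstab_has_noauto /etc/fstab /usr || echo "/usr is missing noauto"
```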
# That's put all the pieces together, now tidy up
echo "All done. Switching to real root."

# clean up. The init process will remount proc sys and dev later
umount /proc
umount /sys
umount /dev

# switch to the real root and execute init
exec /sbin/switch_root /mnt/root /sbin/init
That final exec call never returns if it succeeds, so nervous readers could add rescue_shell "Fell off the end of init" as the very last line.
It's a horrible script and has grown to its present state over 20 years or more.
Building the Binaries
To avoid using binaries from the live filesystem, the initramfs binaries will be installed in /root/initramfs/bins. This allows trivial changes to the init script in years to come without tracking down all the changed dependencies on the live filesystem. Both ways work. It's a design decision.
The down side of a separate build is that all the dependencies that will not go into the initramfs will be built too.
root # emerge -av --root=/root/initramfs/bins <list_of_packages>
Set the USE flags to your liking and build your binaries. This author's preference is to build everything that supports static linking with USE=static.
<list_of_packages> depends on what is required of the initramfs. Like the rest of Gentoo, it's easy to add to if needed.
It looks a bit dated now but the bins package list on the example install is
user $ ls bins/var/db/pkg/*/*/*.ebuild
app-arch/bzip2-1.0.8-r1/bzip2-1.0.8-r1.ebuild
app-arch/gzip-1.11/gzip-1.11.ebuild
dev-libs/expat-2.4.3/expat-2.4.3.ebuild
dev-libs/libaio-0.3.112/libaio-0.3.112.ebuild
dev-libs/libpcre-8.45/libpcre-8.45.ebuild
dev-libs/libpcre2-10.39/libpcre2-10.39.ebuild
dev-libs/libunistring-0.9.10-r1/libunistring-0.9.10-r1.ebuild
net-dns/libidn2-2.3.2/libidn2-2.3.2.ebuild
sys-apps/acl-2.3.1/acl-2.3.1.ebuild
sys-apps/attr-2.5.1/attr-2.5.1.ebuild
sys-apps/baselayout-2.7-r3/baselayout-2.7-r3.ebuild
sys-apps/busybox-1.34.1/busybox-1.34.1.ebuild
sys-apps/gentoo-functions-0.14/gentoo-functions-0.14.ebuild
sys-apps/grep-3.7/grep-3.7.ebuild
sys-apps/systemd-tmpfiles-249.9/systemd-tmpfiles-249.9.ebuild
sys-apps/util-linux-2.37.2-r1/util-linux-2.37.2-r1.ebuild
sys-auth/pambase-20210201.1/pambase-20210201.1.ebuild
sys-auth/passwdqc-2.0.2-r1/passwdqc-2.0.2-r1.ebuild
sys-block/thin-provisioning-tools-0.9.0-r1/thin-provisioning-tools-0.9.0-r1.ebuild
sys-fs/e2fsprogs-1.46.4/e2fsprogs-1.46.4.ebuild
sys-fs/lvm2-2.02.188-r2/lvm2-2.02.188-r2.ebuild
sys-fs/mdadm-4.2-r1/mdadm-4.2-r1.ebuild
sys-libs/e2fsprogs-libs-1.46.4-r1/e2fsprogs-libs-1.46.4-r1.ebuild
sys-libs/glibc-2.33-r7/glibc-2.33-r7.ebuild
sys-libs/libcap-2.62/libcap-2.62.ebuild
sys-libs/libxcrypt-4.4.25-r1/libxcrypt-4.4.25-r1.ebuild
sys-libs/ncurses-6.2_p20210619/ncurses-6.2_p20210619.ebuild
sys-libs/pam-1.5.1_p20210622-r1/pam-1.5.1_p20210622-r1.ebuild
sys-libs/readline-8.1_p1-r1/readline-8.1_p1-r1.ebuild
sys-libs/timezone-data-2021a-r1/timezone-data-2021a-r1.ebuild
sys-libs/zlib-1.2.11-r4/zlib-1.2.11-r4.ebuild
virtual/awk-1/awk-1.ebuild
virtual/libcrypt-2/libcrypt-2.ebuild
virtual/libiconv-0-r2/libiconv-0-r2.ebuild
virtual/libintl-0-r2/libintl-0-r2.ebuild
virtual/tmpfiles-0-r1/tmpfiles-0-r1.ebuild
Putting the Pieces Together
That's what the initramfs_list file is for.
All the files discovered to be required during design, using lddtree, must be included.
Describe the directory structure for the initramfs. This example is from an arm64 system; amd64/x86 may differ.
# directory structure
dir /proc 755 0 0
dir /usr 755 0 0
dir /bin 755 0 0
dir /sys 755 0 0
dir /var 755 0 0
#dir /lib 755 0 0
dir /lib64 755 0 0
dir /sbin 755 0 0
dir /mnt 755 0 0
dir /mnt/root 755 0 0
dir /etc 755 0 0
dir /root 700 0 0
dir /dev 755 0 0
Make a few critical device nodes
nod /dev/null 666 0 0 c 1 3
nod /dev/tty 666 0 0 c 5 0
nod /dev/console 600 0 0 c 5 1
They are probably not required with modern DEVTMPFS in the kernel.
All the main commands
# busybox
# Output file name    Input file name
file /bin/busybox /root/initramfs/bins/bin/busybox 755 0 0

# Need real mount as busybox did not support UUID
file /bin/mount /root/initramfs/bins/bin/mount 755 0 0

# for raid on lvm
# Output file name    Input file name
file /sbin/mdadm /root/initramfs/bins/sbin/mdadm 755 0 0
file /sbin/lvm.static /root/initramfs/bins/sbin/lvm.static 755 0 0
Add some symbolic links to make life easier.
slink /sbin/vgchange /sbin/lvm.static 777 0 0
slink /sbin/vgscan /sbin/lvm.static 777 0 0
slink /bin/cat /bin/busybox 777 0 0
slink /bin/cut /bin/busybox 777 0 0
slink /bin/findfs /bin/busybox 777 0 0
slink /bin/ln /bin/busybox 777 0 0
slink /sbin/switch_root /bin/busybox 777 0 0
slink /lib64/libdl.so.2 /lib64/libdl-2.33.so 777 0 0

# libraries required by /sbin/fsck.ext4 and /sbin/fsck
# The /lib -> /lib64 symlink is mostly harmless but it's not right on arm64
slink /lib /lib64 777 0 0
The symlinks to /bin/busybox are probably not required as busybox assumes internal commands when a command is not found.
All the required libraries too
# Output file name    Input file name
file /lib/ld-linux-aarch64.so.1 /root/initramfs/bins/lib/ld-linux-aarch64.so.1 755 0 0
file /lib64/libext2fs.so.2 /root/initramfs/bins/lib64/libext2fs.so.2 755 0 0
file /lib64/libcom_err.so.2 /root/initramfs/bins/lib64/libcom_err.so.2 755 0 0
file /lib64/libpthread.so.0 /root/initramfs/bins/lib64/libpthread.so.0 755 0 0
file /lib64/libblkid.so.1 /root/initramfs/bins/lib64/libblkid.so.1 755 0 0
file /lib64/libuuid.so.1 /root/initramfs/bins/lib64/libuuid.so.1 755 0 0
file /lib64/libe2p.so.2 /root/initramfs/bins/lib64/libe2p.so.2 755 0 0
file /lib64/libc.so.6 /root/initramfs/bins/lib64/libc.so.6 755 0 0
file /lib64/libmount.so.1 /root/initramfs/bins/lib64/libmount.so.1 755 0 0
file /lib64/libdl-2.33.so /root/initramfs/bins/lib64/libdl-2.33.so 755 0 0

file /sbin/fsck /sbin/fsck 755 0 0
file /sbin/fsck.ext4 /sbin/fsck.ext4 755 0 0

# our init script
file /init /root/initramfs/init 755 0 0
This example works with ext4. Choose the /sbin/fsck.fs_type that matches your filesystem.
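Before running gen_init_cpio it can save a reboot to confirm that every input file named in the list really exists. The sketch below does that for the file entries, where the third field is the input path. The function name check_list_sources is made up for this example.

```shell
#!/bin/sh
# Report every "file" entry in a gen_init_cpio list whose input file
# does not exist. $1 = path to the list.
# Returns 1 if anything is missing, 0 otherwise.
check_list_sources() {
    missing=0
    # file entries look like: file <name> <input path> <mode> <uid> <gid>
    while read -r kind name src rest ; do
        # comments, dir, nod and slink lines have no input file to check
        [ "$kind" = "file" ] || continue
        if [ ! -e "$src" ] ; then
            echo "missing: $src (wanted for $name)"
            missing=1
        fi
    done < "$1"
    return "$missing"
}

# Possible use:
# check_list_sources /root/initramfs/initramfs_list || echo "fix the list first"
```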
Mount /boot if it's not mounted.
root # /usr/src/linux/usr/gen_init_cpio /root/initramfs/initramfs_list > /boot/<initramfs_name>
Tell your boot loader about /boot/<initramfs_name> and reboot to test.
Ideas For Further Contributions
- Cover LUKS
- Cover root over NFS (Possible without an initramfs too)
- Bring up the network in the initramfs with ssh access
- Other things
Networking in the initramfs is a security risk. The initramfs will need to be maintained.