User:Capezotte/s6 on Gentoo

From Gentoo Wiki
Warning
Incomplete. Even if it was complete to my standards, remember s6 is not supported by Gentoo, and that I don't guarantee it will work.

This aims to be a guide on how to set up a Gentoo system with a full s6 stack (s6-linux-init, s6-svscan+s6-supervise, s6-rc), ultimately allowing you to replace sysvinit and OpenRC.

Included in the LEGO set

  • s6-svscan and s6-supervise (and associated tools) are the workhorses of system management. They start services, ensure they are reachable and that there's only one instance of each, and restart them automatically if they crash. Together they form a supervision suite, like runit (without runit-init).
  • s6-linux-init performs the bare minimum of system initialization needed for s6-svscan, executes it as PID 1, and handles shutdown. It's an init process, like sysvinit or runit-init.
  • s6-rc enhances s6-svscan with dependency management and the ability to run oneshots (scripts that do one thing on system startup/shutdown). It's a service manager, like OpenRC.

Getting started

You'll want to emerge sys-apps/s6 (s6-svscan+s6-supervise), sys-apps/s6-rc and sys-apps/s6-linux-init. Avoid emerging the latter with sysv-utils so you can fall back to sysvinit+OpenRC if things go awry.

Introduction to service definitions

Creating services for s6-rc will feel familiar if you've already used runit, but s6-rc also supports one-shot scripts, which let us perform the core system initialization and start the actual long-running services (what runit calls stage 1 and stage 2, respectively) with the same tool.

With s6-rc, service definitions are folders containing at least one file called type, whose contents are one of these three strings:

  • longrun: what we usually think of as services: programs that run for the lifetime of the machine, providing some sort of functionality (e.g. device management, usually udev). If you choose this type, there must also be an executable file called run. It's usually a script that performs any necessary setup and then execs into the program, which must run in the foreground (otherwise, s6 will lose track of it). Again, runit users will be familiar with this requirement.
  • oneshot: a small script performing setup before or after a service is started. For instance, mounting filesystems (i.e. calling mount -a), or telling udev to recognize currently plugged devices (i.e. udevadm trigger). If you choose this type, you must provide an up file, which is an execline script. For very simple scripts, it will look exactly like shell, except that you can't use single quotes. If you aren't willing to fully learn execline, no worries -- just write the script in whichever language you like the most, mark it as executable and write its filename to up.
  • bundle: a set of longruns, oneshots or even other bundles. You write the name of each item, one per line, to a file named contents, which is mandatory.

Both longrun and oneshot can have an additional dependencies file, which lists, one per line, what other longruns, oneshots or bundles must be working before this service is started.
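As a rough illustration of the three types, the following shell sketch builds a throwaway source tree in a temporary directory. The service names (mount-demo, demo-daemon, demo-bundle) and the commands placed in up and run are placeholders invented for this example; they are not part of any real setup.

```shell
#!/bin/sh
# Throwaway source tree; a real one would live somewhere like /etc/s6-rc/src.
src=$(mktemp -d)

# oneshot: 'up' holds an execline command line (placeholder command).
mkdir "$src/mount-demo"
echo oneshot > "$src/mount-demo/type"
echo 'echo "pretend to mount something"' > "$src/mount-demo/up"

# longrun: 'run' must be executable and keep the program in the foreground.
mkdir "$src/demo-daemon"
echo longrun > "$src/demo-daemon/type"
printf '%s\n' '#!/bin/sh' 'exec sleep 3600' > "$src/demo-daemon/run"
chmod +x "$src/demo-daemon/run"
# It must not start before the oneshot has succeeded.
echo mount-demo > "$src/demo-daemon/dependencies"

# bundle: 'contents' lists its members, one per line.
mkdir "$src/demo-bundle"
echo bundle > "$src/demo-bundle/type"
printf '%s\n' mount-demo demo-daemon > "$src/demo-bundle/contents"

ls "$src"
```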

This is just a very simplistic introduction. For the full description, read the docs.

Note
After s6-rc 0.5.3.0, you can, instead of writing contents or dependencies files, create contents.d/ or dependencies.d/ folders with empty files named after the services. For instance:
root #cat dependencies # classic
service1
service2
root #ls dependencies.d # new
service1 service2
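If you already have classic files and want the directory form, the conversion is mechanical. This is a minimal sketch, run against a temporary folder standing in for a service definition; removing the old file afterwards is this example's choice, made so that only one of the two forms remains.

```shell
#!/bin/sh
# Stand-in for a service definition folder with a classic 'dependencies' file.
svc=$(mktemp -d)
printf '%s\n' service1 service2 > "$svc/dependencies"

mkdir -p "$svc/dependencies.d"
while IFS= read -r dep; do
    # Each non-blank line becomes an empty file named after the service.
    if [ -n "$dep" ]; then
        : > "$svc/dependencies.d/$dep"
    fi
done < "$svc/dependencies"

# Keep a single form around (assumption of this sketch).
rm "$svc/dependencies"

ls "$svc/dependencies.d"
```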

Creating a service database

Now you need to pick a folder on your system where the service definitions will live (the service database). Any folder will do: I personally prefer /etc/s6-rc/src, but you could use /etc/s6-rc/sv, /var/runit/service, or /var/cow-petting. Create subfolders and start working on the service definitions inside of them.

The bare minimum of core initialization required for most Linux systems is a series of oneshots:

  • Mounting the essential virtual filesystems (/dev, /proc, /sys)
  • Parsing /etc/fstab
  • Creating the {u,b,w}tmp files (especially if you use sys-libs/glibc).
  • Reading a few settings from the file system (sysctls, hostname...)
  • Cold-plugging the device manager.

For reference, you can see how Artix implements core s6 services. The ones that do the above steps are:

  • mount-devfs
  • mount-filesystems
  • mount-net
  • mount-procfs
  • mount-sysfs
  • mount-tmpfs
  • sysctl
  • cleanup
  • hostname
  • udevadm
  • udevd-{srv,log}

In addition, to actually get a login prompt, the agetty-* longruns are needed.

For a quick start, you can copy-paste them. Notice that udevd-srv needs to be adjusted as Gentoo and Artix place the udev binary in different folders.

Let's look into a few examples.

Warning
They are extremely simplified, and meant to serve as an introduction to s6-rc and execline. Artix Linux s6-scripts and s6-services repositories are real-world examples, containing dependencies, hardening settings, possibility for user options coming from outside the script, and more.

Example 1. mount-procfs

This will be familiar to you if you've had to set up a Linux container/chroot, or borked your system so hard you had to resort to init=/bin/sh. In a shell, you'd run:

root #mount -t proc proc /proc

This is run only once and you're set for the machine's lifetime. This calls for a oneshot type service, with up's contents set to the above command. That's it, this service is good enough to mount /proc on your machine at boot.

However, if this script is called again for some reason, /proc will be mounted again (and on Linux, mounts can stack). This will leave us with two entries. To fix this, we should write this service so restarting it won't change anything unless it's necessary (idempotence).

The command for checking whether a folder is already a mountpoint is mountpoint -q /folder. So we should only mount /proc if this command fails.

In execline, this translates to if -nt { command1 } command2.

More information on execline's if here.

Final layout:

FILE type
oneshot
FILE up
if -nt { mountpoint -q /proc } mount -t proc proc /proc
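Since up can also just name a script written in another language, the same idempotent logic can be sketched in POSIX shell. The mount_once function and the MOUNTPOINT/MOUNT override variables are inventions of this example; they exist only so the logic can be dry-run without root.

```shell
#!/bin/sh
# mount_once FSTYPE SOURCE TARGET: mount only if TARGET is not a mountpoint yet.
# MOUNTPOINT and MOUNT are hypothetical overrides used for dry runs.
mount_once() {
    ${MOUNTPOINT:-mountpoint} -q "$3" || ${MOUNT:-mount} -t "$1" "$2" "$3"
}

# Dry run 1: pretend /proc is not mounted; print the command instead of running it.
first=$(MOUNTPOINT=false MOUNT='echo mount' mount_once proc proc /proc)
# Dry run 2: pretend /proc is already mounted; nothing should be attempted.
second=$(MOUNTPOINT=true MOUNT='echo mount' mount_once proc proc /proc)

echo "not mounted yet: ${first:-<nothing>}"
echo "already mounted: ${second:-<nothing>}"
```

In a real script, you would call mount_once proc proc /proc with neither override set.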

Example 2. mount-sysfs

If you were able to understand the previous example, you'd easily arrive at the following command line:

if -nt { mountpoint -q /sys } mount -t sysfs sys /sys

However, within /sys there are other mountpoints that must be handled:

  • securityfs (/sys/kernel/security)
  • efivarfs (/sys/firmware/efi/efivars).

If it's possible to mount them, the folders will be present. Therefore, the script must check if they're there (not considering it a failure for them to be absent), and, if so, mount them.

if -t { test -d /sys/kernel/security } if -nt { mountpoint -q /sys/kernel/security } mount -t securityfs securityfs /sys/kernel/security

This command line just got fairly long, so let's take advantage of the fact that, in execline, newlines are equivalent to spaces:

if -t { test -d /sys/kernel/security }
if -nt { mountpoint -q /sys/kernel/security }
mount -t securityfs securityfs /sys/kernel/security

But wait. If newlines are equivalent to spaces, how do I chain multiple command lines together? No, not with a semicolon, but with a foreground { ... } block.

foreground {
  if -nt { mountpoint -q /sys } mount -t sysfs sys /sys
}
if -t { test -d /sys/kernel/security }
if -nt { mountpoint -q /sys/kernel/security }
mount -t securityfs securityfs /sys/kernel/security

Adding efivarfs to the mix, by putting securityfs's command line inside another foreground block, yields the final result.

FILE type
oneshot
FILE up
foreground {
  if -nt { mountpoint -q /sys } mount -t sysfs sys /sys
}
foreground {
  if -t { test -d /sys/kernel/security }
  if -nt { mountpoint -q /sys/kernel/security }
  mount -t securityfs securityfs /sys/kernel/security
}
if -t { test -d /sys/firmware/efi/efivars }
if -nt { mountpoint -q /sys/firmware/efi/efivars }
mount -t efivarfs efivarfs /sys/firmware/efi/efivars
Note
Under the hood, execlineb just translates the curly braces and executes the command line, terminating itself after doing so. That's why a separate foreground program is needed to chain multiple commands; while bash and the other shells are always there to read the next line of a shell script, the equivalent is not true for execline scripts.

Yes. I said program, not command or keyword. dev-lang/execline installs executables called /bin/if, /bin/foreground and even /bin/cd!

The secret to execline is that these programs, after doing their job, execute into the rest of their arguments (chainloading).
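Chainloading is not unique to execline; familiar tools such as env, nice and nohup work the same way. The sketch below mimics the idea in POSIX shell. The helper script and the GREETING variable are made up for this illustration.

```shell
#!/bin/sh
# A tiny chainloader: do one job (export a variable), then exec into
# whatever command line was passed as the remaining arguments.
chain=$(mktemp)
cat > "$chain" <<'EOF'
#!/bin/sh
GREETING=hello
export GREETING
exec "$@"
EOF

# Run it through sh so the sketch does not depend on the temporary
# directory allowing direct execution. The helper replaces itself with
# 'sh -c ...', which then sees the exported variable.
out=$(sh "$chain" sh -c 'echo "$GREETING"')
echo "$out"
```

The helper never "returns" to a caller: once it execs, the process is the next command in the chain, which is how an execline script gets by without an interpreter waiting around.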

Example 3. sshd

The SSH daemon (sshd) is a program that will stay up for the lifetime of the machine, so it's a clear longrun.

For the run file, all you need to do is write a script that execs sshd in the foreground (which, by default, it doesn't, so the -D option needs to be given).

#!/bin/execlineb -P
sshd -D

Unlike a shell script for runit, writing exec is not necessary; that's implied in execline (see the note above).

As a convenience for the user, we could run ssh-keygen -A before sshd itself, which generates SSH host keys if necessary.

FILE type
longrun
FILE run
#!/bin/execlineb -P
foreground { ssh-keygen -A }
sshd -D
Tip
Like ssh-keygen -A, mkdir -p is idempotent; you do not need to write if -nt { test -d folder } mkdir -p folder.
More examples can be found on this article about idempotent shell scripts.

Turning the service database into something useful

When you're done, you need to compile the database. s6-rc will build a dependency list and make your s6-rc services ready to be plugged into s6-svscan, and complain if the dependencies/types are wrong. For convenience later (we'll explain why in the s6-linux-init part), we'll make it compile to a folder under /etc/s6-rc with a unique name, then symlink it to /etc/s6-rc/compiled:

root #NAME=$(date +%s)
root #s6-rc-compile /etc/s6-rc/$NAME /etc/s6-rc/src
root #ln -sfT $NAME /etc/s6-rc/compiled
Note
s6-rc doesn't perform any verification of the run/up files themselves, so double-check them - you can even test them by running:
root #execlineb -P up

- for oneshots, or

root #$INTERPRETER run
- for longruns, where $INTERPRETER can be execlineb, sh, python... depending on the file's shebang.
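A quick pre-flight check can also catch the most obvious omissions before compiling. The sketch below only looks for the mandatory files described in the introduction to service definitions; check_src is a hypothetical helper, not part of s6-rc, and it does not replace the checks s6-rc-compile performs itself.

```shell
#!/bin/sh
# check_src DIR: report service definitions that lack their mandatory files.
# Prints one line per problem; returns non-zero if any problem was found.
check_src() {
    problems=0
    for svc in "$1"/*/; do
        svc=${svc%/}
        name=${svc##*/}
        case $(cat "$svc/type" 2>/dev/null) in
            longrun) [ -x "$svc/run" ] ||
                         { echo "$name: no executable run"; problems=$((problems + 1)); } ;;
            oneshot) [ -f "$svc/up" ] ||
                         { echo "$name: no up file"; problems=$((problems + 1)); } ;;
            bundle)  [ -e "$svc/contents" ] || [ -d "$svc/contents.d" ] ||
                         { echo "$name: no contents"; problems=$((problems + 1)); } ;;
            *)       echo "$name: missing or unknown type"; problems=$((problems + 1)) ;;
        esac
    done
    [ "$problems" -eq 0 ]
}

# Demo on a throwaway tree: one valid oneshot and two broken definitions.
src=$(mktemp -d)
mkdir "$src/good" "$src/no-run" "$src/typo"
echo oneshot > "$src/good/type"; echo 'echo ok' > "$src/good/up"
echo longrun > "$src/no-run/type"
echo onshot > "$src/typo/type"

if report=$(check_src "$src"); then status=clean; else status=broken; fi
echo "$status"
echo "$report"
```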

Setting up s6-linux-init

s6-linux-init will be the first process of the system and a piece bundled with it (s6-linux-init-shutdownd) will remain operational for the lifetime of the machine, waiting for the fateful shutdown command. It's configured through files in /etc/s6-linux-init/current.

First, let's use the included s6-linux-init-maker program to create the /etc/s6-linux-init/current directory (though it's still non-functional):

root #s6-linux-init-maker -G "/sbin/agetty tty12 38400 linux" -1 "$PWD/tmp"
root #mv -fT -- "$PWD/tmp" /etc/s6-linux-init/current
  • With -G, we can specify an emergency service that will always be started. In this example, it's a getty on tty12 (Ctrl+Alt+F12), which you can use to log in to your system even if your s6-rc config is borked.
  • -1 sends system initialization logs to the tty as well as to the /run/uncaught-logs/current file. If this is not specified, only /run/uncaught-logs/current will have the logs.

More details here.

Of the folders in /etc/s6-linux-init/current, scripts is the most important; the scripts inside it are called after s6-svscan is set up (rc.init), and before (rc.shutdown) and after (rc.shutdown.final) every process other than init is killed. Those are the moments where every piece of the LEGO set comes together.

  • rc.init: uncomment the s6-rc-init /run/service line. This copies the s6-rc → s6-svscan translated service definitions from /etc/s6-rc/compiled (remember it?) to /run/service (where s6-svscan expects services to be, by default, when spawned from s6-linux-init). However, it won't start any of them.
  • rc.shutdown: uncomment the exec s6-rc -v2 -bDa change. This will bring all of our services down before the system is powered off.
  • runlevel: uncomment exec s6-rc -v2 -up change "$1". When we call a runlevel change through init $RUNLEVEL (or on system initialization, as we'll see later), it will be translated into a call to s6-rc -up change $RUNLEVEL, i.e. start the service/bundle $RUNLEVEL, and stop everything else.
  • rc.shutdown.final can be left alone.
Note
As the comments inside the example files say, s6-linux-init can hook with any service manager. For instance, OpenRC has had some limited support for starting services through s6-svscan for quite some time; once that support matures, it can become a viable alternative to s6-rc in this setup, by just uncommenting different lines in rc.init and rc.shutdown.

Earlier I told you s6-rc-init won't start any services. So how are we supposed to bring up the services on system boot? If you read down rc.init, you'll find:

exec /etc/s6-linux-init/current/scripts/runlevel "$rl"

This is what actually starts our services on boot. If you didn't specify any runlevels on the kernel command line, $rl will be default. Usually, default will be a bundle with the services you want most of the time. If you didn't create a bundle called default, take the opportunity to remove the compiled symlink and perform that set of commands again (the fact that there's this whole compiling dance on s6-rc is one of its drawbacks, unfortunately).

root #cd /etc/s6-rc/src
root #mkdir default
root #printf '%s\n' service1 service2 ... > default/contents
root #echo bundle > default/type
root #cd ..
root #NAME=$(date +%s)
root #s6-rc-compile $NAME src
root #ln -sfT $NAME compiled

I recommend not making a single giant default bundle, but rather working in layers. For instance, create a boot bundle with just the filesystems, utmp, ttys and the device manager, and then include this boot bundle inside of default, alongside services like CUPS, bluetoothd, elogind, etc. Adding boot to the kernel command line could then act as a sort of "safe mode", with no potentially misbehaving services, in addition to making init $RUNLEVEL useful.
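A layered layout could be sketched as follows, here in a throwaway directory rather than the real source folder. The member names are examples only: mount-procfs, mount-sysfs, mount-filesystems, hostname and udevd-srv come from the Artix list above, while agetty-tty1, cupsd, bluetoothd and elogind are assumed names, and s6-rc-compile will refuse any name that has no matching service definition.

```shell
#!/bin/sh
# Throwaway source tree; the real one would be e.g. /etc/s6-rc/src.
src=$(mktemp -d)

# 'boot': only what is needed to reach a login prompt.
mkdir "$src/boot"
echo bundle > "$src/boot/type"
printf '%s\n' mount-procfs mount-sysfs mount-filesystems hostname \
    udevd-srv agetty-tty1 > "$src/boot/contents"

# 'default': everything in 'boot', plus the optional daemons.
mkdir "$src/default"
echo bundle > "$src/default/type"
printf '%s\n' boot cupsd bluetoothd elogind > "$src/default/contents"

cat "$src/default/contents"
```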

Trying it out

To make this setup bootable, you need to symlink the contents of /etc/s6-linux-init/current/bin to /sbin. You can use the following script:

root #for prog in /etc/s6-linux-init/current/bin/*; do ln -s "$prog" "/sbin/s6-${prog##*/}"; done

Now, you can put init=/sbin/s6-init on your kernel command line, and use s6-poweroff/s6-reboot to perform power management on your s6 session. If anything goes wrong, go to the emergency tty we set up, or reboot with OpenRC.

Managing s6 within s6

Boot into s6. So far, we've learned what the database is and how to compile it, and hopefully we have a working one.

We're just getting started.

Bringing services up or down

Just in case you have made it this far and haven't read the documentation of s6-rc, here's a cheatsheet:

  • s6-rc -u change service to bring a service or a bundle of services up.
  • s6-rc -d change service to bring a service or a bundle of services down.

s6-rc can also take a -p option, which either means "Stop everything else and bring these up" (-u + -p) or "Stop these and the services that depend on them, and bring up everything else" (-d + -p).

Changing /etc/s6-rc/compiled in place

Simply re-linking /etc/s6-rc/compiled when we want to add services, as we did before booting into s6, will bring s6-rc to an inconsistent state where you can't bring services up or down without errors - if you think "waiting for session C2 of user X" on systemd was bad, you haven't accidentally overwritten /etc/s6-rc/compiled on an s6 install.

This doesn't mean we can't change service definitions without rebooting - there's a tool to change, in place, s6-rc from /etc/s6-rc/compiled to somewhere else, dynamically - s6-rc-update.

We first need to compile a database with a unique name (as we've been doing):

root #NAME=$(date +%s)
root #s6-rc-compile /etc/s6-rc/$NAME /etc/s6-rc/src

Now, tell s6-rc to use the new database:

root #s6-rc-update /etc/s6-rc/$NAME

Now that s6-rc is looking away, we can safely overwrite compiled.

root #ln -sfT -- $NAME /etc/s6-rc/compiled

You'll be doing this a lot, so it's recommended to make it a script. Let's say, /usr/bin/s6-db-reload with the following contents:

NAME=$(date +%s) &&
s6-rc-compile -- "/etc/s6-rc/$NAME" /etc/s6-rc/src &&
s6-rc-update -- "/etc/s6-rc/$NAME" &&
ln -sfT -- "$NAME" /etc/s6-rc/compiled
Note
According to the POSIX standard, ln -sf performs two system calls when overwriting an existing symlink: one to remove the existing link, and another to create the new one. It's an extreme nitpick, but the first call can succeed while the second fails, or you might press Ctrl+C right between the two, and you'd be left without /etc/s6-rc/compiled, setting you up for a nasty surprise on the next reboot. You'd have to be very unlucky, but just in case, here's a rewritten version with a workaround proposed by s6's author (mv replaces the link with a single rename system call).
NAME=$(date +%s) &&
s6-rc-compile -- "/etc/s6-rc/$NAME" /etc/s6-rc/src &&
s6-rc-update -- "/etc/s6-rc/$NAME" &&
ln -sf -- "$NAME" /etc/s6-rc/compiled/compiled &&
mv -f /etc/s6-rc/compiled/compiled /etc/s6-rc
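To see what the last two commands do without touching /etc/s6-rc, the swap can be re-enacted in a sandbox. In this sketch, the temporary directory stands in for /etc/s6-rc and the two numbered folders are empty stand-ins for compiled databases; the timestamps are arbitrary.

```shell
#!/bin/sh
# Sandbox: two fake databases and a 'compiled' link pointing at the old one.
root=$(mktemp -d)
mkdir "$root/1700000000" "$root/1700000100"
ln -s 1700000000 "$root/compiled"

NAME=1700000100
# Step 1: create the new link under a temporary path. Because 'compiled' is
# a symlink to a directory, the link lands inside the old database folder.
ln -sf -- "$NAME" "$root/compiled/compiled" &&
# Step 2: move it over the old link. On the same filesystem this is a single
# rename, so 'compiled' always points at either the old or the new database.
mv -f -- "$root/compiled/compiled" "$root"

readlink "$root/compiled"
```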

Taking full advantage of s6

Readiness notification

Most non-trivial services take a certain amount of time before they're actually ready to perform their duty. This means that if the service manager is fast enough (and s6-rc is fast), dependent services might start before the service they depend on is actually ready. To account for this case, s6 implements a simple readiness notification mechanism: the daemon writes a newline to a file descriptor (whose number is specified in a file called notification-fd), at which point s6-supervise knows the daemon is ready to communicate with other processes and broadcasts this information to anyone who asks. s6-rc asks, and takes it into account when ordering services.

Most programs included in s6 have an option for this, and many other programs have options that, although not even intended for systems using s6, can work just as well for this purpose (such as DBus's --print-address= and Xorg's -displayfd). The latter case -- fitting in places where it wasn't even expected -- is a sign of a well thought out system, in my opinion.

For example, a definition of an s6-log instance with readiness notification, which will be relevant for the Logging chain section, might look like this:

FILE notification-fd
3
FILE type
longrun
FILE run
#!/bin/execlineb -P
s6-log -d 3 -- /var/log/my-program

Note that the argument to the -d option must be the same as the content of the notification-fd file. The same holds for non-s6 programs that are, intentionally or accidentally, compatible with s6's notifications.
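From the daemon's side, the protocol amounts to very little. The following sketch shows a stand-in daemon written in shell that notifies on file descriptor 3 (matching a notification-fd file containing 3); fake_daemon is hypothetical, and a regular file plays the role of the pipe that s6-supervise would normally provide.

```shell
#!/bin/sh
# A stand-in daemon that signals readiness the s6 way:
# write a newline to the notification fd, then close it.
fake_daemon() {
    : "expensive setup would happen here"
    echo >&3        # tell the supervisor we are ready
    exec 3>&-       # the descriptor is no longer needed
    : "the main loop would run here"
}

# Without s6-supervise around, a regular file stands in for the pipe.
ready=$(mktemp)
( fake_daemon ) 3> "$ready"

# Exactly one byte (the newline) should have been written.
wc -c < "$ready"
```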

Polling

Another example of how s6's notification protocol is simple yet extensible is the s6-notifyoncheck program. For programs that don't implement a command-line option useful for s6's readiness notification protocol, it implements a polling mechanism that feeds back into it.

Create a folder named data/ inside the service folder and, inside it, write a script named check that only exits successfully if the service is up. If the program doesn't have a dedicated utility for pinging it, using one that queries information from it (think pactl info, xwininfo -root, etc.) with its output redirected to /dev/null should also work.

Once check is written, assuming you chose 3 as the notification-fd, replace

program

in run with

s6-notifyoncheck -d -3 3 program

Notice two options were given: -d (double-fork, so s6-notifyoncheck doesn't become a "zombie process") and -3 3 (the number of the notification file descriptor, which can also be written attached to the option, as -33).
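A check script along those lines might look like the sketch below, which writes data/check into a throwaway folder standing in for the service folder. It assumes the daemon is a PulseAudio-like sound server for which pactl info is an adequate probe; substitute whatever query utility your daemon ships.

```shell
#!/bin/sh
# Stand-in for the service definition folder.
svc=$(mktemp -d)
mkdir "$svc/data"

cat > "$svc/data/check" <<'EOF'
#!/bin/sh
# Exit 0 only if the daemon answers; the output itself is irrelevant.
exec pactl info > /dev/null 2>&1
EOF
chmod +x "$svc/data/check"

cat "$svc/data/check"
```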

Tip
The data/ and env/ folders are reserved for user input. s6's author has committed to never let the contents of these folders affect s6's operation. It's ideal for helper scripts and configuration files that won't change often.
Warning
Polling isn't ideal for the reasons outlined in /usr/share/doc/s6-*/html/notifywhenup.html. s6-notifyoncheck should be a last resort; reading the daemon's manual critically and creatively should come first.
Even if it turns out to be necessary, s6's readiness notification protocol is so simple that, unless the codebase is extreme spaghetti, it should take at most a few lines to implement it, without adding any dependencies, if the project maintainer is open to the idea.

Logging chain

Syslog is not natively supported under s6-rc. Instead, the preferred mechanism is sending a daemon's standard output and error to a second logger daemon, s6-log. Again, runit users will be familiar with this, as runit uses a similar design with svlogd in place of s6-log. However, how it is set up is quite different.

Let's say you want to log a daemon called verbosed. Create two service definition folders.

The first one - named verbosed-srv, if you're following Artix's conventions - should be populated as normal, with run, type, notification-fd, etc., for the daemon itself. The only difference is that you should write exec verbosed 2>&1 (shell) or fdmove -c 2 1 verbosed (execline) in run so error messages go to standard output.

A second folder - conventionally, verbosed-log - is then populated with another service, preferably s6-log writing to a unique location (conventionally, /var/log/verbosed), and with readiness notification.

Now, you can use s6-rc's pipeline mechanism. It can supervise entire equivalents of shell script pipelines, but the one pipeline in particular we want to supervise is verbosed-srv | verbosed-log.

The steps we take are:

  • Write verbosed-log to verbosed-srv/producer-for - so verbosed-srv's standard output gets connected to verbosed-log's standard input.
  • Write verbosed-srv to verbosed-log/consumer-for - this confirms the above in verbosed-log's side.
  • Write verbosed to verbosed-log/pipeline-name - the file that contains the name of the bundle with verbosed-srv | verbosed-log.

After recompiling, your database will contain a verbosed bundle, that will start the service and its logger. If this sounds like the kind of boilerplate you'd want to automate with a script, that's because it is.

#!/bin/sh
SERVICE=${1:?Need service.}

mkdir -- "$SERVICE-srv" "$SERVICE-log" || exit

(
cd -- "$SERVICE-srv"
touch run # write it
echo longrun > type
echo "$SERVICE-log" > producer-for
)
(
cd -- "$SERVICE-log"
printf '%s\n' '#!/bin/execlineb -P' "s6-log -d 3 -- /var/log/$SERVICE" > run
echo 3 > notification-fd
echo longrun > type
echo "$SERVICE-srv" > consumer-for
echo "$SERVICE" > pipeline-name
)

Services without a dedicated logger will have their logs sent to /run/uncaught-logs, and, if -1 was given in the s6-linux-init-maker step, to the console.

Replacing OpenRC

Preferably, do these steps while booted into s6 rather than OpenRC + sysvinit, so you can shut down your system without the good (?) ol' Alt+PrintScreen+REISUB.

Removing OpenRC and Sysvinit

Deselect sys-apps/openrc and sys-apps/sysvinit. Gentoo packages usually won't have an explicit OpenRC dependency just because they ship an init script. Those init scripts will be kept around after we switch, which will come in quite handy when you want to rewrite services for your new init. Eventually, emerge --depclean should agree to remove OpenRC and sysvinit. If you're in a hurry, just unmerge both immediately and deal with the fallout (including updates trying to reinstall them) later.

Reemerging sys-apps/s6-linux-init

Re-emerge sys-apps/s6-linux-init with sysv-utils. Rename your /sbin/s6-init symlink to /sbin/init, or create it now if you haven't.

A note on rewriting init scripts

OpenRC, in most cases, is an absent parent that would rather not deal with its children nagging it, so it usually avoids passing arguments that would keep daemons in the foreground, or even actively passes arguments that make programs go to the background.

Like runit (and to be fair, supervise-daemon), however, s6 considers processes to have failed when they exit, and tries to restart them in this case.

This means that, if you just blindly copy-paste the command line used by OpenRC, a conflict might happen: first, s6 spawns the program with OpenRC's flags. Then the program spawned by s6 spawns the actual service and exits, leaving s6 in the dust. After that, s6 keeps trying to restart the first program, which will fail because there's already a PID file, there's already something listening for commands at the same location, etc.

For instance, if you make a "literal" translation of /etc/init.d/elogind, you'll end up with a file that calls elogind --daemon, which will go to the background and cause s6 to repeatedly start elogind. Just omitting the --daemon option makes it work properly.

On the other hand, programs like, say, bluetoothd, by default have the behavior OpenRC expects, so you'll have to read the manual and look for a "do not fork", "run in foreground", "no detaching/backgrounding", "supervised", "debug" etc. option and apply it to your service definition. For reference, you can look at Artix Linux's s6 implementation or at runit init scripts for the service you're trying to port, which have the same restriction.

As a last resort, if the service is really obnoxious about not being watched by anything (let's say it's called obnoxiousd), you can write s6-fghack obnoxiousd to run. s6-fghack is, well, a hackish program that will try to stay alive for as long as there are processes spawned by obnoxiousd, giving s6-svscan and s6-supervise "a bone to chew", so to speak. However, stopping s6-fghack (which we've tricked s6 into thinking is the service) won't stop the actual service process, which was spawned by the obnoxiousd command and left without a trace.

Therefore, you'll then have to use whatever the author's intended way to stop the service is as the finish script (which is run after the service -- in this case, s6-fghack -- is stopped), be it obnoxiousctl stop, or xargs kill < pidfile.
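Put together, such a service definition could be generated as in the sketch below, written to a throwaway folder. Both obnoxiousd and obnoxiousctl are the made-up names used in the text above, so treat the stop command as a placeholder for whatever the real daemon provides.

```shell
#!/bin/sh
# Stand-in for the obnoxiousd service definition folder.
svc=$(mktemp -d)
echo longrun > "$svc/type"

# run: s6 supervises s6-fghack, which lingers while obnoxiousd's children live.
cat > "$svc/run" <<'EOF'
#!/bin/execlineb -P
s6-fghack obnoxiousd
EOF

# finish: runs after s6-fghack has exited; stop the real daemon the
# author's intended way (placeholder command).
cat > "$svc/finish" <<'EOF'
#!/bin/sh
exec obnoxiousctl stop
EOF
chmod +x "$svc/run" "$svc/finish"

ls "$svc"
```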

WIP

More/better service examples (are they even a good thing, or is a separate execline guide warranted)?

Explain even more s6 concepts.