S6 and s6-rc-based init system

An s6 and s6-rc-based init system is an init system built using components from the s6 and s6-rc packages, following a general design supported by the program from package s6-linux-init. It can be used as an alternative to sysvinit + OpenRC, or to systemd.

General setup
The general setup of an s6 and s6-rc based init system is as follows:


 * 1) When the machine boots, all initialization tasks needed to bring it to its stable, normal 'up and running' state are split into a stage1 init and a stage2 init. The stage1 init runs as process 1, and replaces itself with the  program from s6 when its work is done. The stage2 init runs as a child of process 1, blocks until  starts to execute, and exits when its work is done.
 * 2) During most of the machine's uptime,  runs as process 1 with signal diversion turned on, and there is an s6 supervision tree rooted in process 1, that is launched as soon as  starts to execute.
 * 3) A supervised catch-all logger is launched as part of the supervision tree. The catch-all logger logs messages sent by supervision tree processes to 's standard output and error.
 * 4) The stage2 init initializes the s6-rc service manager and starts a subset of the services defined in its compiled service database. Some of them might carry out part of the machine's initialization tasks.
 * 5) While  is running as process 1, services are normally managed using s6-rc tools.
 * 6) When the administrator wants to initiate the machine's shutdown sequence, a signal is sent to process 1. The BusyBox ,  and  applets, or the ,  and  programs from s6-linux-init, can be used for this.
 * 7)  then executes an appropriate diverted signal handler as a child process, which in turn executes a stage2_finish program that performs some of the tasks needed to shut the machine down, and stops all s6-rc-managed services.
 * 8) When the stage2_finish program exits, the  diverted signal handler invokes the  program, which makes  perform its finish procedure, and results in execution of the  file in process 1's scan directory.
 * 9) The  process makes the catch-all logger exit cleanly, if it did not already exit when the supervision tree was brought down by 's finish procedure, and then replaces itself with a stage3 init.
 * 10) The stage3 init runs as process 1 and performs all remaining tasks needed to shut the machine down.
 * 11) When the stage3 init's work is done, it halts, powers off or reboots the machine as requested.

The stage1 init
When the machine starts booting (if an initramfs is being used, after it passes control to the 'main' init), a stage1 init executes as process 1. Therefore, if the stage1 init is named, for example,, and placed in , to use an s6 and s6-rc-based init system, an  argument can be added to the kernel's command line using the bootloader's available mechanisms (e.g. a  command in some 'Gentoo with s6 + s6-rc' menu entry for GRUB2). It is possible to go back to sysvinit + OpenRC at any time, or to any other init system, by reverting the change.
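To check which init the kernel was told to run, the init= value can be read back from the kernel command line (for example, the contents of /proc/cmdline). The following is a minimal Python sketch, with Python used purely for illustration since the article's own scripts are execline or shell. It assumes parameter values contain no quoted spaces, takes the last occurrence if init= is repeated, and stops at "--", after which the kernel hands the remaining words to init. The function name is hypothetical.

```python
def init_from_cmdline(cmdline):
    """Return the value of the init= kernel parameter, or None if absent.

    Assumes no quoted values containing spaces. If init= is repeated,
    the last occurrence is returned.
    """
    value = None
    for word in cmdline.split():
        if word == "--":
            # Words after "--" are handed to init, not parsed as parameters.
            break
        if word.startswith("init="):
            value = word[len("init="):]
    return value
```

On a running system it could be called as init_from_cmdline(open("/proc/cmdline").read()).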

The stage1 init runs with its standard input, output and error redirected to the machine's console. It must do all necessary setup for to be able to run. This includes setting up its scan directory, and because at that point the root filesystem might be the only mounted filesystem, and possibly read-only, the stage1 init must also mount a read-write filesystem to hold and  control files that need to be written to. The customary setup of an s6 and s6-rc-based init system uses a run image containing the initial scan directory, that is copied to a tmpfs that the stage1 init mounts read-write, normally on. When starts running as process 1, it uses as its scan directory the copy in the tmpfs. The run image can be in a read-only filesystem.
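The copy step can be pictured with a short, purely illustrative Python sketch. It assumes the tmpfs is already mounted at tmpfsdir and copies the run image into it, keeping symbolic links as links and preserving permission bits. shutil.copytree does not preserve file ownership, so a real stage1 init running as root would need a copy tool that does (for example cp -a). The function name is hypothetical.

```python
import shutil


def populate_tmpfs(run_image, tmpfsdir):
    """Copy the run image into tmpfsdir, assumed to be a freshly mounted tmpfs.

    Symbolic links are copied as links and permission bits are preserved;
    file ownership is NOT preserved by shutil.copytree.
    """
    # dirs_exist_ok=True (Python 3.8+) because the mount point already exists.
    shutil.copytree(run_image, tmpfsdir, symlinks=True, dirs_exist_ok=True)
```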

Also, all special files that might be needed by and the stage1 and stage2 inits, such as the  and  device nodes, must be made available by the stage1 init before they are needed. Because of this and requirements of programs and libc functions that might be used for machine initialization, the Linux and  filesystems will likely have to be mounted by the stage1 init.

Because the stage1 init runs as process 1, if it exits or is killed, there will be a kernel panic and the machine will hang. Therefore, it must be kept simple enough that it does not fail, because recovery at this stage of initialization is almost impossible. This is why s6 and s6-rc-based init systems split initialization into a stage1 init and a stage2 init. The stage2 init is spawned as a child process by the stage1 init, which, as soon as it finishes its work, replaces itself with using a POSIX   call.

The author of s6 has designed the execline package so that the stage1 init can be an execline script. The general structure of an execline stage1 script is as follows, or a variation thereof:

Where:


 * ${stage1_envdir} is a placeholder for the absolute pathname of an environment directory to be used by the stage1 and stage2 inits (e.g. ).
 * ${tmpfsdir} is a placeholder for the absolute pathname of the directory where the read-write tmpfs will be mounted (normally ).
 * ${run_image} is a placeholder for the absolute pathname of the directory where the run image is stored (e.g. in the rootfs).
 * ${logger_fifo} is a placeholder for the absolute pathname of the catch-all logger's FIFO (e.g. ).
 * ${stage2_init} is a placeholder for the name (if PATH search would find it) or absolute pathname of the stage2 init (e.g. ).
 * ${s6_svscan_envdir} is a placeholder for the absolute pathname of an environment directory used to set up the supervision tree's initial environment (e.g. ).
 * ${scandir_relpath} is a placeholder for the pathname, relative to ${tmpfsdir}, of process 1's scan directory (e.g., if the absolute pathname is ).

Gentoo's official repository does not supply any package with a stage1 init for s6 and s6-rc-based init systems. Users must create one from scratch or take it from somewhere else (e.g. alternative ebuild repositories). The program from s6-linux-init can create a minimal execline stage1 script with the aforementioned structure that uses programs from packages s6-portable-utils and s6-linux-utils, and that can be used as a basis for writing a custom or more elaborate one, if so desired. The scan directory set up by the stage1 script is named  (so by default, its absolute pathname would be ), and the only additional initialization the script performs is optionally mounting a devtmpfs on, and optionally dumping the kernel's environment into an environment directory using  from s6-portable-utils.

The stage2 init
The stage2 init is spawned by the stage1 init as a child process, and is blocked from running until the latter replaces itself with. To achieve this, the child process of the stage1 init opens the catch-all logger's FIFO for writing using the POSIX  call. The call will block until some other process opens the FIFO for reading. The catch-all logger is a supervised process, so it starts executing when does, and opens the FIFO for reading, thereby unblocking the process, which then replaces itself with the stage2 init.
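The blocking handshake can be reproduced with an ordinary FIFO. In the illustrative Python sketch below (fifo_handshake is a hypothetical name), a forked child plays the stage1 init's child and blocks in open() for writing, while the parent plays the catch-all logger: only when the parent opens the FIFO for reading does the child's open() return. A real child would then exec the stage2 init instead of writing a message.

```python
import os
import time


def fifo_handshake(fifo, delay=0.2):
    """Model the stage1 child / catch-all logger handshake over a FIFO.

    Returns (child_was_blocked, data_read_by_logger).
    """
    os.mkfifo(fifo, 0o600)
    pid = os.fork()
    if pid == 0:
        # Child, standing in for the process that will become the stage2 init.
        status = 1
        try:
            fd = os.open(fifo, os.O_WRONLY)  # blocks until a reader exists
            # A real stage1 child would exec the stage2 init here instead.
            os.write(fd, b"stage2 unblocked\n")
            status = 0
        finally:
            os._exit(status)
    # Parent, standing in for the catch-all logger (not started yet).
    time.sleep(delay)
    # WNOHANG returns (0, 0) while the child is still stuck in open().
    child_was_blocked = os.waitpid(pid, os.WNOHANG) == (0, 0)
    rfd = os.open(fifo, os.O_RDONLY)  # the "logger" starts: the child unblocks
    chunks = []
    while True:
        data = os.read(rfd, 4096)
        if not data:  # end-of-file: the child exited, closing its write end
            break
        chunks.append(data)
    os.close(rfd)
    os.waitpid(pid, 0)
    return child_was_blocked, b"".join(chunks)
```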

The stage2 init executes with as process 1, and performs all remaining initialization tasks needed to bring the machine to its stable, normal 'up and running' state. It can execute with a few vital supervised long-lived processes already running, started as part of process 1's supervision tree, including the catch-all logger. Part of the remaining initialization is creating the s6-rc service manager's live state directory using the program, which can't be done until  is running. This program takes the pathname of a compiled service database as an argument (or defaults it to ), as well as the pathname of process 1's scan directory. So a suitable services database must exist and be available at least in a read-only filesystem. This is the boot-time service database. The live state directory must be in a read-write filesystem, and the customary setup of an s6 and s6-rc-based init system has create it in the read-write tmpfs mounted by the stage1 init.

also copies to the live state directory all s6-rc longruns' compiled s6 service directories, creates symbolic links to them in process 1's scan directory, and uses an equivalent of the s6-svscanctl -a command to trigger a scan. The scan makes process 1 spawn an child for each longrun, but because  produces s6 service directories that contain a  file, the longrun doesn't execute yet.

The initial state of all s6-rc services, as set by, is 'down'. Therefore, the stage2 init must also start all atomic services (oneshots and longruns) that are needed to complete the machine's initialization, if any, and the longruns that are wanted up at the end of the boot sequence. This is performed by defining a service bundle in the boot-time service database that groups these atomic services, and having the stage2 init start them with an s6-rc -u change command naming the bundle. This bundle would be the s6-rc counterpart to OpenRC's default runlevel, systemd's unit, or nosh's  target bundle directory.
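Assuming the program that creates the live state directory is s6-rc-init, the s6-rc part of a stage2 init boils down to two commands. The Python sketch below is illustrative only: it builds and runs the argument vectors, and the scan directory, database path, live directory and bundle name are all supplied by the caller (the values in any example call are hypothetical). A real stage2 init would usually log a failure and carry on rather than raise, as check=True does here.

```python
import subprocess


def stage2_commands(scandir, compiled, live, bundle):
    """Return, in order, the s6-rc commands a stage2 init would run."""
    return [
        # Create the live state directory from the boot-time database and
        # link the longruns' service directories into process 1's scan directory.
        ["s6-rc-init", "-c", compiled, "-l", live, scandir],
        # Start the bundle; s6-rc also starts everything it depends on.
        ["s6-rc", "-l", live, "-u", "change", bundle],
    ]


def run_stage2(scandir, compiled, live, bundle):
    for argv in stage2_commands(scandir, compiled, live, bundle):
        subprocess.run(argv, check=True)  # a real stage2 would log and go on
```

run_stage2 would be called with process 1's scan directory, the pathname (or symlink) of the boot-time database, a live directory inside the read-write tmpfs, and the name of the bundle playing the role of OpenRC's default runlevel.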

When the stage2 init finishes its work, it exits and gets reaped by. The stage2 init can be, and normally is, an execline or shell script. Gentoo's official repository does not supply any package with a stage2 init for s6 and s6-rc-based init systems. Users must create one from scratch or take it from somewhere else (e.g. alternative ebuild repositories). The s6-linux-init package contains an example execline stage2 script: the file in the package's  subdirectory.

The catch-all logger
In the context of an s6 and s6-rc-based init system, the catch-all logger is a supervised long-lived process that logs messages sent by supervision tree processes to 's standard output and error, normally in an automatically rotated logging directory. In a logging chain arrangement, the leaf processes of a supervision tree normally have dedicated loggers that collect and store messages sent to the process' standard output and error in per-service logs. Messages from, processes, logger processes themselves, and leaf processes that exceptionally do not have a logger are printed on process 1's standard output or error, which, at the beginning of the boot sequence, are redirected to the machine's console. It is possible to redirect them later so that the messages are delivered to the catch-all logger, using a setup that involves a FIFO. Only the catch-all logger's standard error remains redirected to the machine's console, as a last resort.

An s6 and s6-rc-based init system has a FIFO some place in the filesystem, reserved for the catch-all logger. The FIFO is owned by root and has permissions 0600 (i.e. the output of ls -l displays ). The run image that is copied to the read-write tmpfs mounted by the stage1 init contains 's initial scan directory, with at least a service directory for the catch-all logger already present, and possibly an additional service directory for an process or similar also present. The former, so that the catch-all logger is launched as soon as starts executing as process 1, and the latter, so that it is possible to log in to the machine if the supervision tree starts successfully, even if something else fails (e.g. s6-rc's setup). The code of the catch-all logger's file opens the FIFO for reading, redirects its standard input to it, its standard error to, drops privileges (e.g. by invoking  or  if it is a script) and calls the logger program, which is normally. The logging directory is owned by the logger's effective user after dropping privileges, and normally has permissions 2700 (i.e. the output of ls -l displays ). Because it is possible to have a setup where a read-only rootfs is the only filesystem available, the logging directory is also normally placed in the read-write tmpfs mounted by the stage1 init, unless a different read-write filesystem can be guaranteed to exist before starts executing as process 1 (e.g.  is used, but  is guaranteed to be in the rootfs, and either the kernel mounts the rootfs read-write or the stage1 init remounts it read-write, or  is a filesystem mounted read-write by the stage1 init or the initramfs, etc.). If the logging directory is in the aforementioned tmpfs, it must be created with appropriate owner and permissions by the code of the catch-all logger's file, or be present as an empty directory with appropriate owner and permissions in the run image copied to the tmpfs.
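The ownership and permission requirements above can be expressed as a small Python sketch (prepare_catchall is a hypothetical helper, shown for illustration only). It creates the FIFO with mode 0600 and the logging directory with mode 2700, optionally handing the directory to the logger's unprivileged user. Changing ownership requires root, so uid and gid default to leaving it unchanged.

```python
import os


def prepare_catchall(fifo, logdir, uid=None, gid=None):
    """Create the catch-all logger's FIFO (mode 0600) and log directory (2700).

    uid/gid: the logger's unprivileged account; when None, ownership is left
    unchanged. Changing ownership requires root.
    """
    if not os.path.exists(fifo):
        os.mkfifo(fifo)
    os.chmod(fifo, 0o600)  # owned by root in a real setup
    os.makedirs(logdir, exist_ok=True)
    if uid is not None or gid is not None:
        os.chown(logdir, -1 if uid is None else uid, -1 if gid is None else gid)
    os.chmod(logdir, 0o2700)  # setgid bit, rwx for the owner only
```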

The stage1 init redirects its standard output and error to the catch-all logger's FIFO before replacing itself with. However, opening a FIFO for writing is an operation that blocks until some other process opens it for reading, and a POSIX non-blocking  call fails with an error status if it specifies the 'open for writing only' flag  and there is no reader. Execline's program was written in a way that specifically addresses this problem: it is a chain loading program that, if invoked with options ,   and  , will execute the next program in the chain with the specified file descriptor open for writing and without blocking, even if the specified pathname corresponds to a FIFO and there is no reader.
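The problem described above can be demonstrated in Python. A plain non-blocking open for writing fails with ENXIO when the FIFO has no reader; the sketch below works around it with a well-known technique: open the FIFO for reading first (which never blocks in non-blocking mode), then open it for writing, close the reading end, and finally switch the write descriptor back to blocking mode. Whether the execline program uses exactly these steps internally is an assumption here, and open_fifo_write_nonblocking is a hypothetical name.

```python
import fcntl
import os


def open_fifo_write_nonblocking(path):
    """Open a FIFO for writing without blocking, even when it has no reader."""
    # A non-blocking open for reading never blocks and never fails for lack
    # of a writer, and it temporarily makes us a reader of the FIFO.
    rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        # With a reader present, this succeeds instead of failing with ENXIO.
        wfd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    finally:
        os.close(rfd)  # drop the temporary reader role
    # Switch the write end back to blocking mode for normal use.
    flags = fcntl.fcntl(wfd, fcntl.F_GETFL)
    fcntl.fcntl(wfd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)
    return wfd
```

Until a real reader opens the FIFO, writes to the returned descriptor fail with EPIPE, so in this model nothing should be written before the reader (the catch-all logger) has started.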

The program supports a   option that makes it ignore the   signal, so that it can't get killed that way. If is being used as the catch-all logger program and, to minimize the risk of losing logs, was invoked with this option, a special procedure is used by the code of process 1's  file to make it exit cleanly. When the parent process receives a   signal while the supervision tree is being brought down by 's finish procedure, it sends  a   signal followed by a   signal. But because doesn't exit until its supervised process does, and  ignores   and keeps running, the  program supports a special option,   (capital 'x'), that works like   (small 'x'), but also makes  redirect its standard input, output and error to. The code of process 1's file uses an s6-svc -X command with the catch-all logger's service directory as the argument, so that when  runs, this would leave the catch-all logger's FIFO with no writers, because  and all other  processes would normally have exited by then, causing  to detect end-of-file on its standard input and exit.
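Why the end-of-file approach works even though the logger ignores SIGTERM can be seen in a toy Python model. Here run_logger is a hypothetical stand-in for the catch-all logger process, not a model of the logger program's internals: the child ignores SIGTERM, reads the FIFO until read() returns end-of-file, which happens only once every writer has closed it, and then exits normally.

```python
import os
import signal


def run_logger(fifo):
    """Fork a toy catch-all logger and return (pid, fd carrying its result).

    The child ignores SIGTERM, reads the FIFO until end-of-file, reports how
    many bytes it consumed, and exits with status 0.
    """
    result_r, result_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.close(result_r)
            signal.signal(signal.SIGTERM, signal.SIG_IGN)  # SIGTERM can't stop it
            fd = os.open(fifo, os.O_RDONLY)  # blocks until a writer appears
            total = 0
            while True:
                data = os.read(fd, 4096)
                if not data:  # every writer has closed the FIFO
                    break
                total += len(data)  # a real logger would store the data
            os.write(result_w, str(total).encode())
            status = 0
        finally:
            os._exit(status)
    os.close(result_w)
    return pid, result_r
```

Sending SIGTERM to the child has no effect, but closing the last write end of the FIFO makes it exit with status 0, which is the situation the capital-X option is meant to produce.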

Gentoo's official repository does not supply any package with a catch-all logger service directory for s6 and s6-rc-based init systems. Users must create one from scratch or take it from somewhere else (e.g. alternative ebuild repositories). The program from s6-linux-init can create a catch-all logger service directory named, that can be used as a basis for writing a custom or more elaborate one, if so desired. The catch-all logger uses  with the   option, and logs to a subdirectory named  of the tmpfs mounted by the  stage1 script. The logger's FIFO is named and is located in its service directory.

Signals and the stage2_finish program
An s6 and s6-rc-based init system is asked to initiate the shutdown sequence by sending signals to process 1. Because the program running as process 1 is with signal diversion turned on, the signals must be chosen from the set it can divert. The BusyBox, and  applets, and the ,  and  programs from s6-linux-init, are capable of sending suitable signals to process 1:

When process 1 receives such a signal, the corresponding diverted signal handler is executed as a child process. The handler then calls a stage2_finish program that performs part of the tasks needed to shut the machine down. Generally speaking, the stage2_finish program undoes what the stage2 init has done at boot time. This part of the machine's shutdown sequence can be carried out by s6-rc services and can use s6 tools, since is still running. However, all s6-rc-managed services have to be stopped (normally with an s6-rc -da change command) before the stage2_finish program exits, because will stop running after it does, and s6-rc does not work without an s6 supervision tree. The stage2_finish program can be, and normally is, an execline or shell script.

The general structure of an execline diverted signal handler script is as follows, or a variation thereof:

Where:


 * ${tmpfsdir} is a placeholder for the absolute pathname of the directory where the stage1 init mounted the read-write tmpfs (normally ).
 * ${scandir_relpath} is a placeholder for the pathname, relative to ${tmpfsdir}, of process 1's scan directory (e.g., if the absolute pathname is ).
 * ${stage2_finish} is a placeholder for the name (if PATH search would find it) or absolute pathname of the stage2_finish program (e.g. ).
 * ${option} is the option for the operation corresponding to the signal:
 * -0 or -st for halt.
 * -7 or -pt for poweroff.
 * -6 or -rt for reboot.
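The handler's logic, with the placeholders above as parameters, is sketched below in Python for illustration. It assumes that the program invoked last is s6-svscanctl with the numeric options listed above; handler_commands and run_handler are hypothetical names. The stage2_finish program runs as a child, this sketch ignores its exit status so that shutdown proceeds regardless, and the handler then replaces itself with s6-svscanctl.

```python
import os
import subprocess

# Numeric s6-svscanctl option for each operation, from the list above.
SVSCANCTL_OPTION = {"halt": "-0", "poweroff": "-7", "reboot": "-6"}


def handler_commands(operation, stage2_finish, tmpfsdir, scandir_relpath):
    """Return the two commands a diverted signal handler runs, in order."""
    scandir = os.path.join(tmpfsdir, scandir_relpath)
    return [
        [stage2_finish],  # stop s6-rc services, undo the stage2 init's work
        ["s6-svscanctl", SVSCANCTL_OPTION[operation], scandir],
    ]


def run_handler(operation, stage2_finish, tmpfsdir, scandir_relpath):
    finish, svscanctl = handler_commands(operation, stage2_finish,
                                         tmpfsdir, scandir_relpath)
    subprocess.run(finish)  # exit status ignored in this sketch
    os.execvp(svscanctl[0], svscanctl)  # replace the handler process
```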

The program from s6-linux-init can create execline handler scripts for all  diverted signals, compatible with,  and. They can currently work without modifications for BusyBox, and , by swapping the   and   handlers.

Gentoo's official repository does not supply any package with a stage2_finish program for s6 and s6-rc-based init systems. Users must create one from scratch or take it from somewhere else (e.g. alternative ebuild repositories). The s6-linux-init package contains an example execline stage2_finish script: the file in the package's  subdirectory.

This means that is not directly compatible with sysvinit's, , , , and  commands. However, many programs (e.g. those from desktop environments) expect to be able to call programs with those names during operation, so if such a thing is needed, it is possible to use compatibility execline scripts:

The stage3 init
When the stage2_finish program exits, the diverted signal handler that invoked it then calls the  program with an appropriate option to make  perform its finish procedure. executes the file in the  control subdirectory of its scan directory, using the POSIX   call, passing a halt, poweroff or reboot argument to it. executes as process 1, redirects its standard output and error to, uses the s6-svc -X command to make the catch-all logger exit cleanly, and replaces itself with a stage3 init, again using a POSIX  call, passing along the argument supplied by. Alternatively, the stage3 init code might be part of the file, in which case that file would be considered the stage3 init.

The general structure of a process 1 execline script is as follows, or a variation thereof:

Where:


 * ${tmpfsdir} is a placeholder for the absolute pathname of the directory where the stage1 init mounted the read-write tmpfs (normally ).
 * ${scandir_relpath} is a placeholder for the pathname, relative to ${tmpfsdir}, of process 1's scan directory (e.g., if the absolute pathname is ).
 * ${logger_servicedir} is a placeholder for the name of the catch-all logger's service directory (e.g., if the absolute pathname is ).
 * ${stage3_init} is a placeholder for the name (if PATH search would find it) or absolute pathname of the stage3 init (e.g. ).
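The finish script's job can likewise be sketched in Python, for illustration only: validate the argument, tell the catch-all logger's supervisor to exit with s6-svc -X, then exec the stage3 init with the same argument. The console device path is left as a parameter because it is an assumption of this sketch, as are the names finish_commands and run_finish.

```python
import os
import subprocess

OPERATIONS = ("halt", "poweroff", "reboot")


def finish_commands(operation, tmpfsdir, scandir_relpath, logger_servicedir,
                    stage3_init):
    """Return the s6-svc command and the stage3 argv for the finish script."""
    if operation not in OPERATIONS:
        raise ValueError(f"unexpected operation: {operation!r}")
    logger = os.path.join(tmpfsdir, scandir_relpath, logger_servicedir)
    return (
        ["s6-svc", "-X", logger],  # let the catch-all logger exit cleanly
        [stage3_init, operation],  # pass the argument along to stage3
    )


def run_finish(operation, console, *paths):
    """console: path of the console device (an assumption of this sketch)."""
    svc, stage3 = finish_commands(operation, *paths)
    fd = os.open(console, os.O_WRONLY)
    os.dup2(fd, 1)  # standard output to the console
    os.dup2(fd, 2)  # standard error to the console
    subprocess.run(svc)
    os.execvp(stage3[0], stage3)  # the stage3 init now runs as process 1
```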

The program from s6-linux-init can create a suitable process 1 execline  script.

The stage3 init runs as process 1 to perform all remaining tasks needed to shut the machine down. It must also kill all other processes that are still running at that point, after a grace period to allow them to exit on their own, so that filesystems can be synced and unmounted, or remounted read-only. This can be done with a POSIX  call specifying -1 as the process ID argument, usually to send a   signal followed by a   signal first, waiting for a short period of time, and then sending a   signal. Because the stage3 init runs as process 1, and process 1 does not get killed by a  call, it continues executing after that. Sending a  signal to all processes from a non-PID 1 process that is expected to continue running is much harder. The stage3 init can be, and normally is, an execline or shell script. The program provided by either the GNU Core Utilities package, the util-linux package  or the procps package , can be used in such a script as kill -TERM -1 , kill -CONT -1 and kill -KILL -1 (the last form will also kill itself, but not the stage3 init). The program from the s6-portable-utils package can also be used in such a script, as s6-nuke -t (  +  ) and s6-nuke -k. And a shell stage3 script that invokes a shell with a builtin utility works too. In that case, process 1 will be a shell process that sends the signals itself. A wait -r {} command can be used in an execline stage3 script to reap all resulting zombie processes.
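The kill sequence described above is sketched below in Python, for illustration. Because kill(-1, ...) is destructive, the kill and sleep functions are parameters so that the sequence can be checked with harmless stand-ins; only a stage3 init running as process 1 should call it with the defaults. The two-second grace period and the function names are assumptions. reap_zombies collects already-exited children without blocking, in the spirit of the wait -r command mentioned above.

```python
import os
import signal
import time


def kill_everything(grace=2.0, kill=os.kill, sleep=time.sleep):
    """Send TERM+CONT to every process, wait, then KILL them.

    kill(-1, sig) reaches every process the caller may signal except
    process 1, so this is only safe in a stage3 init running as process 1.
    """
    kill(-1, signal.SIGTERM)  # ask processes to exit
    kill(-1, signal.SIGCONT)  # wake stopped processes so they can act on SIGTERM
    sleep(grace)
    kill(-1, signal.SIGKILL)  # force the rest


def reap_zombies():
    """Reap already-exited children without blocking; return how many."""
    reaped = 0
    while True:
        try:
            pid, _ = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # no children left at all
            return reaped
        if pid == 0:  # children remain, but none has exited yet
            return reaped
        reaped += 1
```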

When the stage3 init finishes its work, it performs the halt, poweroff or reboot operation with a Linux  call. If it is a script, it can use the BusyBox, and  applets, or the ,  and  programs from s6-linux-init, passing them an   (force) option and the argument supplied by :
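The final reboot(2) step can be illustrated in Python through ctypes. The command constants are the values glibc defines in sys/reboot.h; the function name is hypothetical, and calling final_operation requires CAP_SYS_BOOT (in this design it is only done by the stage3 init). Filesystems should already be unmounted or read-only by then; os.sync() is only a last precaution.

```python
import ctypes
import os

# reboot(2) command values as defined by glibc's <sys/reboot.h>.
RB_AUTOBOOT = 0x01234567
RB_HALT_SYSTEM = 0xCDEF0123
RB_POWER_OFF = 0x4321FEDC

REBOOT_COMMAND = {
    "halt": RB_HALT_SYSTEM,
    "poweroff": RB_POWER_OFF,
    "reboot": RB_AUTOBOOT,
}


def final_operation(operation):
    """Halt, power off or reboot the machine; does not return on success."""
    cmd = REBOOT_COMMAND[operation]  # KeyError for anything else
    os.sync()
    libc = ctypes.CDLL(None, use_errno=True)
    # reboot() takes a C int: reinterpret the 32-bit pattern as signed.
    howto = cmd - (1 << 32) if cmd >= (1 << 31) else cmd
    if libc.reboot(howto) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```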

Gentoo's official repository does not supply any package with a stage3 init for s6 and s6-rc-based init systems. Users must create one from scratch or take it from somewhere else (e.g. alternative ebuild repositories). The s6-linux-init package contains an example execline stage3 script: the file in the package's  subdirectory.

Service management
On an s6 and s6-rc-based init system, the s6-rc package is used for service management. In particular, the administrator can replace the init system's compiled service database with a new one using the program, and can create a new boot-time service database, to be used next time the machine boots, with the  program and a set of service definitions in the program's supported source format. It is best to have the invocation in the stage2 init use a symbolic link as the compiled service database pathname, so that the boot-time database can be changed by modifying the symlink instead of the stage2 init code, e.g. by having an  directory for storing one or more compiled databases, making  a symbolic link to one of those databases, and using the symlink in the  invocation.
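Repointing such a symlink atomically avoids a window in which the boot-time database pathname does not exist. The illustrative Python sketch below (switch_boot_db is a hypothetical name) creates the new link under a temporary name and renames it over the old one; rename(2) replaces the destination atomically, and it acts on the link itself, not on the directory it points to.

```python
import os


def switch_boot_db(link, target):
    """Atomically make the symlink `link` point at compiled database `target`."""
    tmp = link + ".new"
    try:
        os.unlink(tmp)  # leftover from an interrupted previous run
    except FileNotFoundError:
        pass
    os.symlink(target, tmp)
    os.replace(tmp, link)  # rename(2): the link is swapped in one step
```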

It is possible to have long-lived processes not managed by s6-rc but supervised by process 1, by directly managing s6 service directories, placing them (or symbolic links to them) in process 1's scan directory, and using s6-svscanctl -a, s6-svscanctl -n or s6-svscanctl -N commands as needed. It is also possible to use as process 1 and just s6 tools, without s6-rc, but then the init system becomes more like runit. In that case, executing with signal diversion turned on is not necessary.
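For the directly managed case, adding a service amounts to placing a link in the scan directory and triggering a rescan. The Python sketch below is illustrative (add_service is a hypothetical name); it returns the s6-svscanctl -a command instead of running it, so the caller decides when to rescan, for example only once after adding several services.

```python
import os


def add_service(servicedir, scandir):
    """Link an s6 service directory into scandir; return the rescan command."""
    name = os.path.basename(os.path.normpath(servicedir))
    os.symlink(os.path.abspath(servicedir), os.path.join(scandir, name))
    return ["s6-svscanctl", "-a", scandir]
```

The returned command could then be run with subprocess.run(cmd, check=True).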

s6 service directories and s6-rc service definitions for anything not supplied in packages from Gentoo's official repository must be created by the administrator, either from scratch or taken from somewhere else (e.g. alternative ebuild repositories).

External resources

 * lh-bootstrap, a set of scripts that build a disk image for a virtual machine such as QEMU. The image contains a Linux kernel and a collection of small user-space tools such as BusyBox and dropbear, all statically linked against musl, and an s6 and s6-rc based init system.
 * Obarun, an Arch derivative with an s6 and s6-rc based init system.
 * Slew, a project that provides stage1, stage2, stage3 inits and s6-svscan diverted signal handlers, as well as s6-rc service definition directories in 's source format for several services and other supporting scripts, to make an s6 and s6-rc based init system. Most scripts require Byron Rakitzis's implementation of the Plan 9 shell,, for Unix.