S6 and s6-rc-based init system

An s6 and s6-rc-based init system is an init system built using components from the s6 and s6-rc packages, following a general design supported by the s6-linux-init package. It can be used as an alternative to sysvinit + OpenRC, or systemd.

General setup
There are two slightly different variants of the general setup of an s6 and s6-rc-based init system, called here the version 0.4.x.x model and the version 1.0.x.x model after the versions of the s6-linux-init package that support them. The setup common to both is as follows:


 * 1) When the machine boots, all initialization tasks needed to bring it to its stable, normal 'up and running' state are split into a stage1 init and a stage2 init. The stage1 init is invoked by the kernel, runs as process 1, and replaces itself with the s6-svscan program from s6 when its work is done. The stage2 init is invoked by the stage1 init, runs as a child of process 1, blocks until s6-svscan starts to execute, and exits when its work is done.
 * 2) During most of the machine's uptime, s6-svscan runs as process 1 with signal diversion turned on, and there is an s6 supervision tree rooted in process 1 that is launched as soon as s6-svscan starts to execute.
 * 3) A supervised catch-all logger is started as part of the supervision tree. The catch-all logger logs messages sent by supervision tree processes to s6-svscan's standard output and error, supporting a logging chain arrangement.
 * 4) The stage2 init initializes the s6-rc service manager and starts a subset of the services defined in the compiled service database it was initialized with. Some of these s6-rc-managed services might carry out part of the machine's initialization tasks.
 * 5) While s6-svscan is running as process 1, services are normally managed using s6-rc tools. If using s6-linux-init-1.0.0.0 or later, some limited form of service management can also be done with the  program, which allows the implementation of sysvinit-like runlevels.
 * 6) The administrator initiates the machine's shutdown sequence by running a certain program.

After this point, the models diverge:


 * In the version 0.4.x.x model, that program sends a signal to process 1.
 * In the version 1.0.x.x model, that program instructs a shutdown daemon to perform the shutdown sequence. The shutdown daemon is part of the supervision tree rooted in process 1, just like the catch-all logger. Process 1 is still able to react to signals just like in the version 0.4.x.x model.

In the version 0.4.x.x model, after s6-svscan receives a signal:


 * 1) s6-svscan executes an appropriate diverted signal handler as a child process, which performs some of the tasks needed to shut the machine down and stops all s6-rc-managed services.
 * 2) When the diverted signal handler's work is done, it invokes the  program, which makes s6-svscan perform its finish procedure, and results in execution of the  file in process 1's scan directory.
 * 3) The  file becomes the stage3 init: it runs as process 1, makes the catch-all logger exit cleanly if it didn't when the supervision tree was brought down by s6-svscan's finish procedure, and then performs all remaining tasks needed to shut the machine down.
 * 4) When the stage3 init's work is done, it halts, powers off or reboots the machine, as requested by the administrator.

In the version 1.0.x.x model, the shutdown daemon takes care of the remainder of the shutdown sequence, including the stopping of all s6-rc-managed services. s6-svscan remains process 1 until the shutdown daemon halts, powers off or reboots the machine. This model more closely resembles the actions of sysvinit's program.

The stage1 init
When the machine starts booting (if an initramfs is being used, after it passes control to the 'main' init), a stage1 init executes as process 1. Therefore, to use an s6 and s6-rc-based init system, an argument naming the stage1 init's pathname can be added to the kernel's command line using the bootloader's available mechanisms (e.g. in some 'Gentoo with s6 + s6-rc' menu entry for GRUB2). It is possible to go back to sysvinit + OpenRC at any time, or to any other init system, by reverting the change.

The stage1 init runs with its standard input, output and error redirected to the machine's console. It must do all necessary setup for s6-svscan to be able to run. This includes setting up its scan directory, and because at that point the root filesystem might be the only mounted filesystem, and possibly read-only, the stage1 init must also mount a read-write filesystem to hold and  control files that need to be written to. The customary setup of an s6 and s6-rc-based init system uses a run image containing the initial scan directory, which is copied to a tmpfs that the stage1 init mounts read-write, normally on. When s6-svscan starts running as process 1, it uses the copy in the tmpfs as its scan directory. The run image can be in a read-only filesystem.
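The run image copy can be sketched in shell as follows. This is only an illustration under stated assumptions: ordinary temporary directories stand in for the read-only run image and for the tmpfs mount point (nothing is actually mounted), and the names `service` and `catch-all-logger` are hypothetical.

```shell
#!/bin/sh
# Stand-ins (hypothetical): run_image plays the read-only run image,
# tmpfsdir plays the mount point of the read-write tmpfs.
work=$(mktemp -d)
run_image=$work/run-image
tmpfsdir=$work/tmpfs

# Build a tiny run image: a scan directory holding one service directory.
mkdir -p "$run_image/service/catch-all-logger"
printf '#!/bin/sh\nexec cat\n' > "$run_image/service/catch-all-logger/run"
chmod 0755 "$run_image/service/catch-all-logger/run"

# A real stage1 init would mount the tmpfs on $tmpfsdir at this point.
mkdir -p "$tmpfsdir"

# Copy the contents of the image, preserving modes, to the writable place.
cp -pR "$run_image/." "$tmpfsdir/"

# Process 1 would then be given this copy as its scan directory.
scandir=$tmpfsdir/service
ls -l "$scandir/catch-all-logger"
```

Because only the copy is written to afterwards, the original image can stay on a read-only filesystem.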

Also, all special files that might be needed by s6-svscan and the stage1 and stage2 inits, such as the  and  device nodes, must be made available by the stage1 init before they are needed.

Because the stage1 init runs as process 1, a kernel panic results if it terminates in any way. It must therefore be simple enough not to fail, because recovery at this stage of initialization is almost impossible. This is why s6 and s6-rc-based init systems split initialization into a stage1 init and a stage2 init. The stage2 init is spawned as a child process by the stage1 init, which, as soon as it finishes its work, replaces itself with s6-svscan using a call from the POSIX exec family.
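The 'replaces itself' step can be illustrated with the shell's `exec` builtin, which is built on the same family of calls. In this minimal sketch, two plain `sh` processes stand in for the stage1 init and the program that succeeds it; the point is only that the process ID does not change.

```shell
#!/bin/sh
# The outer shell prints its PID, then uses exec to replace itself with a
# second shell, which prints its own PID. exec replaces the program being
# run but keeps the process, so both lines carry the same number.
out=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

pid_before=$(printf '%s\n' "$out" | sed -n 1p)
pid_after=$(printf '%s\n' "$out" | sed -n 2p)

echo "PID before exec: $pid_before"
echo "PID after exec:  $pid_after"
```

This is why the program that the stage1 init hands over to is still process 1: no new process is created at any point.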

The author of s6 has designed the execline package so that the stage1 init can be an execline script. The general structure of a stage1 execline script is as follows, or a variation thereof:

Where:


 * ${stage1_envdir} is a placeholder for the absolute pathname of an environment directory to be used by the stage1 and stage2 init (e.g. ).
 * ${tmpfsdir} is a placeholder for the absolute pathname of the directory where the read-write tmpfs will be mounted (normally ).
 * ${run_image} is a placeholder for the absolute pathname of the directory where the run image is stored (e.g. in the rootfs).
 * ${logger_fifo} is a placeholder for the absolute pathname of the catch-all logger's FIFO (e.g. ).
 * ${stage2_init} is a placeholder for the name (if PATH search would find it) or absolute pathname of the stage2 init (e.g. ).
 * ${s6_svscan_envdir} is a placeholder for the absolute pathname of an environment directory used to set up the supervision tree's initial environment (e.g. ).
 * ${scandir} is a placeholder for the pathname, relative to ${tmpfsdir}, of process 1's scan directory (e.g., making the scan directory's absolute pathname ).

The program from the 0.4.x.x series of s6-linux-init can create a minimal stage1 execline script with the aforementioned structure, and can be used as a basis for writing a custom or more elaborate one, if so desired. s6-linux-init-1.0.0.0 and later provides a program, also named, that can be used as the stage1 init.

The stage2 init
The stage2 init is spawned by the stage1 init as a child process, and is blocked from running until the latter replaces itself with s6-svscan. To achieve this, the child process of the stage1 init opens the catch-all logger's FIFO for writing using the POSIX open() call. The call blocks until some other process opens the FIFO for reading. The catch-all logger is a supervised process, so it starts executing when s6-svscan does, and opens the FIFO for reading, thereby unblocking the child process, which then replaces itself with the stage2 init.
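The blocking behaviour described above can be observed with a small shell experiment. It is a sketch only: a temporary FIFO stands in for the catch-all logger's FIFO, a subshell stands in for the child of the stage1 init, and `cat` stands in for the logger. All file names are hypothetical.

```shell
#!/bin/sh
work=$(mktemp -d)
fifo=$work/logger-fifo          # stand-in for the catch-all logger's FIFO
marker=$work/stage2-started     # created once the "stage2" part runs
log=$work/catch-all.log

mkfifo -m 0600 "$fifo"

# Child: opening the FIFO for writing blocks until a reader appears.
(
  exec 3> "$fifo"               # blocks here while there is no reader
  : > "$marker"                 # stand-in for "exec the stage2 init"
  echo "stage2 running" >&3
) &
child=$!

sleep 1
if [ -e "$marker" ]; then before=started; else before=blocked; fi

# "Catch-all logger": opening the FIFO for reading unblocks the child.
cat "$fifo" > "$log" &
reader=$!

wait "$child"
wait "$reader"
if [ -e "$marker" ]; then after=started; else after=blocked; fi

echo "before reader: $before, after reader: $after"
```

The child cannot get past its `exec 3>` line until the reader opens the FIFO, which mirrors how the stage2 init is held back until the supervised logger is running.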

The stage2 init executes with s6-svscan as process 1, and performs all remaining initialization tasks needed to bring the machine to its stable, normal 'up and running' state. It can execute with a few vital supervised long-lived processes already running, started as part of process 1's supervision tree, including the catch-all logger. When the stage2 init finishes its work, it just exits and gets reaped by s6-svscan.

The stage2 init can be, and normally is, an execline or shell script. Gentoo's official repository does not supply any package with a stage2 init for s6 and s6-rc-based init systems. The 0.4.x.x series of the s6-linux-init package contains an example stage2 execline script, it is the file in the package's  subdirectory. s6-linux-init-1.0.0.0 and later installs an example stage2 shell script, also named, in , containing only comments that illustrate how to set up the init system for a variety of rc subsystems.

s6-rc initialization
The s6-rc service manager needs to be initialized, which must be done when s6-svscan is already running. Therefore, initialization is performed by having the stage2 init invoke the program. This program takes the pathname of a compiled service database as an argument (or defaults it to ), as well as the pathname of process 1's scan directory. So a suitable service database must exist and be available at least in a read-only filesystem. This is the boot-time service database. The live state directory must be in a read-write filesystem, and the customary setup of an s6 and s6-rc-based init system has this program create it in the read-write tmpfs mounted by the stage1 init.

The initial state of all s6-rc services, as set by this program, is down. So the stage2 init must also start all atomic services (oneshots and longruns) that are needed to complete the machine's initialization, if any, as well as all longruns that are wanted up at the end of the boot sequence. This is done by defining a service bundle in the boot-time service database that groups these atomic services, and having the stage2 init start them with an s6-rc command naming the bundle. This bundle would be the s6-rc counterpart to OpenRC's +  +  runlevels, systemd's  unit, or nosh's  target bundle directory.
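A stage2 shell script along these lines might look like the following sketch. Several things here are assumptions rather than facts taken from this article: the initialization program is taken to be `s6-rc-init` with its `-c` (compiled database) and `-l` (live directory) options, the bundle is started with `s6-rc -u change`, and every pathname as well as the bundle name `boot` is hypothetical. By default the commands are only printed, so the sketch can be tried on a machine without s6-rc.

```shell
#!/bin/sh
# Hypothetical locations and names; adjust to the actual setup.
scandir=${scandir:-/run/service}
live=${live:-/run/s6-rc}
compiled=${compiled:-/etc/s6-rc/db/current}
boot_bundle=${boot_bundle:-boot}

# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

stage2() {
  # Initialize s6-rc; afterwards every service is in the down state.
  run s6-rc-init -c "$compiled" -l "$live" "$scandir" || return
  # Bring up everything grouped by the boot-time bundle.
  run s6-rc -l "$live" -u change "$boot_bundle"
}

stage2
```

Setting `DRY_RUN=0` in the environment would make the same function execute the commands instead of printing them.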

If using s6-linux-init-1.0.0.0 or later, sysvinit-like runlevels can be mapped to s6-rc bundles.

The catch-all logger
In the context of an s6 and s6-rc-based init system, the catch-all logger is a supervised long-lived process that logs messages sent by supervision tree processes to s6-svscan's standard output and error, normally in an automatically rotated logging directory. In a logging chain arrangement, the leaf processes of a supervision tree normally have dedicated loggers that collect and store messages sent to the process's standard output and error in per-service logs. Messages from s6-svscan, supervisor processes, logger processes themselves, and leaf processes that exceptionally don't have a logger are printed on process 1's standard output or error, which, at the beginning of the boot sequence, are redirected to the machine's console. It is possible to redirect them later so that the messages are delivered to the catch-all logger, using a setup that involves a FIFO. Only the catch-all logger's standard error remains redirected to the machine's console, as a last resort.

The run image that is copied to the read-write tmpfs mounted by the stage1 init contains s6-svscan's initial scan directory, with at least a service directory for the catch-all logger already present, so that it is started as soon as s6-svscan begins execution as process 1. The logging directory is owned by the catch-all logger's effective user after dropping privileges, and normally has permissions 2750 (i.e. set-group-ID bit set, full access for the owner, read and search access for the group, and no access for others). Because it is possible to have a setup where a read-only rootfs is the only filesystem available, the logging directory is also normally placed in the read-write tmpfs mounted by the stage1 init, unless a different read-write filesystem can be guaranteed to exist before s6-svscan starts executing as process 1 (e.g.  is used, but  is guaranteed to be in the rootfs, and either the kernel mounts the rootfs read-write or the stage1 init remounts it read-write, or  is a filesystem mounted read-write by the stage1 init or the initramfs, etc.). If the logging directory is in the aforementioned tmpfs, it must be created with appropriate owner and permissions by the code of the catch-all logger's file, or be present as an empty directory with appropriate owner and permissions in the run image copied to the tmpfs.
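As a quick illustration of what mode 2750 looks like on a directory, the sketch below creates one in a temporary location and prints its mode string. The directory name is hypothetical, and changing the owner to the logger's unprivileged user is left out because it requires root.

```shell
#!/bin/sh
work=$(mktemp -d)
logdir=$work/uncaught-logs      # hypothetical logging directory name

mkdir "$logdir"
chmod 2750 "$logdir"
# A real setup would also chown the directory to the logger's user here.

# Keep only the file type and the nine permission characters.
perms=$(ls -ld "$logdir" | cut -c1-10)
echo "$perms"
```

On a typical Linux system the printed string is `drwxr-s---`: the `s` in the group triplet is the set-group-ID bit contributed by the leading 2.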

The program from s6-linux-init can create a catch-all logger that is a supervised  process. Its service directory is named.

The catch-all logger's FIFO
An s6 and s6-rc-based init system has a FIFO somewhere in the filesystem, reserved for the catch-all logger. The FIFO is owned by root and has permissions 0600 (i.e. read and write access for the owner only). The code of the catch-all logger's file opens the FIFO for reading, redirects its standard input to it and its standard error to the machine's console, drops privileges (e.g. by invoking  or  if it is a script) and calls the logger program, which is normally.
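The shape of such a logger script can be sketched in shell. Everything concrete here is a stand-in chosen for illustration: a temporary FIFO, a regular file playing the console, `cat` playing the logger program, and no privilege drop (which would need root).

```shell
#!/bin/sh
work=$(mktemp -d)
fifo=$work/fifo
logfile=$work/current
console=$work/console           # stand-in for the machine's console

mkfifo -m 0600 "$fifo"
fifo_perms=$(ls -l "$fifo" | cut -c1-10)

# Stand-in for the catch-all logger's script.
cat > "$work/run" <<EOF
#!/bin/sh
exec < "$fifo"          # stdin from the FIFO; blocks until a writer opens it
exec 2> "$console"      # last-resort destination for the logger's own errors
# A real script would drop privileges here before starting the logger.
exec cat > "$logfile"   # stand-in logger: copy stdin to a log file
EOF
chmod 0755 "$work/run"

"$work/run" &
logger=$!

# Stand-in for messages written to process 1's standard output.
{ echo "message one"; echo "message two"; } > "$fifo"

# Once the last writer closes the FIFO the logger sees end-of-file and exits.
wait "$logger"
logged=$(wc -l < "$logfile" | tr -d ' ')
echo "$fifo_perms, $logged lines logged"
```

The final `wait` returns because the stand-in logger exits on end-of-file, which only happens when no process holds the FIFO open for writing any more.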

The stage1 init redirects its standard output and error to the catch-all logger's FIFO before replacing itself with s6-svscan. However, opening a FIFO for writing is an operation that blocks until some other process opens it for reading, and a POSIX non-blocking open() call fails with an error status if it specifies the 'open for writing only' flag O_WRONLY and there is no reader. The program from the execline package was written in a way that specifically addresses this problem: it is a chain loading program that, if invoked with options ,   and  , will execute the next program in the chain with the specified file descriptor open for writing and without blocking, even if the specified pathname corresponds to a FIFO and there is no reader.

Stopping the catch-all logger
The program supports a   option that makes it ignore the   signal, so that it can't get killed that way. If is being used as the catch-all logger program and, to minimize the risk of losing logs, was invoked with this option, in the version 0.4.x.x model, a special procedure is used to make it exit cleanly while the supervision tree is being brought down by 's finish procedure. When the parent process receives a   signal, it sends  a   signal followed by a   signal. But because doesn't exit until its supervised process does, and  ignores   and keeps running, the  program supports a special option,   (capital 'x'), that works like   (small 'x'), but also makes  redirect its standard input, output and error to.

The stage3 init's code can use an command with the catch-all logger's service directory as the argument; this would leave the catch-all logger's FIFO with no writers, because  and all other  processes would normally have exited by then, causing  to detect end-of-file on its standard input and exit.

In the version 1.0.x.x model, the catch-all logger is never stopped.

Version 0.4.x.x model
In the version 0.4.x.x model, an s6 and s6-rc-based init system is asked to initiate the shutdown sequence by sending signals to process 1. Because the program running as process 1 is s6-svscan with signal diversion turned on, the signals must be chosen from the set it can divert. The BusyBox, and  applets, and the ,  and  programs from the 0.4.x.x series of s6-linux-init, are capable of sending suitable signals to process 1:

When process 1 receives such a signal, the corresponding diverted signal handler is executed as a child process. The handler then performs part of the tasks needed to shut the machine down, and when it finishes its work, it invokes the program with the option that corresponds to the action associated with the handled signal.

Generally speaking, the handlers undo what the stage2 init has done at boot time. Because most of this work is the same for all handlers, they usually execute a common file, named the shutdown file, and wait for it to finish before invoking that program. The shutdown file's code can use s6 tools and s6-rc services to do its work, because s6-svscan is still running. However, all s6-rc-managed services have to be stopped (normally with a command) before that program is invoked, because s6-svscan will stop running after that, and s6-rc does not work without an s6 supervision tree. The diverted signal handlers and the shutdown file can be, and normally are, execline or shell scripts.

The general structure of an s6-svscan diverted signal handler execline script is as follows, or a variation thereof:

Where:


 * ${tmpfsdir} is a placeholder for the absolute pathname of the directory where the stage1 init mounted the read-write tmpfs (normally ).
 * ${scandir} is a placeholder for the pathname, relative to ${tmpfsdir}, of process 1's scan directory (e.g., making the scan directory's absolute pathname ).
 * ${shutdown_file} is a placeholder for the name (if PATH search would find it) or absolute pathname of the shutdown file (e.g. ).
 * ${option} is the option for the action that corresponds to the signal:
   * -0 or -st for halt.
   * -7 or -pt for poweroff.
   * -6 or -rt for reboot.
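The action-to-option mapping listed above can be written as a small shell helper. This is only a sketch: the function name `finish_option` is hypothetical, and the function just prints the option that a handler would pass to the program it invokes.

```shell
#!/bin/sh
# Print the option that corresponds to a shutdown action.
# Returns non-zero for an action that is not halt, poweroff or reboot.
finish_option() {
  case $1 in
    halt)     printf '%s\n' -st ;;
    poweroff) printf '%s\n' -pt ;;
    reboot)   printf '%s\n' -rt ;;
    *)        echo "unknown action: $1" >&2; return 1 ;;
  esac
}

for action in halt poweroff reboot; do
  echo "$action -> $(finish_option "$action")"
done
```

Keeping the mapping in one place lets the three handlers share everything except the action name.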

Gentoo's official repository does not supply any package with s6-svscan diverted signal handlers or a shutdown file for s6 and s6-rc-based init systems. Users must create them from scratch or take them from somewhere else (e.g. alternative ebuild repositories). The program from the 0.4.x.x series of s6-linux-init can create signal handler execline scripts for all  diverted signals, compatible with,  and. They can work without modifications for BusyBox, and , by swapping the   and   handlers. This package also contains an example shutdown execline script, it is the file in the package's  subdirectory.

Version 1.0.x.x model
In the version 1.0.x.x model, it is the supervised shutdown daemon's task to perform the shutdown sequence. The program running as process 1 is still s6-svscan with signal diversion turned on, so diverted signal handlers still define what happens when process 1 receives a signal. However, their only task is forwarding requests to the shutdown daemon.

The program from s6-linux-init-1.0.0.0 and later creates signal handler execline scripts for all  diverted signals, that either invoke the  program from the same package to request that the machine be halted, powered off or rebooted, or do nothing.

The stage3 init
In the version 0.4.x.x model, when an s6-svscan diverted signal handler invokes the  program, s6-svscan performs its finish procedure, executing the  file in the  control subdirectory of its scan directory using a call from the POSIX exec family, and passing a ,   or   argument to it. Therefore, that file replaces s6-svscan as process 1 and becomes the stage3 init.

The stage3 init redirects its standard output and error to, uses the command to make the catch-all logger exit cleanly, and performs all remaining tasks needed to shut the machine down. It must also kill all other processes that are still running at that point, after a grace period to allow them to exit on their own, so that filesystems can be synced and unmounted, or remounted read-only. This can be done with a POSIX kill() call specifying -1 as the process ID argument, usually sending a SIGTERM signal followed by a SIGCONT signal first, waiting for a short period of time, and then sending a SIGKILL signal. Because the stage3 init runs as process 1, and process 1 does not get killed by such a kill() call, it keeps running after that. Sending a SIGKILL signal to all processes from a non-PID 1 process that is expected to continue running is much harder.

The stage3 init can be, and normally is, an execline or shell script. The program provided by either the GNU Core Utilities package, the util-linux package or the procps package, can be used in such a script as ,  and  (the last form might also kill itself, but not the stage3 init). The program from the s6-portable-utils package can also be used in such a script, as  (  +  ) and. A stage3 shell script that invokes a shell with a builtin utility works too. In that case, process 1 will be a shell process that sends the signals itself. The command can be used in a stage3 execline script to reap all resulting zombie processes.
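The 'signal, wait, then force' pattern can be sketched in shell as below. The sketch assumes the conventional SIGTERM, SIGCONT, SIGKILL sequence, and the function name `kill_with_grace` is hypothetical. To keep it safe to run, the demonstration targets a single throwaway child process; a real stage3 script would pass -1 as the target to reach every process, which must never be tried on a machine that is not shutting down.

```shell
#!/bin/sh
# kill_with_grace TARGET [GRACE_SECONDS]
# TARGET is a process ID, or -1 for every process (DANGEROUS outside of a
# real shutdown). Errors from kill are ignored: the target may already be
# gone by the time the later signals are sent.
kill_with_grace() {
  target=$1
  grace=${2:-2}
  kill -s TERM -- "$target" 2>/dev/null   # polite request to exit
  kill -s CONT -- "$target" 2>/dev/null   # wake stopped processes so they can react
  sleep "$grace"                          # grace period
  kill -s KILL -- "$target" 2>/dev/null   # force whatever is left
  return 0
}

# Demonstration on a single child process instead of -1.
sleep 30 &
victim=$!
kill_with_grace "$victim" 1

status=0
wait "$victim" || status=$?
echo "victim exited with status $status"
```

The SIGCONT step matters for processes that are stopped: a stopped process cannot act on the first signal until it is resumed.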

When the stage3 init finishes its work, it performs the halt, poweroff or reboot operation, as specified by the argument supplied by s6-svscan. If it is a script, it can use the BusyBox, and  applets for that, or the ,  and  programs from the 0.4.x.x series of s6-linux-init, passing them an   (force) option.

The general structure of an execline stage3 script is as follows, or a variation thereof:

Where:


 * ${tmpfsdir} is a placeholder for the absolute pathname of the directory where the stage1 init mounted the read-write tmpfs (normally ).
 * ${scandir} is a placeholder for the pathname, relative to ${tmpfsdir}, of process 1's scan directory (e.g., making the scan directory's absolute pathname ).
 * ${logger_servicedir} is a placeholder for the name of the catch-all logger's service directory (e.g., making the service directory's absolute pathname ).

The program from the 0.4.x.x series of s6-linux-init can create an execline stage3 script with the aforementioned structure, and can be used as a basis for writing a custom or more elaborate one, if so desired.

The shutdown daemon
In the version 1.0.x.x model, the supervision tree includes a shutdown daemon that receives requests to initiate the shutdown sequence, either immediately or after a certain specified time elapses. It covers the functionality that the s6-svscan diverted signal handlers and the stage3 init have in the version 0.4.x.x model.

First, the shutdown daemon executes a shutdown file, and waits for it to finish. Generally speaking, the shutdown file undoes what the stage2 init has done at boot time. In particular, if s6-rc is in use, it can be used to stop all s6-rc-managed services (normally with a command). The shutdown file can be, and normally is, an execline or shell script. Its code can use s6 tools and s6-rc services to do its work.

Then, the shutdown daemon redirects its standard output and error to, stops all processes from the supervision tree except the catch-all logger, and kills all other processes that are still running at that point, so that filesystems can be synced and unmounted, or remounted read-only. It does so by sending all processes a SIGTERM signal followed by a SIGCONT signal first, with a POSIX kill() call that specifies -1 as the process ID argument, and then, after a grace period to allow them to exit on their own, by sending a SIGKILL signal in the same way. Because they are supervised, and the supervision tree is not destroyed (s6-svscan is running as process 1, and process 1 does not get killed by such a kill() call), the catch-all logger is restarted after getting killed, and so is the shutdown daemon, which exits after sending the signal if it didn't get killed by it.

Finally, after getting restarted, the shutdown daemon unmounts all mounted filesystems and performs the halt, poweroff or reboot operation, as requested.

s6-linux-init-1.0.0.0 and later provides a shutdown daemon named, and a program named , to be used by the administrator to shut the machine down, that forwards the administrator's request to. Gentoo's official repository does not supply any package with a shutdown file for s6 and s6-rc-based init systems. Users must create them from scratch or take them from somewhere else (e.g. alternative ebuild repositories). s6-linux-init installs an example shutdown shell script named in, containing only comments that illustrate how to set up machine shutdown for a variety of rc subsystems.

Compatibility scripts
s6-svscan is not directly compatible with sysvinit's, , , , and commands. However, many programs (e.g. those from desktop environments) expect to be able to call programs with those names during operation, so if such a thing is needed, it is possible to use compatibility execline scripts:

The program from s6-linux-init-1.0.0.0 and later creates execline compatibility scripts for sysvinit's, , , ,  and  programs, that invoke the , ,  and  programs from the same package.

Service management
On an s6 and s6-rc-based init system, the s6-rc package is used for service management. In particular, the administrator can replace the init system's compiled service database with a new one using the program, and can create a new boot-time service database, to be used next time the machine boots, with the  program and a set of service definitions in the program's supported source format. It is best to have the invocation in the stage2 init use a symbolic link as the compiled service database pathname, so that the boot-time database can be changed by modifying the symlink instead of the stage2 init code, e.g. by having an  directory for storing one or more compiled databases, making  a symbolic link to one of those databases, and using the symlink in the  invocation.
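The symbolic link arrangement can be tried out with plain directories standing in for compiled databases. In this sketch the directory layout, the link name `current` and the function name `switch_boot_db` are all hypothetical, and `ln -sfn` is used for brevity even though it replaces the link in two steps rather than atomically.

```shell
#!/bin/sh
work=$(mktemp -d)
dbdir=$work/db                  # hypothetical directory of compiled databases

# Two empty directories stand in for two compiled service databases.
mkdir -p "$dbdir/compiled-1" "$dbdir/compiled-2"

# The stage2 init would always be given $dbdir/current as the database.
ln -s compiled-1 "$dbdir/current"

# switch_boot_db DIR NAME: make DIR/current point to the database NAME.
switch_boot_db() {
  [ -d "$1/$2" ] || { echo "no such database: $2" >&2; return 1; }
  # -n keeps ln from descending into the directory the old link points to.
  ln -sfn "$2" "$1/current"
}

switch_boot_db "$dbdir" compiled-2
echo "boot-time database: $(readlink "$dbdir/current")"
```

With this layout, choosing a different boot-time database never requires editing the stage2 init, only repointing the link.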

It is possible to have long-lived processes not managed by s6-rc but supervised by process 1, by directly managing s6 service directories, placing them (or symbolic links to them) in process 1's scan directory, and using, or  commands as needed. It is also possible to use s6-svscan as process 1 and just s6 tools, without s6-rc, but then the init system becomes more like runit. In that case, executing s6-svscan with signal diversion turned on is not necessary.

s6 service directories and s6-rc service definitions for anything not supplied in packages from Gentoo's official repository must be created by the administrator, either from scratch or taken from somewhere else (e.g. alternative ebuild repositories).

External resources

 * lh-bootstrap, a set of scripts that build a disk image for a virtual machine such as QEMU. The image contains a Linux kernel and a collection of small user-space tools such as BusyBox and dropbear, all statically linked to musl, and an s6 and s6-rc-based init system.
 * Obarun, an Arch derivative with an s6 and s6-rc-based init system.
 * Slew, a project that provides stage1, stage2 and stage3 inits and s6-svscan diverted signal handlers, as well as s6-rc service definition directories in source format for several services and other supporting scripts, to make an s6 and s6-rc-based init system. Most scripts require rc, Byron Rakitzis's implementation of the Plan 9 shell for Unix.