S6 and s6-rc-based init system

An s6 and s6-rc-based init system is an init system built using components from the s6 and s6-rc packages, following a general design supported by the s6-linux-init package. It can be used as an alternative to sysvinit + OpenRC, or systemd.

General setup
The general setup of an s6 and s6-rc based init system is as follows:


 * 1) When the machine boots, all initialization tasks needed to bring it to its stable, normal 'up and running' state are split into a stage1 init and a stage2 init. The stage1 init is invoked by the kernel, runs as process 1, and replaces itself with the s6-svscan program from s6 when its work is done. The stage2 init is invoked by the stage1 init, runs as a child of process 1, blocks until s6-svscan starts to execute, and exits when its work is done.
 * 2) During most of the machine's uptime, s6-svscan runs as process 1 with signal diversion turned on, and there is an s6 supervision tree rooted in process 1 that is launched as soon as s6-svscan starts to execute.
 * 3) A supervised catch-all logger is started as part of the supervision tree. The catch-all logger logs messages sent by supervision tree processes to s6-svscan's standard output and error.
 * 4) The stage2 init initializes the s6-rc service manager and starts a subset of the services defined in the compiled service database it was initialized with. Some of these s6-rc-managed services might carry out part of the machine's initialization tasks.
 * 5) While s6-svscan is running as process 1, services are normally managed using s6-rc tools.
 * 6) The administrator initiates the machine's shutdown sequence by running a program that sends a signal to process 1. The BusyBox halt, poweroff and reboot applets, or the corresponding programs from s6-linux-init, can be used for this.
 * 7) s6-svscan then executes an appropriate diverted signal handler as a child process. The handler performs some of the tasks needed to shut the machine down, and stops all s6-rc-managed services.
 * 8) When the diverted signal handler's work is done, it invokes the s6-svscanctl program, which makes s6-svscan perform its finish procedure and results in execution of a designated file in process 1's scan directory.
 * 9) That file becomes the stage3 init: it runs as process 1, makes the catch-all logger exit cleanly if it didn't when the supervision tree was brought down by s6-svscan's finish procedure, and then performs all remaining tasks needed to shut the machine down.
 * 10) When the stage3 init's work is done, it halts, powers off or reboots the machine, as requested by the administrator.

The stage1 init
When the machine starts booting (if an initramfs is being used, after it passes control to the 'main' init), a stage1 init executes as process 1. Therefore, to use an s6 and s6-rc-based init system, an init= argument naming the stage1 init's pathname can be added to the kernel's command line using the bootloader's available mechanisms (e.g. a linux command in some 'Gentoo with s6 + s6-rc' menu entry for GRUB2). It is possible to go back to sysvinit + OpenRC, or to any other init system, at any time by reverting the change.

The stage1 init runs with its standard input, output and error redirected to the machine's console. It must do all necessary setup for s6-svscan to be able to run. This includes setting up its scan directory. Because at that point the root filesystem might be the only mounted filesystem, and possibly read-only, the stage1 init must also mount a read-write filesystem to hold the supervision tree's control files that need to be written to. The customary setup of an s6 and s6-rc-based init system uses a run image containing the initial scan directory, which is copied to a tmpfs that the stage1 init mounts read-write. When s6-svscan starts running as process 1, it uses the copy in the tmpfs as its scan directory. The run image can be in a read-only filesystem.

Also, all special files that might be needed by s6-svscan and the stage1 and stage2 inits, such as device nodes, must be made available by the stage1 init before they are needed. Because of this, and because of requirements of programs and libc functions that might be used for machine initialization, some Linux pseudo-filesystems will likely have to be mounted by the stage1 init.

Because the stage1 init runs as process 1, if it exits or is killed, there will be a kernel panic and the machine will hang. Therefore, it must be simple enough not to fail, because recovery at this stage of initialization is almost impossible. This is why s6 and s6-rc-based init systems split initialization into a stage1 init and a stage2 init. The stage2 init is spawned as a child process by the stage1 init, which, as soon as it finishes its work, replaces itself with s6-svscan using a POSIX execve() call.

The author of s6 has designed the execline package so that the stage1 init can be an execline script. The general structure of an execline stage1 script is as follows, or a variation thereof:

Where:


 * ${stage1_envdir} is a placeholder for the absolute pathname of an environment directory to be used by the stage1 and stage2 init.
 * ${tmpfsdir} is a placeholder for the absolute pathname of the directory where the read-write tmpfs will be mounted.
 * ${run_image} is a placeholder for the absolute pathname of the directory where the run image is stored (e.g. in the rootfs).
 * ${logger_fifo} is a placeholder for the absolute pathname of the catch-all logger's FIFO.
 * ${stage2_init} is a placeholder for the name (if PATH search would find it) or absolute pathname of the stage2 init.
 * ${s6_svscan_envdir} is a placeholder for the absolute pathname of an environment directory used to set up the supervision tree's initial environment.
 * ${scandir} is a placeholder for the pathname, relative to ${tmpfsdir}, of process 1's scan directory.

Gentoo's official repository does not supply any package with a stage1 init for s6 and s6-rc-based init systems. Users must create one from scratch or take it from somewhere else (e.g. alternative ebuild repositories). A program from s6-linux-init can create a minimal execline stage1 script with the aforementioned structure, which uses programs from packages s6-portable-utils and s6-linux-utils, and can be used as a basis for writing a custom or more elaborate one, if so desired. The only additional initialization the generated script does is optionally mounting a devtmpfs, and optionally dumping the kernel's environment into an environment directory using a program from s6-portable-utils.

The stage2 init
The stage2 init is spawned by the stage1 init as a child process, and is blocked from running until the latter replaces itself with s6-svscan. To achieve this, the child process of the stage1 init opens the catch-all logger's FIFO for writing using the POSIX open() call. The call will block until some other process opens the FIFO for reading. The catch-all logger is a supervised process, so it starts executing when s6-svscan does, and opens the FIFO for reading, thereby unblocking the child process, which then replaces itself with the stage2 init.
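The FIFO rendezvous described above can be observed with a small shell experiment. This is only an illustration of the blocking open() behaviour, not part of any real init script: the FIFO lives in a temporary directory, and the names writer, blocked and line are made up for the demonstration.

```shell
#!/bin/sh
# Illustration only: a throwaway FIFO stands in for the catch-all logger's FIFO.
tmp=$(mktemp -d)
fifo="$tmp/fifo"
mkfifo -m 0600 "$fifo"

# "stage2" side: opening the FIFO for writing blocks until a reader shows up.
(
  exec > "$fifo"           # the open() blocks here
  echo "stage2 started"    # runs only after some process opens the FIFO for reading
) &
writer=$!

sleep 1
# Nobody has opened the FIFO for reading yet, so the writer should still be alive.
if kill -0 "$writer" 2>/dev/null; then blocked=yes; else blocked=no; fi

# "logger" side: opening the FIFO for reading releases the writer.
read -r line < "$fifo"
wait "$writer"

rm -rf "$tmp"
echo "blocked=$blocked line=$line"
```

In a real system the reader is the supervised catch-all logger, so the release happens automatically once the supervision tree is up.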

The stage2 init executes with s6-svscan as process 1, and performs all remaining initialization tasks needed to bring the machine to its stable, normal 'up and running' state. It can execute with a few vital supervised long-lived processes already running, started as part of process 1's supervision tree, including the catch-all logger.

When the stage2 init finishes its work, it just exits and gets reaped by s6-svscan. The stage2 init can be, and normally is, an execline or shell script. Gentoo's official repository does not supply any package with a stage2 init for s6 and s6-rc-based init systems. Users must create one from scratch or take it from somewhere else (e.g. alternative ebuild repositories). The s6-linux-init package contains an example execline stage2 script.

s6-rc initialization
The s6-rc service manager needs to be initialized, which must be done when s6-svscan is already running. Therefore, initialization is performed by having the stage2 init invoke s6-rc's initialization program. This program takes the pathname of a compiled service database as an argument (or falls back to a default pathname), as well as the pathname of process 1's scan directory. So a suitable service database must exist and be available at least in a read-only filesystem. This is the boot-time service database. The live state directory must be in a read-write filesystem, and the customary setup of an s6 and s6-rc-based init system has the initialization program create it in the read-write tmpfs mounted by the stage1 init.

The initial state of all s6-rc services, as set at initialization, is down. So the stage2 init must also start all atomic services (oneshots and longruns) that are needed to complete the machine's initialization, if any, as well as all longruns that are wanted up at the end of the boot sequence. This is done by defining a service bundle in the boot-time service database that groups these atomic services, and having the stage2 init start them with an s6-rc -u change command naming the bundle. This bundle would be the s6-rc counterpart to OpenRC's default runlevel, systemd's default target unit, or nosh's equivalent target bundle directory.
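As a rough sketch of the s6-rc part of a shell stage2 init, the two steps could look like the following. The pathnames and the bundle name are hypothetical, and the run helper only prints the commands unless DRY_RUN is set to 0. The sketch assumes the s6-rc-init and s6-rc command-line syntax of the s6-rc package.

```shell
#!/bin/sh
# Sketch only: every pathname and the bundle name below are hypothetical.
DRY_RUN=${DRY_RUN:-1}
run() {
  # Print the command in dry-run mode, execute it otherwise.
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

scandir=/run/scandir              # process 1's scan directory (assumed)
compiled=/etc/s6-rc/db/current    # boot-time compiled database (assumed)
live=/run/s6-rc                   # live state directory in the tmpfs (assumed)
bundle=default                    # bundle grouping the boot-time services (assumed)

stage2_s6rc() {
  # Initialize s6-rc: every service starts in the down state.
  run s6-rc-init -c "$compiled" -l "$live" "$scandir" &&
  # Bring up everything grouped by the boot bundle.
  run s6-rc -l "$live" -u change "$bundle"
}

plan=$(stage2_s6rc)
printf '%s\n' "$plan"
```

A real stage2 init would normally perform other initialization before and after these two commands.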

The catch-all logger
In the context of an s6 and s6-rc-based init system, the catch-all logger is a supervised long-lived process that logs messages sent by supervision tree processes to s6-svscan's standard output and error, normally in an automatically rotated logging directory. In a logging chain arrangement, the leaf processes of a supervision tree normally have dedicated loggers that collect messages sent to the process's standard output and error and store them in per-service logs. Messages from s6-svscan, the supervisor processes, logger processes themselves, and leaf processes that exceptionally don't have a logger, are printed on process 1's standard output or error, which, at the beginning of the boot sequence, are redirected to the machine's console. It is possible to redirect them later so that the messages are delivered to the catch-all logger, using a setup that involves a FIFO. Only the catch-all logger's standard error remains redirected to the machine's console, as a last resort.

The logging directory is owned by the catch-all logger's effective user after dropping privileges, and normally has permissions 2700. Because it is possible to have a setup where a read-only rootfs is the only filesystem available, the logging directory is also normally placed in the read-write tmpfs mounted by the stage1 init, unless a different read-write filesystem can be guaranteed to exist before s6-svscan starts executing as process 1 (e.g. the logging directory is guaranteed to be in the rootfs, and either the kernel mounts the rootfs read-write or the stage1 init remounts it read-write, or it is in a filesystem mounted read-write by the stage1 init or the initramfs, etc.). If the logging directory is in the aforementioned tmpfs, it must be created with appropriate owner and permissions by the code of the catch-all logger's run file, or be present as an empty directory with appropriate owner and permissions in the run image copied to the tmpfs.
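Creating a logging directory with the owner and mode described above might look like the following sketch. The directory name catchall-logs and the LOGUSER variable are hypothetical, and a temporary directory stands in for the tmpfs so that the commands can be tried without touching a live system.

```shell
#!/bin/sh
# Sketch only: "catchall-logs" and LOGUSER are hypothetical names.
base=$(mktemp -d)                 # stands in for the read-write tmpfs
logdir="$base/catchall-logs"
loguser=${LOGUSER:-$(id -un)}     # the user the logger runs as after dropping privileges

mkdir -p "$logdir"
# Changing the owner needs root, so the step is skipped for unprivileged trial runs.
if [ "$(id -u)" -eq 0 ]; then
  chown "$loguser" "$logdir"
fi
chmod 2700 "$logdir"              # setgid bit plus rwx for the owner only

# Keep only the permission string from the ls output.
mode=$(ls -ld "$logdir" | cut -c1-10)
rm -rf "$base"
printf '%s\n' "$mode"
```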

Gentoo's official repository does not supply any package with a catch-all logger service directory for s6 and s6-rc-based init systems. Users must create one from scratch or take it from somewhere else (e.g. alternative ebuild repositories). A program from s6-linux-init can create a catch-all logger service directory that can be used as a basis for writing a custom or more elaborate one, if so desired. The generated catch-all logger logs to a subdirectory of the tmpfs mounted by the generated stage1 script, and its FIFO is located in its service directory.

The catch-all logger's FIFO
An s6 and s6-rc-based init system has a FIFO somewhere in the filesystem, reserved for the catch-all logger. The FIFO is owned by root and has permissions 0600. The run image that is copied to the read-write tmpfs mounted by the stage1 init contains s6-svscan's initial scan directory, with at least a service directory for the catch-all logger already present, and possibly an additional service directory for a login process (e.g. a getty) or similar. The former is there so that the catch-all logger is started as soon as s6-svscan begins execution as process 1, and the latter, so that it is possible to log in to the machine if the supervision tree starts successfully, even if something else fails (e.g. s6-rc's setup). The code of the catch-all logger's run file opens the FIFO for reading, redirects its standard input to it and its standard error to the machine's console, drops privileges (e.g. by invoking a privilege-dropping program if it is a script) and calls the logger program.

The stage1 init redirects its standard output and error to the catch-all logger's FIFO before replacing itself with s6-svscan. However, opening a FIFO for writing is an operation that blocks until some other process opens it for reading, and a POSIX non-blocking open() call fails with an error status if it specifies the 'open for writing only' flag O_WRONLY and there is no reader. Execline includes a chain loading program written specifically to address this problem: invoked with the appropriate options, it executes the next program in the chain with the specified file descriptor open for writing and without blocking, even if the specified pathname corresponds to a FIFO and there is no reader.

The catch-all logger's FIFO is located in the logger's service directory.

Stopping the catch-all logger
The logger program normally used for the catch-all logger supports an option that makes it ignore the SIGTERM signal, so that it can't get killed that way. If the catch-all logger was invoked with this option, to minimize the risk of losing logs, a special procedure is used by the code of the stage3 init to make it exit cleanly. When the logger's parent supervisor process receives a SIGTERM signal while the supervision tree is being brought down by s6-svscan's finish procedure, it sends the logger a SIGTERM signal followed by a SIGCONT signal. But the supervisor doesn't exit until its supervised process does, and the logger ignores SIGTERM and keeps running. For this case, the s6-svc program supports a special option, -X (capital 'x'), that works like -x (small 'x'), but also makes the supervisor redirect its standard input, output and error away from their original destinations.

The stage3 init's code can use an s6-svc -X command with the catch-all logger's service directory as the argument. This leaves the catch-all logger's FIFO with no writers, because s6-svscan and all other supervision tree processes would normally have exited by then, causing the logger to detect end-of-file on its standard input and exit.

The s6-svscan diverted signal handlers
An s6 and s6-rc-based init system is asked to initiate the shutdown sequence by sending signals to process 1. Because the program running as process 1 is s6-svscan with signal diversion turned on, the signals must be chosen from the set it can divert. The BusyBox halt, poweroff and reboot applets, and the corresponding programs from s6-linux-init, are capable of sending suitable signals to process 1.

When process 1 receives such a signal, the corresponding diverted signal handler is executed as a child process. The handler then performs part of the tasks needed to shut the machine down, and when it finishes its work, it invokes the s6-svscanctl program with the option for the action associated with that signal.

Generally speaking, the handlers undo what the stage2 init has done at boot time. Because most of this work is the same for all diverted signal handlers, they usually execute a common file, named the shutdown file, and wait for it to finish before invoking s6-svscanctl. The shutdown file's code can use s6 tools and s6-rc services to do its work, because s6-svscan is still running. However, all s6-rc-managed services have to be stopped (normally with an s6-rc -da change command) before s6-svscanctl is invoked, because s6-svscan will stop running after that, and s6-rc does not work without an s6 supervision tree. The diverted signal handlers and the shutdown file can be, and normally are, execline or shell scripts.

The general structure of an execline handler script is as follows, or a variation thereof:

Where:


 * ${tmpfsdir} is a placeholder for the absolute pathname of the directory where the stage1 init mounted the read-write tmpfs.
 * ${scandir_relpath} is a placeholder for the pathname, relative to ${tmpfsdir}, of process 1's scan directory.
 * ${shutdown_file} is a placeholder for the name (if PATH search would find it) or absolute pathname of the shutdown file.
 * ${option} is the s6-svscanctl option for the action that corresponds to the signal:
 * -0 or -st for halt.
 * -7 or -pt for poweroff.
 * -6 or -rt for reboot.
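A handler along these lines could also be written as a shell script. The following is a minimal sketch under stated assumptions: all pathnames are hypothetical, the option letter is taken from the list above and should be checked against the installed s6 version, and the run helper only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: all pathnames are hypothetical placeholders.
DRY_RUN=${DRY_RUN:-1}
run() {
  # Print the command in dry-run mode, execute it otherwise.
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

tmpfsdir=/run                         # where stage1 mounted the tmpfs (assumed)
scandir_relpath=scandir               # scan directory, relative to tmpfsdir (assumed)
shutdown_file=/etc/s6-init/shutdown   # common shutdown file (assumed)

handler() {
  option=$1
  # Run the shutdown file and wait for it; it is expected to stop all
  # s6-rc-managed services, e.g. with "s6-rc -da change".
  run "$shutdown_file"
  # Then ask s6-svscan to perform its finish procedure.
  run s6-svscanctl "$option" "$tmpfsdir/$scandir_relpath"
}

reboot_plan=$(handler -6)   # -6 is the reboot option from the list above
printf '%s\n' "$reboot_plan"
```

Each diverted signal would get its own small handler that differs only in the option it passes.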

Gentoo's official repository does not supply any package with diverted signal handlers or a shutdown file for s6 and s6-rc-based init systems. Users must create them from scratch or take them from somewhere else (e.g. alternative ebuild repositories). A program from s6-linux-init can create execline handler scripts for all s6-svscan diverted signals, compatible with the package's own halt, poweroff and reboot programs. They can currently be made to work with the BusyBox halt, poweroff and reboot applets by swapping two of the handlers. The s6-linux-init package also contains an example execline shutdown script.

This means that s6-svscan is not directly compatible with sysvinit's shutdown-related commands. However, many programs (e.g. those from desktop environments) expect to be able to call programs with those names during operation, so if this is needed, it is possible to use compatibility execline scripts.

The stage3 init
When an s6-svscan diverted signal handler invokes the s6-svscanctl program, s6-svscan performs its finish procedure, executing a designated file in the control subdirectory of its scan directory, using the POSIX execve() call, and passing a halt, poweroff or reboot argument to it. Therefore, that file replaces s6-svscan as process 1 and becomes the stage3 init.

The stage3 init redirects its standard output and error to the machine's console, uses the s6-svc -X command to make the catch-all logger exit cleanly, and performs all remaining tasks needed to shut the machine down. It must also kill all other processes that are still running at that point, after a grace period to allow them to exit on their own, so that filesystems can be synced and unmounted, or remounted read-only. This can be done with a POSIX kill() call specifying -1 as the process ID argument, usually to send a SIGTERM signal followed by a SIGCONT signal first, waiting for a short period of time, and then sending a SIGKILL signal. Because the stage3 init runs as process 1, and process 1 does not get killed by such a kill() call, it continues executing after that. Sending a SIGKILL signal to all processes from a non-PID 1 process that is expected to continue running is much harder. The stage3 init can be, and normally is, an execline or shell script. The kill program provided by either the GNU Core Utilities package, the util-linux package or the procps package can be used in such a script as kill -TERM -1, kill -CONT -1 and kill -KILL -1 (the last form will also kill the kill process itself, but not the stage3 init). The s6-nuke program from the s6-portable-utils package can also be used in such a script, as s6-nuke -t (SIGTERM + SIGCONT) and s6-nuke -k. A shell stage3 script that invokes a shell with a builtin kill utility works too. In that case, process 1 will be a shell process that sends the signals itself. A wait command can be used in an execline stage3 script to reap all resulting zombie processes.
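The signal sequence described above could be sketched in a shell stage3 script as follows. The grace period value and the GRACE variable are assumptions, and the run helper only prints the commands by default: setting DRY_RUN=0 only makes sense when the script really is process 1, since anywhere else it would kill the whole session.

```shell
#!/bin/sh
# Sketch only: dry-run by default, because the real commands signal every process.
DRY_RUN=${DRY_RUN:-1}
run() {
  # Print the command in dry-run mode, execute it otherwise.
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

grace=${GRACE:-2}   # grace period in seconds (hypothetical value)

kill_everything() {
  run kill -TERM -1     # ask every process to exit
  run kill -CONT -1     # wake stopped processes so that they can handle SIGTERM
  run sleep "$grace"    # give them time to exit on their own
  run kill -KILL -1     # forcibly kill whatever is left; process 1 survives
}

plan=$(kill_everything)
printf '%s\n' "$plan"
```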

When the stage3 init finishes its work, it performs the halt, poweroff or reboot operation with a Linux reboot() call. If it is a script, it can use the BusyBox halt, poweroff and reboot applets, or the corresponding programs from s6-linux-init, passing them an -f (force) option and choosing among them according to the argument supplied by s6-svscan.

The general structure of an execline stage3 script is as follows, or a variation thereof:

Where:


 * ${tmpfsdir} is a placeholder for the absolute pathname of the directory where the stage1 init mounted the read-write tmpfs.
 * ${scandir} is a placeholder for the pathname, relative to ${tmpfsdir}, of process 1's scan directory.
 * ${logger_servicedir} is a placeholder for the name of the catch-all logger's service directory.

Gentoo's official repository does not supply any package with a stage3 init for s6 and s6-rc-based init systems. Users must create one from scratch or take it from somewhere else (e.g. alternative ebuild repositories). A program from s6-linux-init can create an execline stage3 script with the aforementioned structure, which uses programs from packages s6-portable-utils and s6-linux-utils, and can be used as a basis for writing a custom or more elaborate one, if so desired. The generated stage3 script flushes all the dirty system buffers and blocks until they are clean, kills all processes using s6-nuke -th and s6-nuke -k commands, unmounts all partitions, and remounts the rootfs read-only.

Service management
On an s6 and s6-rc-based init system, the s6-rc package is used for service management. In particular, the administrator can replace the init system's compiled service database with a new one using s6-rc's database update program, and can create a new boot-time service database, to be used next time the machine boots, with s6-rc's database compiler and a set of service definitions in the compiler's supported source format. It is best to have the s6-rc initialization in the stage2 init use a symbolic link as the compiled service database pathname, so that the boot-time database can be changed by modifying the symlink instead of the stage2 init code, e.g. by having a directory for storing one or more compiled databases, making a fixed pathname a symbolic link to one of those databases, and using the symlink in the initialization invocation.
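The symlink arrangement can be illustrated with a short shell sketch. The directory layout and the names compiled-a, compiled-b and boot are hypothetical, and a temporary directory is used so that the commands can be tried safely. Note that ln -sfn replaces the link in two steps rather than atomically, which is acceptable here because the link is only read at boot time.

```shell
#!/bin/sh
# Sketch only: directory and link names are hypothetical.
tmp=$(mktemp -d)
dbdir="$tmp/s6-rc-db"            # directory holding one or more compiled databases

mkdir -p "$dbdir/compiled-a" "$dbdir/compiled-b"

# The stage2 init would always be given "$dbdir/boot" as the database pathname.
ln -s compiled-a "$dbdir/boot"
before=$(readlink "$dbdir/boot")

# Select a different boot-time database for the next boot by repointing the
# symlink; -n keeps ln from descending into the directory the old link points to.
ln -sfn compiled-b "$dbdir/boot"
after=$(readlink "$dbdir/boot")

rm -rf "$tmp"
printf '%s -> %s\n' "$before" "$after"
```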

It is possible to have long-lived processes not managed by s6-rc but supervised by process 1, by directly managing s6 service directories, placing them (or symbolic links to them) in process 1's scan directory, and using s6-svscanctl -a, s6-svscanctl -n or s6-svscanctl -N commands as needed. It is also possible to use s6-svscan as process 1 and just s6 tools, without s6-rc, but then the init system becomes more like runit. In that case, executing s6-svscan with signal diversion turned on is not necessary.
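Adding a directly managed service might look like the following sketch. The service name mydaemon, its run script and the directory layout are hypothetical, temporary directories stand in for the real scan directory, and the run helper only prints the s6-svscanctl command unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: "mydaemon" and the directory layout are hypothetical.
DRY_RUN=${DRY_RUN:-1}
run() {
  # Print the command in dry-run mode, execute it otherwise.
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

tmp=$(mktemp -d)
scandir="$tmp/scandir"               # stands in for process 1's scan directory
servicedir="$tmp/services/mydaemon"  # service directory kept outside the scan directory
mkdir -p "$scandir" "$servicedir"

# A service directory needs at least an executable run file.
printf '#!/bin/sh\nexec sleep 3600\n' > "$servicedir/run"
chmod +x "$servicedir/run"

# Make the service visible to s6-svscan through a symbolic link.
ln -s "$servicedir" "$scandir/mydaemon"
linked=$(readlink "$scandir/mydaemon")

# Ask s6-svscan to rescan its scan directory now instead of waiting.
notify=$(run s6-svscanctl -a "$scandir")

rm -rf "$tmp"
printf '%s\n' "$notify"
```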

s6 service directories and s6-rc service definitions for anything not supplied in packages from Gentoo's official repository must be created by the administrator, either from scratch or taken from somewhere else (e.g. alternative ebuild repositories).

External resources

 * lh-bootstrap, a set of scripts that build a disk image for a virtual machine such as QEMU. The image contains a Linux kernel and a collection of small user-space tools such as BusyBox and dropbear, all statically linked to musl, and an s6 and s6-rc-based init system.
 * Obarun, an Arch derivative with an s6 and s6-rc-based init system.
 * Slew, a project that provides stage1, stage2 and stage3 inits and s6-svscan diverted signal handlers, as well as s6-rc service definition directories in s6-rc's source format for several services and other supporting scripts, to make an s6 and s6-rc-based init system. Most scripts require Byron Rakitzis's implementation of the Plan 9 shell, rc, for Unix.