User:flexibeast/drafts/S6

From Gentoo Wiki
Warning, this page is a work in progress by Flexibeast (talk | contribs). Treat its contents with caution.
Note
This page is an attempt to consolidate and streamline the S6 and S6 and s6-rc-based init system pages. The overall approach: break up the text into smaller chunks, using headings and paragraphs; minimise comparison to daemontools, try to keep article more self-contained; remove detailed descriptions of options and behaviour, instead encouraging reading of the s6 documentation; move examples not demonstrating a particular setup task to separate 'examples' page.

s6 is a package that provides a daemontools-inspired process supervision suite, a notification framework, a UNIX domain super-server, and tools for file descriptor holding and suidless privilege gain. It can be used as an init system component, and also as a helper for supervising OpenRC services. A high level overview of s6 is available here. The package's documentation is primarily provided in HTML format, and can be read on a text user interface using for example www-client/links. However, a man page port of the s6 documentation, app-misc/s6-man, is available in the GURU repository.

Installation

USE flags

USE flags for sys-apps/s6 skarnet.org's small and secure supervision software suite

execline enable support for dev-lang/execline

Emerge

root #emerge --ask sys-apps/s6

Configuration

Environment variables

  • UID - The process' user ID set by s6-applyuidgid when invoked with the -U option.
  • GID - The process' group ID set by s6-applyuidgid when invoked with the -U option.
  • GIDLIST - The process' supplementary group list set by s6-applyuidgid when invoked with the -U option. Must be a comma-separated list of numeric group IDs, without spaces.

Files

  • /run/openrc/s6-scan - s6-svscan's scan directory when using OpenRC's s6 integration feature.
  • /var/svc.d - Service directory repository searched by OpenRC when using the s6 integration feature.

Service

OpenRC

Refer to this section.

Usage

A note on execline

The author of s6 is also the author of dev-lang/execline, a scripting language built around chain loading[1]. Execline aims to support the creation of lightweight and efficient scripts, by:

  • avoiding spawning and initializing a big command interpreter such as sh; and
  • simplifying parsing and doing it only once, when the script is read by the interpreter[2].

sys-apps/s6 depends on dev-lang/execline because some of its programs call execline programs or use the execline library, libexecline. However, s6 scripts don't need to use execline; they can use other command interpreters, such as sh.

Process supervision

For more in-depth information about the process supervision aspects of s6, see daemontools-encore. s6 programs with functionality similar to a daemontools program have the daemontools name prefixed with s6-; the exceptions are s6-log (daemontools' multilog) and s6-setsid (daemontools' pgrphack).

Complete information about the following programs, directory structures and files can be found in sys-apps/s6's /usr/share/doc/ subdirectory.

s6-supervise
s6-supervise takes the (absolute or relative to the working directory) pathname of a service directory (servicedir) as an argument. This directory must contain an executable file named run, but it can also contain:
  • a regular file named down;
  • a subdirectory (or symbolic link to a directory) named log;
  • an executable file named finish;
  • a regular file named timeout-finish.
finish can be used to perform cleanup actions each time the supervised process terminates, possibly depending on its exit status information. s6-supervise calls finish with two arguments: firstly, the supervised process' exit code, or 256 if it was killed by a signal; and secondly, the signal number if the supervised process was killed by a signal, or an undefined number otherwise.
s6-supervise sends the finish process a SIGKILL signal if it runs for too long, by default 5 seconds. An alternative value can be specified in the timeout-finish file: an unsigned integer that specifies how many milliseconds the finish process is allowed to run before being killed.
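
As a minimal sketch of how a finish file might use its two arguments, the following shell fragment wraps the logic in a function. The name describe_exit is hypothetical and chosen only for illustration; in a real finish file the same body would operate directly on "$1" and "$2".

```shell
#!/bin/sh
# Sketch of finish-file logic; describe_exit is a hypothetical helper name.
# $1: exit code of the supervised process, or 256 if it was killed by a signal
# $2: signal number (only meaningful when $1 is 256)
describe_exit() {
    if [ "$1" -eq 256 ]; then
        echo "killed by signal $2"
    elif [ "$1" -eq 0 ]; then
        echo "exited normally"
    else
        echo "exited with code $1"
    fi
}

# In an actual finish file, the call would simply be:
#   describe_exit "$1" "$2"
```

Whatever finish does has to complete within the timeout described above, so a script like this should stay short.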
s6-supervise makes its child process the leader of a new session using the POSIX setsid() call, unless the servicedir contains a regular file named nosetsid. In that case, the child process will run in s6-supervise's session instead. s6-supervise waits for a minimum of 1 second between two spawns of run, so that it does not loop too quickly if the supervised process exits immediately. If s6-supervise receives a SIGTERM, it behaves as if an s6-svc -dx command naming the corresponding service directory had been used (see later); if it receives a SIGHUP signal, it behaves as if an s6-svc -x command naming the corresponding service directory had been used.
s6-supervise keeps control files in a subdirectory of the servicedir, supervise. If it finds a symbolic link to a directory with that name, s6-supervise will follow it and use that directory for its control files.
s6-supervise also uses a servicedir subdirectory named event, for notifications about the supervised process' state changes (see the notification framework). If this directory doesn't exist, s6-supervise will create it as a FIFO directory restricted to members of its effective group; if it does exist, including as a symlink to another directory, s6-supervise will use it as-is.
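
The servicedir layout described in this section can be sketched with a few shell commands. The function name make_servicedir, the 10000 millisecond timeout and the "sleep 3600" stand-in daemon are all assumptions made for illustration, not part of s6.

```shell
#!/bin/sh
# Sketch: build a minimal service directory at the path given as $1.
# make_servicedir and the "sleep 3600" stand-in daemon are hypothetical.
make_servicedir() {
    dir=$1
    mkdir -p "$dir" || return 1

    # run: the only mandatory file; it must be executable.
    cat > "$dir/run" <<'EOF'
#!/bin/sh
exec 2>&1
exec sleep 3600
EOF
    chmod 0755 "$dir/run"

    # finish: optional cleanup hook, called with exit code and signal number.
    cat > "$dir/finish" <<'EOF'
#!/bin/sh
echo "service stopped: exit code $1, signal $2"
EOF
    chmod 0755 "$dir/finish"

    # timeout-finish: how many milliseconds finish may run before SIGKILL.
    echo 10000 > "$dir/timeout-finish"

    # down: an empty regular file; its presence marks the service as
    # "normally down".
    : > "$dir/down"
}
```

Calling make_servicedir /path/to/servicedir would produce a directory that s6-supervise could be pointed at; the supervise and event subdirectories are left for s6-supervise to create.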

s6-svscan
s6-svscan supervises a collection of processes running in parallel, via a scan directory (scandir), which represents the supervision tree's root.
s6-svscan keeps control files in a subdirectory of the scandir, named .s6-svscan. If this subdirectory or any of its files doesn't exist when s6-svscan is invoked, they will be created. s6-svscan can be controlled by sending it signals (such as a SIGALRM to force a scan), or by using the s6-svscanctl program; s6-svscanctl communicates with s6-svscan using a FIFO in the .s6-svscan subdirectory, and accepts a scan directory pathname, together with options that specify what to do. Refer to the s6-svscanctl page in the documentation for details.
When s6-svscan performs a scan, it checks the scandir and launches an s6-supervise child process for each new servicedir it finds, or old servicedir for which it finds its s6-supervise process has exited. All services with a corresponding servicedir are considered active. If s6-svscan finds that there are s6-supervise children without a corresponding servicedir, the service is not stopped, but considered inactive.
To support its role as process 1, s6-svscan performs a reaper routine each time it receives a SIGCHLD signal, i.e. it uses a POSIX waitpid() call for each child process that becomes a zombie, both the ones it has spawned itself, and the ones that were reparented to process 1 by the kernel because their parent process died. An s6-svscanctl -z command naming its scan directory can be used to force s6-svscan to perform its reaper routine.

The s6-svscan program was written to be robust, and to go out of its way to stay alive, even in dire situations, making it suitable for running as process 1 during most of a machine's uptime. However, the duties of process 1 vary widely during:
  • the machine's boot sequence;
  • its normal, stable 'up and running' state; and
  • its shutdown sequence.
The first and third cases are heavily system-dependent, so it is not possible to use a program designed to be as portable as possible[3]. Because of that, auxiliary and system-dependent programs, named stage1 init and stage3 init are used during the boot sequence and the shutdown sequence, respectively, to run as process 1, and s6-svscan is used the rest of the time. For details, see s6 init with s6-rc.

s6-svc
s6-svc controls supervised processes, and s6-svstat queries them for status information.
s6-svc takes a servicedir as an argument, together with options that specify various actions. Any additional servicedir after the first one will be ignored.

s6-svstat
s6-svstat takes a servicedir as an argument; any pathname after the first one will be ignored. Without options, or with only the -n option, s6-svstat prints a human-readable summary of all the available information on the service: for example, whether the supervised process is running ("up") or not ("down"), whether it is transitioning to the desired state or already there ("want up" or "want down"), how long it has been in the current state, and whether its current up or down status matches the presence or absence of a down file in the servicedir ("normally up" or "normally down"). It also shows if the supervised process is paused (because of a SIGSTOP signal).

s6-log
s6-log is the s6 logger program. It treats its argument as a logging script containing directives. For details, refer to the relevant page in sys-apps/s6's /usr/share/doc/ subdirectory.

Querying and modifying supervised state

s6-svok
s6-svok checks whether an s6-supervise process is currently running on the servicedir specified as an argument. Its exit status is 0 if there is one, and 1 if there isn't.

s6-svdt
s6-svdt takes a servicedir as an argument, and prints the corresponding service's death tally. For each recorded process termination, one line is printed.

s6-permafailon
s6-permafailon is a chain loading program that assumes that its working directory is a servicedir, and checks termination events in the corresponding death tally. If they match specified conditions, the program exits with code 125; otherwise, it executes the next program in the chain. This makes it suitable for use in finish files, to signal permanent failure to s6-supervise when certain conditions are met.

s6 also provides chain loading programs that can be used to modify a supervised process' execution state: s6-envdir, s6-envuidgid, s6-setlock, s6-setuidgid, s6-softlimit and s6-setsid. There is also a generalized version of s6-setuidgid, s6-applyuidgid.

Examples

Refer to this page for a number of examples of s6 supervision tool usage.

The logging chain

A supervision tree where all leaf processes have a logger can be arranged into a logging chain[4], rather than the traditional syslog-based centralized approach[5].

Since processes in a supervision tree are created using the POSIX fork() call, each of them will inherit s6-svscan's standard input, output and error. A logging chain arrangement is as follows:

  • Leaf processes should normally have a logger, so their standard output and error connect to their logger's standard input. Therefore, all their messages are collected and stored in dedicated, per-service logs by their logger. Some programs might need to be invoked with special options to make them send messages to their standard error, and redirection of standard error to standard output (i.e. 2>&1 in a shell script or fdmove -c 2 1 in an execline script) must be performed in the servicedir's run file.
  • Leaf processes with a controlling terminal are an exception: their standard input, output and error connect to the terminal.
  • s6-supervise, the loggers, and leaf processes that don't have a logger for some reason, inherit their standard input, output and error from s6-svscan, so their messages are sent wherever the ones from s6-svscan are.
  • Leaf processes that still unavoidably report their messages using syslog() have them collected and logged by a (possibly supervised) syslog server.
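
The redirection mentioned in the first item can be tried out without a running supervision tree. In the following sketch, the inner "sh -c" command is a stand-in for a real daemon, and the command substitution plays the role of the pipe to the logger's standard input; both are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: a run file that merges standard error into standard output.
svcdir=$(mktemp -d)

cat > "$svcdir/run" <<'EOF'
#!/bin/sh
# Send standard error wherever standard output goes (the logger).
exec 2>&1
# Stand-in for a real daemon that writes to both streams.
exec sh -c 'echo "message on stdout"; echo "message on stderr" >&2'
EOF
chmod 0755 "$svcdir/run"

# Capture only standard output, as a logger connected to it would.
# The file is run through sh so the sketch also works when the temporary
# directory is on a filesystem mounted noexec.
collected=$(sh "$svcdir/run" 2>/dev/null)
printf '%s\n' "$collected"

rm -rf "$svcdir"
```

Both messages end up in the captured output; without the exec 2>&1 line, the second one would be discarded by the 2>/dev/null redirection instead.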

s6 and s6-rc-based init is arranged in such a way that s6-svscan's messages are collected by a catch-all logger, and that logger's standard error is redirected to /dev/console.

The notification framework

Notification is a mechanism by which a process can become instantly aware that a certain event has happened, as opposed to the process actively and periodically checking whether it happened (which is called polling)[6]. The s6 package provides a general notification framework that doesn't rely on a long-lived process (e.g. a bus daemon), so that it can be integrated with its supervision suite. The notification framework is based instead on FIFO directories.

FIFO directories and related tools

A FIFO directory (or fifodir) is a directory in the filesystem associated with a notifier, a process in charge of notifying other processes about some set of events.

A FIFO is a named pipe (cf. fifo(7)). The fifodir contains FIFOs, each of them associated with a listener: a process that wants to be notified about one or more events. A listener creates a FIFO in the fifodir and opens it for reading; this is called subscribing to the fifodir. When a certain event happens, the notifier writes to each FIFO in the fifodir. Written data is conventionally a single character encoding the identity of the event. Listeners wait for notifications using some blocking I/O call on the FIFO; unblocking and successfully reading data from it is their notification. A listener that no longer wants to receive notifications removes its FIFO from the fifodir; this is called unsubscribing.
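
The subscribe, notify and unsubscribe steps can be imitated with standard tools, as in the following sketch. It is only an illustration of the idea: notify_all is a hypothetical helper, the fifodir is a temporary directory without the ownership and permission setup that real fifodirs need, and the blocking writes used here are a simplification.

```shell
#!/bin/sh
# Sketch of the fifodir mechanism using only standard tools.
# notify_all is a hypothetical helper, not an s6 program.
fifodir=$(mktemp -d)
result=$(mktemp)

# Notifier side: write the event character to every FIFO in the fifodir.
notify_all() {
    for fifo in "$1"/*; do
        if [ -p "$fifo" ]; then
            printf '%s' "$2" > "$fifo"
        fi
    done
}

# Listener side: subscribe by creating a FIFO, block until one character
# arrives, then unsubscribe by removing the FIFO.
mkfifo "$fifodir/listener1"
(
    dd bs=1 count=1 < "$fifodir/listener1" > "$result" 2>/dev/null
    rm -f "$fifodir/listener1"
) &

notify_all "$fifodir" u     # "u" stands for some event
wait                        # let the listener finish

received=$(cat "$result")
echo "listener received: $received"
rm -rf "$fifodir" "$result"
```

The FIFO is created before the notifier runs, so the notification cannot be missed here; the s6-ftrig-* programs exist to give the same guarantee in real setups.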

FIFOs and FIFO directories need a special ownership and permission setup to work:

  • The owner of a fifodir must be the notifier's effective user.
  • A publicly accessible fifodir can be subscribed to by any user; its permissions must be 1733 (i.e. drwx-wx-wt).
  • A restricted fifodir can be subscribed to only by members of the fifodir's group; its permissions must be 3730 (i.e. drwx-ws--T).
  • The owner of a FIFO in the fifodir must be the corresponding listener's effective user; its permissions must be 0622 (prw--w--w-).

For complete information about FIFO directory internals, refer to the available documentation.
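
To make the numeric modes above more concrete, the following sketch sets them by hand on a temporary directory and prints the resulting symbolic modes. The paths are throwaway names chosen for illustration; on a real system, s6-mkfifodir is the tool meant for this job.

```shell
#!/bin/sh
# Sketch: reproduce the fifodir permission layout by hand.
base=$(mktemp -d)

# Publicly accessible fifodir: any user may subscribe.
mkdir "$base/public"
chmod 1733 "$base/public"

# Restricted fifodir: only members of the directory's group may subscribe.
mkdir "$base/restricted"
chmod 3730 "$base/restricted"

# A listener's FIFO: writable by everyone, readable only by its owner.
mkfifo "$base/public/listener1"
chmod 0622 "$base/public/listener1"

# Extract the symbolic modes (first ten characters of the ls -l output).
public_mode=$(ls -ld "$base/public" | cut -c1-10)
fifo_mode=$(ls -l "$base/public/listener1" | cut -c1-10)
echo "public fifodir: $public_mode"
echo "listener FIFO:  $fifo_mode"

rm -rf "$base"
```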

s6-mkfifodir
s6-mkfifodir creates a FIFO directory with the correct ownership and permissions. There is also a s6-cleanfifodir program, which removes all FIFOs in a fifodir that don't have an active listener. Normally, FIFOs are removed when the corresponding listener unsubscribes, so s6-cleanfifodir is a cleanup tool for cases when this fails (e.g. the listener was killed by a signal).

s6-ftrig-notify
s6-ftrig-notify notifies all subscribers of a fifodir, and can be used to create a notifier program. It accepts the pathname of a fifodir and a message that is written as-is to all FIFOs in the fifodir. Each character in the message is assumed to encode an event, and the character sequence should reflect the event sequence.

s6-ftrig-wait
s6-ftrig-wait subscribes to a fifodir and waits for a notification, and can be used to create a listener program. It accepts the pathname of a fifodir and a POSIX Extended Regular Expression (as described in regex(7)), creates a FIFO in the fifodir with correct ownership and permissions, and waits until it reads a sequence of characters that match the regular expression. Then it unsubscribes from the fifodir by removing the FIFO, prints the last character read from it to its standard output, and exits.

Because performing an action that might trigger an event recognized by a notifier, and subscribing to its fifodir to be notified of the event is susceptible to races that might lead to missing the notification, s6 provides two additional programs: s6-ftrig-listen and s6-ftrig-listen1.

s6-ftrig-listen
s6-ftrig-listen subscribes to each specified fifodir, runs a program as a child process with the supplied arguments, and waits for notifications. It makes sure that the program is executed only after the listeners have begun reading from their FIFOs.

s6-ftrig-listen1
s6-ftrig-listen1 is a single fifodir and regular expression version of s6-ftrig-listen that doesn't need execlineb-encoded arguments. It prints the last character read from the created FIFO to its standard output.

A timeout can be set for s6-ftrig-wait, s6-ftrig-listen and s6-ftrig-listen1 by specifying a -t option followed by a time value in milliseconds. The programs exit with an error status if they haven't been notified about the desired events after the specified time.

s6-supervise's use of notification

The event subdirectory of an s6 servicedir is a fifodir used by s6-supervise to notify interested listeners about its supervised process' state changes. That is, s6-supervise acts as the notifier associated with the event fifodir, and writes a single character to each FIFO in it when there is a state change:

  • At program startup, after creating event if it doesn't exist, s6-supervise writes an s character (start event).
  • Each time s6-supervise spawns a child process executing the run file, it writes a u character (up event).
  • If the supervised process supports readiness notification, s6-supervise writes a U character (up and ready event) when the child process notifies its readiness.
  • If the service directory contains a finish file and it exits with code 125 (permanent failure) when executed, s6-supervise writes an O character (once event; the character is a capital 'o').
  • Each time the supervised process stops running, s6-supervise writes a d character (down event).
  • If the service directory contains a finish file, s6-supervise writes a D character (really down) each time finish exits or is killed. Otherwise, s6-supervise writes the character right after the down event notification.
  • When s6-supervise is about to exit normally, it writes an x character (exit event) after the supervised process stops and it has notified listeners about the really down event.
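
The event characters listed above can be summarised as a small lookup, sketched here in shell. The function names describe_event and decode_events are hypothetical; they only restate the list and are not part of s6.

```shell
#!/bin/sh
# Sketch: map s6-supervise event characters to the names used in the text.
describe_event() {
    case $1 in
        s) echo "start" ;;
        u) echo "up" ;;
        U) echo "up and ready" ;;
        O) echo "once" ;;
        d) echo "down" ;;
        D) echo "really down" ;;
        x) echo "exit" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Decode a sequence of event characters, one name per line.
decode_events() {
    rest=$1
    while [ -n "$rest" ]; do
        first=${rest%"${rest#?}"}   # first character of the remaining string
        describe_event "$first"
        rest=${rest#?}              # drop that character
    done
}
```

For instance, decode_events udD names the events a listener would see when a service is spawned, then stops and has its finish file complete.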

s6-svwait
s6-svwait is a process supervision-specific notification tool. It accepts service directory pathnames and options that specify an event to wait for. At program startup, for each specified servicedir it checks the status file in its supervise control subdirectory to see if the corresponding supervised process is already in the state implied by the specified event, and if not, it subscribes to the event fifodir and waits for notifications from the corresponding s6-supervise process.

s6-svlisten
s6-svlisten is a supervision-specific version of s6-ftrig-listen. It accepts servicedir pathnames in the format generated by execlineb when parsing block syntax, together with a program name and its arguments, and options that specify an event to wait for. s6-svlisten1 is a single servicedir version of s6-svlisten that doesn't need execlineb-encoded arguments.

s6-svwait, s6-svlisten and s6-svlisten1 accept a -t option to specify a timeout in the same way as s6-ftrig-wait.

Finally, the s6-svc program accepts a -w option that makes it wait for notifications from the s6-supervise process corresponding to the service directory specified as argument, after asking it to perform an action on its child process.

Refer to this page for examples of s6 notification mechanisms.

Service readiness notification

When a process is supervised, it transitions to the 'up' state when its supervisor has successfully spawned a child process executing the run file. s6-supervise considers this an up event, and notifies all listeners subscribed to the corresponding event fifodir about it. But when the supervised process is executing a server program for example, it might not be ready to provide its service immediately after startup. Programs might do initialization work that could take some noticeable time before they are actually ready to serve, but it is impossible for the supervisor to know exactly how much. Because of this, and because the kind of initialization to do is program-specific, some sort of collaboration from the supervised process is needed to help the supervisor know when it is ready[7]. This is called readiness notification.

To support readiness notification under s6, a program implements the s6 readiness notification protocol, which works like this:

  1. At program startup, the program expects to have a file descriptor open for writing, associated with a notification channel. The program chooses the file descriptor. For example, it can be specified as a program argument, or be a fixed, program-specific well-known number specified in the program's documentation.
  2. When all initialization work necessary to reach the program's definition of 'service ready state' has been completed, it writes a newline character to the notification channel.
  3. The program closes the notification channel after writing to it.
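
The three steps above can be sketched as a shell function standing in for a program's main routine. The function name serve and the choice of file descriptor 3 are assumptions made for this example; a real program would pick and document its own descriptor.

```shell
#!/bin/sh
# Sketch of a program following the s6 readiness notification protocol.
# Using file descriptor 3 as the notification channel is this example's
# own choice.
serve() {
    # Step 1: descriptor 3 is expected to be open for writing already.
    # ... initialization work would happen here ...

    # Step 2: initialization is complete, so write a newline to the channel.
    printf '\n' >&3

    # Step 3: close the channel; it is not needed any more.
    exec 3>&-

    # ... the program would now start serving ...
}
```

Running ( serve ) 3>/path/to/file from a shell leaves a single newline in the file, which is all the supervisor needs to see on its end of the channel.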

s6 uses readiness notification when a regular file named notification-fd is present in a service directory, containing an integer that specifies the program's chosen notification channel file descriptor. s6-supervise implements the notification channel as a pipe between the supervised process and itself; when it receives a newline character signalling the service's readiness, it considers that an up and ready event and notifies all listeners subscribed to the event fifodir about it. After that, s6-supervise no longer reads from the notification pipe, so it can be safely closed by the child process.

Note
Using s6-svscan's -d option signals shallow readiness: s6-svscan's readiness does not mean that all the supervision tree processes launched by it are themselves ready, or even started, or even that their corresponding s6-supervise parent processes have been started. Therefore, this option cannot be relied on if a test for deep readiness, meaning that all supervision tree processes have been started and are ready, is needed.

s6-notifyoncheck
The s6-notifyoncheck program can be used in combination with programs that don't support readiness notification, but which can nevertheless be polled for readiness. In that case, s6-notifyoncheck can be invoked from a run file, use the available polling mechanisms, and signal readiness itself to s6-supervise using the s6 readiness notification protocol.
s6-notifyoncheck is a chain loading program that assumes its working directory is a servicedir. It spawns a child process that polls for readiness, then executes the next program in the chain. By default, the child process will try to execute a file named check in a subdirectory named data as a child process (i.e. the pathname of the file, relative to s6-notifyoncheck's working directory, is data/check). Just like run or finish, check can have any file format that the kernel knows how to execute, but is usually an execline or shell script. When executed, check is expected to poll the supervised process for readiness, and then exit with code 0 if the process was verified to be ready, or exit with a nonzero code otherwise.
By default, s6-notifyoncheck expects to be able to read the file descriptor it should use for the notification channel from a notification-fd file in its working directory (i.e. the file used by s6-supervise), and uses a single POSIX fork() call to create the polling child process, so the next program in the chain must be able to reap it when it terminates (e.g. with a POSIX wait() call). If a -3 option followed by an unsigned integer value is passed to s6-notifyoncheck, it will use the specified value as the notification channel's file descriptor, ignoring notification-fd. And if a -d option is passed to s6-notifyoncheck, it will use two fork() calls instead of one (i.e. it will double fork), so the poller process will be reparented to process 1 (or to a local reaper), and the next program in the chain won't have to reap it. This is useful to avoid having a lingering zombie process if the next program in the chain does not reap child processes it doesn't know about.
If the poll for readiness is successful, s6-notifyoncheck's polling process signals readiness using the notification channel. The polling process periodically retries execution of check or invocation of execlineb until a poll is successful, a timeout period expires, or a certain number of unsuccessful polls has been reached, depending on the options supplied to s6-notifyoncheck. It then exits. A retry is performed once every second by default, i.e. the default polling period is 1 second, but a different one can be specified by passing a -w option to s6-notifyoncheck, followed by a time value in milliseconds.
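
The polling behaviour described above can be imitated in a few lines of shell. This is a sketch of the idea only, not of how s6-notifyoncheck is implemented: poll_until_ready and check_ready are hypothetical names, the period is given in seconds rather than milliseconds, and the marker-file test is just one possible readiness criterion for a check file.

```shell
#!/bin/sh
# Sketch of a readiness polling loop.
# $1: maximum number of attempts
# $2: seconds to wait between attempts
# remaining arguments: the check command to run
poll_until_ready() {
    max=$1
    period=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0            # check succeeded: the service is ready
        fi
        attempt=$((attempt + 1))
        sleep "$period"
    done
    return 1                    # gave up without seeing readiness
}

# Example check: the service counts as ready once a marker file exists.
# A data/check file could contain a similar test and exit 0 or nonzero.
check_ready() {
    [ -e "$1" ]
}
```

For example, poll_until_ready 10 1 check_ready /path/to/marker would try once per second, up to ten times.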

Refer to this page for examples of s6 readiness notification mechanisms.

The UNIX domain super-server and related tools

Refer to this page.

Suidless privilege gain tools

s6 provides two programs, s6-sudoc and s6-sudod, that can be used to implement controlled privilege gains without setuid programs. This is achieved by:

  • s6-sudod running as a long-lived process with an effective user that has the required privileges, and bound to a stream mode UNIX domain socket; and
  • s6-sudoc, which can run with an unprivileged effective user, asking the s6-sudod process to perform an action on its behalf.

s6-sudod
s6-sudod must be spawned by a UCSPI server (like s6-ipcserverd) and accepts options and an argument sequence s1, s2, ... that can be empty. s6-sudod concatenates its argument sequence with the one received from the client, and passes it to a POSIX execve() call, which results in a program invocation.

s6-sudoc
s6-sudoc must be spawned by a UCSPI client and accepts options and an argument sequence c1, c2, ... that can also be empty. s6-sudoc transmits the argument sequence over the connection to the server, which must be an s6-sudod process, together with its environment variables, unless it is invoked with an -e option. s6-sudoc also transmits its standard input, output and error file descriptors to s6-sudod using SCM_RIGHTS control messages (i.e. fd-passing), so that the invoked program will run as a child process of s6-sudod, with s6-sudod's effective user, but its standard input, output and error descriptors will be a copy of s6-sudoc's. The program's environment will be s6-sudod's environment, except that every variable that is defined but has an empty value will be set to the value it has in s6-sudoc's environment, if it is also set there. s6-sudoc waits until s6-sudod's child process exits. If it is invoked with a -T option followed by a time value in milliseconds, it will close the connection and exit after the specified time has passed if s6-sudod's child is still running.

s6-sudo
s6-sudo takes a UNIX domain socket pathname and an s6-sudoc argument sequence, and invokes s6-ipcclient chained to s6-sudoc. The socket pathname is passed to s6-ipcclient, and the argument sequence to s6-sudoc. s6-sudo options specify corresponding s6-ipcclient and s6-sudoc options.
Standard permissions settings on the server's listening socket can be used to implement some access control, and credentials passing over a UNIX domain socket also allows finer-grained control. The s6-ipcserver-access program can be used to take advantage of credentials passing.
Important
If s6-sudoc is killed, or exits while s6-sudod's child process is still running, s6-sudod will send a SIGTERM followed by a SIGCONT signal to its child, and then exit 1. However, sending a SIGTERM to the child does not guarantee that it will die, and if it keeps running, it might still read from the file descriptor that was s6-sudoc's standard input, or write to the file descriptors that were s6-sudoc's standard output or error. This is a potential security risk. Administrators should audit their server programs to make sure this does not happen. More generally, anything using signals or terminals will not be handled transparently by the s6-sudoc+s6-sudod mechanism. The mechanism was designed to allow programs to gain privileges in specific situations: short-lived, simple, noninteractive processes. It was not designed to emulate full setuid functionality and will not go out of its way to do so. Also, s6-sudod's argument sequence may be empty. In that case, the client is in complete control of the program executed as s6-sudod's child. This setup is permitted but very dangerous, and extreme attention should be paid to access control.

Refer to this page for a number of examples of s6 privilege gain tool usage.

The file descriptor holder and related tools

Refer to this page. A combination of these and other s6 tools allows the implementation of the mechanism that systemd calls socket activation, for services that want that.

OpenRC's s6 integration feature

OpenRC can launch supervised long-lived processes using the s6 package as a helper[8]. This is an alternative to 'classic' unsupervised long-lived processes launched using the start-stop-daemon program. It should be noted that service scripts that don't contain start() and stop() functions implicitly use start-stop-daemon.

OpenRC services that want to use s6 supervision need both a service script in /etc/init.d/ and an s6 service directory. The service script must contain a supervisor=s6 variable assignment to turn the feature on, and must have a 'need' dependency on the s6-svscan service in its depend() function, to make sure the s6-svscan program is launched. It must not contain:

  • a start() function;
  • a stop() function (but their _pre() and _post() variants are OK); or
  • a status() function.

When the service script is called with a 'start' argument, OpenRC internally invokes s6-svc with a -u option, and can also call s6-svwait after s6-svc to wait for an event, by assigning s6-svwait options to the s6_svwait_options_start variable (e.g. in the service script or the service-specific configuration file in /etc/conf.d/). For example, if the service supports readiness notification, s6_svwait_options_start="-U -t 5000" could be used to make OpenRC wait for the up and ready event with a 5-second timeout.

When the service script is called with a 'stop' argument, OpenRC internally invokes s6-svc with the -d, -wD and -T options, so it will wait for a 'really down' event with a default timeout of 10 seconds. The timeout can be changed by assigning a time value in milliseconds to the s6_service_timeout_stop variable (e.g. in the service script or the service-specific configuration file in /etc/conf.d/).

When the service script is called with a 'status' argument, OpenRC internally invokes s6-svstat.

The s6 service directory can be placed anywhere in the filesystem, and have any name, as long as the service script (or the service-specific configuration file in /etc/conf.d/) assigns the servicedir's absolute path to the s6_service_path variable. If s6_service_path is not assigned to, the s6 servicedir must have the same name as the OpenRC service script, and will be searched in /var/svc.d/. When using this feature, the scan directory is /run/openrc/s6-scan/, and OpenRC will create a symlink to the service directory when the service is started.
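
Putting the variables from this section together, a service script for a hypothetical service could look like the following sketch. The description text, the servicedir path and the timeout values are placeholders chosen for illustration; only the variable names and the 'need' dependency come from the text above.

```shell
#!/sbin/openrc-run
# Sketch of an OpenRC service script using s6 supervision for a
# hypothetical service. No start(), stop() or status() functions are defined.

description="Hypothetical daemon supervised by s6"

# Turn on OpenRC's s6 integration for this service.
supervisor=s6

# Placeholder: absolute path of the s6 service directory. If unset, OpenRC
# looks for a servicedir named after this script in /var/svc.d/.
s6_service_path="/path/to/servicedir"

# Wait for the "up and ready" event, for at most 5000 milliseconds.
s6_svwait_options_start="-U -t 5000"

# Wait at most 10000 milliseconds for the "really down" event on stop.
s6_service_timeout_stop=10000

depend() {
    # Make sure the s6-svscan service is started first.
    need s6-svscan
}
```

The three s6_* assignments could equally be placed in the service-specific configuration file in /etc/conf.d/ instead of the script itself.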

Warning
OpenRC does not integrate as expected when s6-svscan is running as process 1, since OpenRC will launch another s6-svscan process with /run/openrc/s6-scan/ as its scan directory. So the result will be two independent supervision trees.

Refer to this page for a number of examples of OpenRC s6 integration.

Starting a supervision tree

From OpenRC

OpenRC provides a service script that can launch s6-svscan, also named s6-svscan. On Gentoo, the scan directory will be /run/openrc/s6-scan. This script exists to support the OpenRC-s6 integration feature, but can be used to just launch an s6 supervision tree, either when the machine boots via an OpenRC runlevel:

root #rc-update add s6-svscan default

Or started manually at any time:

root #rc-service s6-svscan start
Note
The service script launches s6-svscan using OpenRC's start-stop-daemon program, so it will run unsupervised, and have its standard input, output and error redirected to /dev/null.

Because /run is a tmpfs, and therefore volatile, servicedir symlinks must be created in the scan directory each time the machine boots, before s6-svscan starts. The tmpfiles.d interface, which is supported by OpenRC using package opentmpfiles (sys-apps/opentmpfiles), can be used for this:

FILE /etc/tmpfiles.d/s6-svscan.conf
#Type Path Mode UID GID Age Argument
d /run/openrc/s6-scan
L /run/openrc/s6-scan/service1 - - - - /path/to/servicedir1
L /run/openrc/s6-scan/service2 - - - - /path/to/servicedir2
L /run/openrc/s6-scan/service3 - - - - /path/to/servicedir3

As an alternative, OpenRC's local service could be used to start the supervision tree when entering OpenRC's default runlevel, by placing '.start' and '.stop' files in /etc/local.d (cf. /etc/local.d/README), which perform actions similar to those of the s6-svscan service script:

FILE /etc/local.d/s6-svscan.start
#!/bin/execlineb -P
# Remember to add --user if you don't want to run as root
start-stop-daemon --start --background --make-pidfile
   --pidfile /run/s6-svscan.pid
   --exec /bin/s6-svscan -- -S /path/to/scandir
FILE /etc/local.d/s6-svscan.stop
#!/bin/execlineb -P
start-stop-daemon --stop --retry 5 --pidfile /run/s6-svscan.pid

The -S option will explicitly disable signal diversion so that the SIGTERM signal that start-stop-daemon sends to s6-svscan will make it act as if an s6-svscanctl -rt command had been used.

And as another alternative, OpenRC's local service could be used to start the supervision tree when entering OpenRC's default runlevel, with /service as the scan directory, using a '.start' file that calls the s6-svscanboot script provided as an example (see starting the supervision tree from sysvinit), instead of calling s6-svscan directly. This allows setting up a logger program to log messages sent by supervision tree processes to s6-svscan's standard output and error, provided a service directory for the logger exists in /service:

FILE /etc/local.d/s6-svscan.start
#!/bin/execlineb -P
# Remember to add --user if you don't want to run as root
# Remember to symlink /command to /bin
start-stop-daemon --start --background --make-pidfile
   --pidfile /run/s6-svscan.pid
   --exec /bin/s6-svscanboot
FILE /etc/local.d/s6-svscan.stop
#!/bin/execlineb -P
start-stop-daemon --stop --retry 5 --pidfile /run/s6-svscan.pid

From sysvinit

sys-apps/s6 provides an s6-svscanboot execline script, which can be launched and supervised by sysvinit by adding a respawn line for it in /etc/inittab[9]. It launches an s6-svscan process, with its standard output and error redirected to /service/s6-svscan-log/fifo. This allows setting up a FIFO and a logger program to log messages sent by supervision tree processes to s6-svscan's standard output and error, with the same technique used by an s6 and s6-rc-based init system. s6-svscan's standard input will be redirected to /dev/null. The environment will be emptied and then set according to the contents of environment directory /service/.s6-svscan/env, if it exists, with an s6-envdir invocation. The scan directory will be /service.

To use s6-svscanboot, copy it from sys-apps/s6's /usr/share/doc/ subdirectory to /bin, uncompressing it if necessary. Then, manually edit /etc/inittab:

FILE /etc/inittab
SV:12345:respawn:/bin/s6-svscanboot

and ask telinit to reload its configuration:

root #telinit q

This will make sysvinit launch and supervise s6-svscan when entering runlevels 1 to 5. Because s6 and execline programs used in the script and invoked using absolute pathnames are assumed to be in the directory /command/, a symlink to the correct path for Gentoo must be created:

root #ln -s bin /command

An s6 service directory for the s6-svscan logger can be created with the s6-linux-init-maker program from sys-apps/s6-linux-init:

root #s6-envuidgid <user> s6-linux-init-maker -l /service -U <tmpdir>
root #cp -a <tmpdir>/run-image/{service/s6-svscan-log,uncaught-logs} /service

This will create an s6-log process logging to /service/uncaught-logs/, prepending messages with a timestamp in external TAI64N format. The placeholder <user> should be replaced by a valid account's username, to allow s6-log to run as an unprivileged process. The placeholder <tmpdir> names a temporary directory that s6-linux-init-maker creates in the working directory; it can be removed once the necessary subdirectories are copied to /service/.
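
To make the timestamp format less opaque, here is a small sketch that splits an external TAI64N label into its two fields using only POSIX tools. The function name tai64n_split and the sample label are made up for illustration. The seconds value is on the TAI scale, so it differs from a UTC Unix timestamp by a leap-second-dependent offset that this sketch does not correct. In practice, the s6-tai64nlocal program from s6 converts such timestamps to local time when log lines are piped through it.

```shell
#!/bin/sh
# Split a TAI64N label ("@" followed by 24 hex digits) into seconds and nanoseconds.
tai64n_split() {
    label=${1#@}                                    # drop the leading "@"
    sec_hex=$(printf '%s' "$label" | cut -c1-16)    # 16 hex digits: 2^62 + TAI seconds
    nano_hex=$(printf '%s' "$label" | cut -c17-24)  # 8 hex digits: nanoseconds
    secs=$(( 0x$sec_hex - 0x4000000000000000 ))     # remove the 2^62 bias
    nanos=$(( 0x$nano_hex ))
    printf '%s %s\n' "$secs" "$nanos"
}

# Sample label, not taken from a real log.
tai64n_split "@400000005a6f7a3e1b2c3d4e"
```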

s6 init with s6-rc

Warning
While Gentoo does offer sys-apps/s6, sys-apps/s6-rc and sys-apps/s6-linux-init packages in its official repository, it does not completely support using them to make an init system. Users who want to do that might need to use alternative ebuild repositories and/or do some local tweaking.

The general setup of an init system based on s6 and s6-rc is as follows:

  1. When the machine boots, all initialization tasks needed to bring it to its stable, normal 'up and running' state, are split into a stage1 init and a stage2 init. The stage1 init is the s6-linux-init program from the package of the same name, which is invoked by the kernel, runs as process 1, and replaces itself with the s6-svscan program when its work is done. The stage2 init is invoked by the stage1 init, runs as a child of process 1, blocks until s6-svscan starts to execute, and exits when its work is done.
  2. During most of the machine's uptime, s6-svscan runs as process 1 with signal diversion turned on, and there is an s6 supervision tree rooted in process 1, launched as soon as s6-svscan starts to execute.
  3. A supervised catch-all logger is started as part of the supervision tree. The catch-all logger logs messages sent by supervision tree processes to s6-svscan's standard output and error, supporting a logging chain arrangement. The catch-all logger is optional; if one is not set up, messages that would be logged by it are printed to the machine's console instead.
  4. The stage2 init initializes the s6-rc service manager and starts a subset of the services defined in the boot-time compiled service database. Some of these services might carry out part of the machine's initialization tasks.
  5. While s6-svscan is running as process 1, services are normally managed using s6-rc tools. The s6-linux-init-telinit program, in combination with the runlevel changer service created by the s6-linux-init-maker program (both from sys-apps/s6-linux-init) allows the implementation of sysvinit-like runlevels.
  6. The administrator initiates the machine's shutdown sequence using the s6-linux-init-shutdown program or the s6-linux-init-hpr program, both from sys-apps/s6-linux-init. These programs communicate with a special supervision tree process, the shutdown daemon, which then takes care of the shutdown sequence, including the stopping of all services managed by s6-rc, before finally halting, powering off or rebooting the machine.

The boot sequence

stage1 init
When the machine starts booting (if an initramfs is being used, after it passes control to the 'main' init), a stage1 init executes as process 1. This is usually a simple execline script wrapper (e.g. as created by s6-linux-init-maker) around the s6-linux-init program from the package of the same name. Using a script allows passing options to s6-linux-init which would otherwise have to be present in the kernel command line.
Therefore, if the wrapper script is named e.g. s6-gentoo-init, and placed in /sbin/, then to use an init system based on s6 and s6-rc, an init=/sbin/s6-gentoo-init argument can be added to the kernel's command line using the bootloader's available mechanisms (e.g. a linux command in some 'Gentoo with s6 + s6-rc + s6-linux-init' menu entry for GRUB). It is possible to go back to sysvinit+OpenRC at any time, or to any other init system, by reverting the change. Alternatively, the wrapper script can be the /sbin/init file, in which case the init= parameter is not needed; but on Gentoo, this would conflict with sysvinit or systemd if it was installed with the sysv-utils USE flag.
s6-linux-init runs with its standard input, output and error initially redirected to the machine's console. It does all necessary setup for s6-svscan, including setting up its scan directory. Because at that point in the boot sequence the root filesystem might be the only mounted filesystem, and possibly read-only, s6-linux-init also mounts a tmpfs (at /run on Gentoo) as a read-write filesystem to hold control files that s6-svscan and s6-supervise need to write to. s6-linux-init uses a directory called the run image, which contains the initial scandir, and copies it to the read-write tmpfs as a directory named service. When s6-svscan starts running as process 1, it uses the directory in the tmpfs as its scandir (so its absolute pathname would be /run/service/ on Gentoo). The run image can be in a read-only filesystem, and must be the run-image subdirectory of s6-linux-init's base directory (normally /etc/s6-linux-init/current/).
Because s6-linux-init runs as process 1, if it terminates in any way, there will be a kernel panic. Therefore, machine initialization is split between s6-linux-init, which does a minimal amount of work and then replaces itself with s6-svscan using a POSIX execve() call, and a stage2 init, which is spawned as a child process by s6-linux-init.

stage2 init
stage2 init is spawned by s6-linux-init as a child process, and is blocked from running until the latter replaces itself with s6-svscan. To achieve this, the child process of s6-linux-init opens the catch-all logger's FIFO for writing using the POSIX open() call. The call will block until some other process opens the FIFO for reading. The catch-all logger is a supervised process, so it starts executing when s6-svscan does, and opens the FIFO for reading, thereby unblocking the process, which then replaces itself with the stage2 init. If no catch-all logger is set up, the child process of s6-linux-init just waits until s6-svscan notifies its readiness, using a pipe as the notification channel.
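
The blocking behaviour of a FIFO can be observed in isolation with the following sketch. It is only a demonstration of the mechanism, not code from s6-linux-init: a background subshell stands in for the child process that becomes stage2 init, and a cat command stands in for the catch-all logger. All file names are temporary and made up.

```shell
#!/bin/sh
# Opening a FIFO for writing blocks until another process opens it for reading.
tmp=$(mktemp -d)
mkfifo -m 0600 "$tmp/fifo"

(
    exec > "$tmp/fifo"        # open for writing: blocks while there is no reader
    echo "stage2 running"     # only reached once a reader exists
) &
writer=$!

sleep 1
if kill -0 "$writer" 2>/dev/null; then
    blocked=yes               # no reader yet, so the writer is still waiting
else
    blocked=no
fi

logged=$(cat "$tmp/fifo")     # the stand-in logger opens the FIFO for reading
wait "$writer"
rm -rf "$tmp"

echo "writer still waiting before the reader appeared: $blocked"
echo "logger received: $logged"
```
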
stage2 init executes with s6-svscan as process 1, and performs all remaining initialization tasks needed to bring the machine to its stable, normal 'up and running' state. It executes with a few vital supervised long-lived processes already running, started as part of process 1's supervision tree, including the catch-all logger, if one is used. When the stage2 init finishes its work, it just exits and gets reaped by s6-svscan.
stage2 init must be an executable file named rc.init, located in the scripts/ subdirectory of s6-linux-init's base directory (normally /etc/s6-linux-init/current/). It is usually an execline or shell script.
Gentoo's official repository does not supply any package with a stage2 init for an init system based on s6 and s6-rc. The s6-linux-init package installs an example rc.init shell script in /etc/s6-linux-init/skel/, containing comments illustrating how to set up the init system for a variety of rc subsystems.

s6-rc initialization
The s6-rc service manager must be initialized when s6-svscan is already running. Therefore, initialization is performed by having stage2 init invoke s6-rc-init, which takes the pathname of a compiled service database as an argument (or defaults to /etc/s6-rc/compiled), together with the pathname of process 1's scandir (i.e. /run/service/ on Gentoo). So a suitable service database must exist and be available in a read-only filesystem: this is the boot-time service database. s6-rc's live state directory must be in a read-write filesystem. On Gentoo, by default, s6-rc-init will set the live state directory pathname to /run/s6-rc/ and place it in the read-write tmpfs mounted by s6-linux-init.
The initial state of all s6-rc services, as set by s6-rc-init, is down. So stage2 init must also start all atomic services (oneshots and longruns) that are needed to complete the machine's initialization, if any, as well as all longruns that are wanted up at the end of the boot sequence. This is performed by defining a service bundle in the boot-time service database that groups these atomic services, and having stage2 init start them with an s6-rc -u change command naming the bundle. This bundle would be the s6-rc counterpart to OpenRC's sysinit + boot + default runlevels, or systemd's default.target unit.
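
The two steps described above could be sketched as a shell stage2 init along the following lines. This is an illustration under assumptions, not a script shipped by any package: the bundle name default is hypothetical, and the DRY_RUN switch and run helper are additions that make the sketch print its commands instead of executing them unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of a stage2 init (rc.init) for s6-rc.
# DRY_RUN defaults to 1 so that the commands are only printed.
: "${DRY_RUN:=1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

compiled=/etc/s6-rc/compiled   # boot-time compiled service database
scandir=/run/service           # process 1's scandir on Gentoo
bundle=default                 # hypothetical bundle grouping the boot services

stage2() {
    # Initialize s6-rc; afterwards every service is in the down state.
    run s6-rc-init -c "$compiled" "$scandir" || return
    # Bring up all atomic services grouped in the bundle.
    run s6-rc -u change "$bundle"
}

stage2
```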

The catch-all logger
In the context of an init system based on s6 and s6-rc, the catch-all logger is a supervised long-lived process that logs messages sent by supervision tree processes to s6-svscan's standard output and error, normally in an automatically rotated logging directory. In a logging chain arrangement, the leaf processes of a supervision tree normally have dedicated loggers that collect and store messages sent to the process' standard output and error in per-service logs. Messages from s6-svscan, s6-supervise processes, logger processes themselves, and leaf processes that don't have a logger, are printed on process 1's standard output or error. At the beginning of the boot sequence, they are redirected by the kernel to the machine's console, and can be redirected later so that the messages are delivered to the catch-all logger, using a setup that involves a FIFO. Only the catch-all logger's standard error remains redirected to the machine's console, as a last resort.
The run image that is copied to the read-write tmpfs mounted by the stage1 init contains s6-svscan's initial scandir with a servicedir for the catch-all logger already present, so that it is started as soon as s6-svscan begins execution as process 1. This catch-all logger's servicedir must be named s6-svscan-log, since the s6-linux-init program passes s6-svscan an -X option (console holder) to redirect the catch-all logger's standard error.
The logging directory is owned by the catch-all logger's effective user after dropping privileges, and normally has permissions 2750 (i.e. drwxr-s---). Because it must be set up by stage1 init before the init system's supervision tree is started, a subdirectory with the name, owner, group and permissions of the logging directory must exist in s6-linux-init's run image. This subdirectory will then be copied to the read-write tmpfs, the only read-write filesystem that can be guaranteed to exist when starting the supervision tree, setting this copy up as the catch-all logger's logging directory.
The s6-linux-init-maker program from s6-linux-init can set up a catch-all logger that uses the s6-log program. The logging directory of s6-linux-init-maker's logger is named uncaught-logs (so on Gentoo, its absolute pathname will be /run/uncaught-logs).

A FIFO is reserved some place in the filesystem for the catch-all logger. The FIFO is owned by root and has permissions 0600 (i.e. prw-------). The code of the catch-all logger's run file opens the FIFO for reading, redirects its standard input to it, optionally drops privileges (e.g. by invoking s6-setuidgid or s6-applyuidgid if it is a script) and calls the logger program.
stage1 init redirects its standard output and error to the catch-all logger's FIFO before replacing itself with s6-svscan, so s6-svscan and all supervision tree processes will have their standard output and error redirected this way as well, except the catch-all logger itself. Using a FIFO allows delaying the execution of the stage2 init until s6-svscan is running as process 1.

Shutdown and reboot

The shutdown daemon
The init system's supervision tree includes a shutdown daemon which receives requests to initiate the shutdown sequence, either immediately or after a certain specified time elapses. The shutdown daemon is s6-linux-init-shutdownd from sys-apps/s6-linux-init.
s6-linux-init-shutdownd executes a shutdown file, and waits for it to terminate. Generally speaking, the shutdown file undoes what stage2 init has done at boot time, and is normally an execline or shell script. Its code can use s6 tools and s6-rc services to do its work. In particular, if s6-rc is in use, it can be used to stop all services managed by s6-rc (normally with an s6-rc -da change command).
Next, s6-linux-init-shutdownd stops all processes from the supervision tree except the catch-all logger (if one is used), kills all remaining processes, unmounts all mounted filesystems, and finally performs the halt, poweroff or reboot operation, as requested.
Optionally, after unmounting filesystems, s6-linux-init-shutdownd can also execute a final shutdown file, waiting for it to terminate before shutting down the machine. This file can be used to perform actions after all filesystems are unmounted: for example, to deactivate LVM logical volumes using a vgchange --activate n command, or to wipe LUKS-encrypted volumes' keys from kernel memory and remove their existing mappings using a cryptsetup close command. While the final shutdown file is running, the only filesystems still mounted are the rootfs (read-only), the tmpfs mounted at /run by s6-linux-init (read-write), and the devtmpfs, sysfs and proc filesystems.
The shutdown and final shutdown files must be executable files named rc.shutdown and rc.shutdown.final, respectively, and located in the scripts subdirectory of s6-linux-init-shutdownd's base directory (normally /etc/s6-linux-init/current/). They are usually execline or shell scripts. Gentoo's official repository does not supply any package with a shutdown file or final shutdown file for init systems based on s6 and s6-rc; users must create them from scratch or take them from somewhere else (e.g. alternative ebuild repositories). sys-apps/s6-linux-init installs example rc.shutdown and rc.shutdown.final shell scripts in /etc/s6-linux-init/skel/ containing comments that illustrate how to set up the init system for a variety of rc subsystems.

The s6-svscan diverted signal handlers
Since the program running as process 1 is s6-svscan with signal diversion turned on, use of diverted signal handlers defines what happens when it receives a signal. The s6-linux-init-maker program from sys-apps/s6-linux-init can create signal handler execline scripts for all s6-svscan-diverted signals, which either invoke the s6-linux-init-shutdown program from the same package to request that the machine be halted / powered off / rebooted, or do nothing. This allows shutting down the machine by sending signals to process 1, in addition to using the s6-linux-init-shutdown or s6-linux-init-hpr programs.

Compatibility scripts

s6-svscan is not directly compatible with sysvinit's telinit, halt, poweroff, reboot, and shutdown commands. However, the s6-linux-init-maker program from sys-apps/s6-linux-init can create execline compatibility scripts for these programs that invoke the s6-linux-init-telinit, s6-linux-init-shutdown and s6-linux-init-hpr programs from the same package, as appropriate.

Service management

On an init system based on s6 and s6-rc, the administrator can replace the init system's compiled service database with a new one using s6-rc-update, and create a new boot-time service database, to be used next time the machine boots, with s6-rc-compile and a set of service definitions in that program's source format. It's best to have the s6-rc-init invocation in stage2 init use a symbolic link as the compiled service database pathname, so that the boot-time database can be changed by modifying the symlink instead of the stage2 init code, e.g. by having an /etc/s6-rc/db/ directory for storing one or more compiled databases, making /etc/s6-rc/boot a symbolic link to one of those databases, and using the symlink in the s6-rc-init invocation.
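
The symlink arrangement can be tried out safely in a scratch directory, as in the sketch below. The database names and the source directory mentioned in the comments are hypothetical, and the empty directories merely stand in for databases that s6-rc-compile would produce.

```shell
#!/bin/sh
# Sketch of the boot-time database symlink layout, using a scratch directory
# in place of /etc/s6-rc.
root=$(mktemp -d)
mkdir -p "$root/db/first" "$root/db/second"   # placeholders for compiled databases

# On a real system, a database would be created with something like:
#   s6-rc-compile /etc/s6-rc/db/second /etc/s6-rc/source
# and the running system switched to it with:
#   s6-rc-update /etc/s6-rc/db/second

# Initial state: the boot symlink points at the first database.
ln -s db/first "$root/boot"
before=$(readlink "$root/boot")

# Select another boot-time database by repointing the symlink.
# -n keeps ln from descending into the directory the old link points to.
ln -sfn db/second "$root/boot"
after=$(readlink "$root/boot")

echo "boot database: $before -> $after"
rm -rf "$root"
```

With this layout, the s6-rc-init invocation in stage2 init would name the symlink (here, /etc/s6-rc/boot) and never needs editing when the database changes.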

It is possible to have long-lived processes not managed by s6-rc, but supervised by process 1, by managing s6 servicedirs directly, placing them (or symbolic links to them) in the init system's scandir, and using s6-svscanctl -a, s6-svscanctl -n or s6-svscanctl -N commands as needed. It's also possible to use s6-svscan as process 1 and just s6 tools, without s6-rc, but then the init system becomes more like runit.

s6 servicedirs and s6-rc service definitions for anything not supplied in packages from Gentoo's official repository must be created by the administrator, either from scratch or taken from somewhere else (e.g. alternative ebuild repositories).

Runlevels

An s6-based init system can implement the equivalent of sysvinit-like runlevels. The s6-linux-init-maker program from sys-apps/s6-linux-init can create a runlevel changer service which performs a 'runlevel change' by invoking a runlevel changer file. The meaning of 'runlevel change' is defined by whatever this file does when executed.

If s6-linux-init-maker's runlevel changer service is used, the administrator requests a runlevel change using the s6-linux-init-telinit program from sys-apps/s6-linux-init. If s6-rc is in use, runlevels can be mapped to service bundles and the runlevel changer file can perform a runlevel change using an s6-rc change command with the -p (prune) option. The runlevel changer file must be an executable file, usually an execline or shell script, named runlevel and located in the scripts subdirectory of the runlevel changer service's base directory (normally /etc/s6-linux-init/current/). Gentoo's official repository doesn't supply any package with a runlevel changer file for s6-based init systems. The sys-apps/s6-linux-init package installs an example runlevel shell script in /etc/s6-linux-init/skel/, containing comments that illustrate how to implement runlevel-like functionality for a variety of rc subsystems.
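
A runlevel changer file along these lines could look like the following sketch. It assumes that each runlevel is mapped to an s6-rc bundle of the same name and that the requested runlevel arrives as the first argument; the runlevel names are hypothetical. As a safety measure added for this illustration, the DRY_RUN switch makes the script print the command instead of executing it unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of a runlevel changer file that maps runlevels to s6-rc bundles.
: "${DRY_RUN:=1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

change_runlevel() {
    if [ "$#" -lt 1 ]; then
        echo 'runlevel: fatal: too few arguments' >&2
        return 100
    fi
    # -u brings the bundle's services up; -p (prune) stops services outside it.
    run s6-rc -up change "$1"
}

change_runlevel "${1:-default}"
```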

See also

External resources

References

  1. The execline language design and grammar. Retrieved on July 8th, 2017.
  2. Laurent Bercot, Why not just use /bin/sh?. Retrieved on July 8th, 2017.
  3. How to run s6-svscan as process 1. Retrieved on August 20th, 2017.
  4. Laurent Bercot, The logging chain. Retrieved on May 1st, 2017.
  5. Laurent Bercot, On the syslog design. Retrieved on May 1st, 2017.
  6. Notification vs. polling. Retrieved on July 28th, 2017.
  7. Service startup notifications. Retrieved on July 29th, 2017.
  8. Using s6 with OpenRC. Retrieved on June 24th, 2017.
  9. How to run s6-svscan under another init process. Retrieved on July 16th, 2017.