s6


Note
A draft consolidation and restructure of this page and the S6_and_s6-rc-based_init_system page is available here; it is currently awaiting review by a Gentoo dev.

s6 is a package that provides a daemontools-inspired process supervision suite, a notification framework, a UNIX domain super-server, and tools for file descriptor holding and suidless privilege gain. It can be used as an init system component, and also as a helper for supervising OpenRC services. A high-level overview of s6 is available here. The package's documentation is primarily provided in HTML format, and can be read in a text user interface using, for example, www-client/links. However, a man page port of the s6 documentation, app-misc/s6-man, is available in the GURU repository.

Installation

USE flags

USE flags for sys-apps/s6 skarnet.org's small and secure supervision software suite

execline enable support for dev-lang/execline

Emerge

root #emerge --ask sys-apps/s6

Configuration

Environment variables

  • UID - The process' user ID set by s6-applyuidgid when invoked with the -U option.
  • GID - The process' group ID set by s6-applyuidgid when invoked with the -U option.
  • GIDLIST - The process' supplementary group list set by s6-applyuidgid when invoked with the -U option. Must be a comma-separated list of numeric group IDs, without spaces.

Files

  • /run/openrc/s6-scan - s6-svscan's scan directory when using OpenRC's s6 integration feature.
  • /var/svc.d - Service directory repository searched by OpenRC when using the s6 integration feature.

Service

OpenRC

See here.

Usage

Process supervision

For more in-depth information about the process supervision aspects of s6, see daemontools-encore. A summary follows.

s6 program    daemontools program with similar functionality
s6-log        multilog
s6-setsid     pgrphack


Other s6 programs with functionality similar to a daemontools program have the daemontools name prefixed with s6-.

The program that implements the supervisor features in s6 is s6-supervise, and just like daemontools' supervise, it takes the pathname (absolute, or relative to the working directory) of a service directory (or servicedir) as an argument. An s6 service directory must contain at least an executable file named run, and can contain an optional regular file named down, and an optional subdirectory, or symbolic link to a directory, named log, all of which work like their daemontools counterparts. Like runit service directories, it can also contain an optional executable file named finish, which can be used to perform cleanup actions each time the supervised process terminates, possibly depending on its exit status information.

s6-supervise calls finish with two arguments: the first one is the supervised process' exit code, or 256 if it was killed by a signal, and the second one is the signal number if the supervised process was killed by a signal, or an undefined number otherwise. Unlike runit's runsv, s6-supervise sends the finish process a SIGKILL signal if it runs for too long. If using s6 version 2.2.0.0 or later, there can be an optional regular file in the service directory, named timeout-finish, containing an unsigned integer value that specifies how much time (in milliseconds) the finish process is allowed to run before being killed. If that file is absent, a default value of 5 seconds is used, which is the fixed value used by earlier versions.

Like daemontools-encore, s6-supervise makes its child process the leader of a new session using the POSIX setsid() call, unless the servicedir contains a regular file named nosetsid (daemontools-encore's counterpart file is named no-setsid, though). In that case, the child process will run in s6-supervise's session instead. s6-supervise waits for a minimum of 1 second between two run spawns, so that it does not loop too quickly if the supervised process exits immediately.
If s6-supervise receives a SIGTERM signal, it behaves as if an s6-svc -dx command naming the corresponding service directory had been used (see later), and if it receives a SIGHUP signal, it behaves as if an s6-svc -x command naming the corresponding service directory had been used.

Just like daemontools' supervise, s6-supervise keeps control files in a subdirectory of the servicedir, named supervise, and if it finds a symbolic link to directory with that name, s6-supervise will follow it and use the linked-to directory for its control files. Unlike other process supervision suites, s6-supervise also uses a subdirectory in the servicedir, named event, for notifications about the supervised process' state changes (see the notification framework). If event doesn't exist, s6-supervise will create it as a FIFO directory restricted to members of its effective group. If event exists, s6-supervise will use it as-is, and if it is a symbolic link to directory, s6-supervise will follow it. Complete information about the service directory structure is available here, and for further information about s6-supervise, please consult the HTML documentation in the package's /usr/share/doc subdirectory.
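As a concrete illustration of the servicedir layout described above, a minimal service directory can be created by hand with standard tools. The directory path and the sleep command used as a stand-in daemon are hypothetical, chosen only for this sketch:

```shell
# Create a hypothetical service directory; run is the only mandatory file.
mkdir -p /tmp/svc-demo/test-service

# The run file execs into the daemon; a plain shell script is acceptable.
cat > /tmp/svc-demo/test-service/run <<'EOF'
#!/bin/sh
exec sleep 60
EOF
chmod +x /tmp/svc-demo/test-service/run

# Optional finish file: s6-supervise passes the exit code and the signal
# number as $1 and $2.
cat > /tmp/svc-demo/test-service/finish <<'EOF'
#!/bin/sh
echo "run exited with code $1 (signal $2)"
EOF
chmod +x /tmp/svc-demo/test-service/finish

# An empty regular file named down keeps the service from being started
# automatically when a supervisor first sees the servicedir.
touch /tmp/svc-demo/test-service/down
```

Pointing s6-supervise at /tmp/svc-demo/test-service would then supervise the sleep process; the supervise and event subdirectories would be created by s6-supervise itself on first run.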

The author of s6 is also the author of the execline package (dev-lang/execline), that implements the execline language, a scripting language built around chain loading[1]. Execline aims to help producing lightweight and efficient scripts, among other things, by reducing the time involved in spawning and initializing a big command interpreter (like e.g. the sh program for shell scripts), and by simplifying parsing and doing it only once, when the script is read by the interpreter[2]. The s6 package depends on execline because some of its programs call execline programs or use the execline library, libexecline. However, it is not required that run or finish files be execline scripts. Just like with daemontools, any file format that the kernel knows how to execute is acceptable, and, in particular, they can be shell scripts if so desired.
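To make the equivalence concrete, here are two interchangeable run files for a hypothetical test-daemon program, one written in execline and one in POSIX sh; s6 accepts either, since any executable format works. The file names and the daemon name are assumptions for this sketch:

```shell
mkdir -p /tmp/svc-demo2

# run file written in execline; execline chain-loads into the daemon,
# so no "exec" keyword is needed.
cat > /tmp/svc-demo2/run.execline <<'EOF'
#!/bin/execlineb -P
test-daemon
EOF

# Equivalent run file written in POSIX sh; "exec" replaces the shell
# process so that s6-supervise supervises the daemon itself, not a shell
# wrapper around it.
cat > /tmp/svc-demo2/run.sh <<'EOF'
#!/bin/sh
exec test-daemon
EOF
```

In a real servicedir exactly one of these would be installed as run and made executable. Forgetting exec in the shell variant is a classic mistake: the supervisor would then track the shell, and signals sent with s6-svc would never reach the daemon.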

The s6-svscan program allows supervising a set of processes running in parallel using a scan directory (or scandir), just like daemontools' svscan, so it becomes the supervision tree's root. s6-svscan from package version 2.3.0.0 or later does not perform periodic scans by default, like other process supervision suites do, unless it is passed a -t option with a scan period (as an unsigned integer value in milliseconds). Earlier versions had a default scan period of 5 seconds (equivalent to -t 5000) for compatibility with daemontools, which could be turned off by explicitly specifying a scan period of 0 (-t 0). s6-svscan can be forced to perform a scan by sending it a SIGALRM signal, or by using s6-svscanctl (see later). When s6-svscan performs a scan, it checks the scan directory and launches an s6-supervise child process for each new servicedir it finds, or for each old servicedir whose s6-supervise process it finds has exited. All services with a corresponding servicedir are considered active. s6-supervise children whose corresponding servicedir is no longer present are not stopped, but their services are considered inactive.

s6-svscan keeps control files in a subdirectory of the scandir, named .s6-svscan. If this subdirectory or any of its files doesn't exist when s6-svscan is invoked, they will be created. s6-svscan can be controlled by sending it signals, or by using the s6-svscanctl program. s6-svscanctl communicates with s6-svscan using a FIFO in the .s6-svscan subdirectory, and accepts a scan directory pathname, and options that specify what to do. Some of s6-svscanctl's options are:

  • -a (alarm): make s6-svscan perform a scan. Equivalent to sending s6-svscan a SIGALRM signal.
  • -n (nuke): make s6-svscan stop s6-supervise child processes corresponding to inactive services, by sending each of them a SIGTERM signal, or a SIGHUP signal if they are running on the log subdirectory of a service directory. This way, if the log subdirectory's run file executes a logger, its corresponding s6-supervise process will wait for it to exit instead of killing it. Daemontools-style loggers like s6-log would normally exit on their own when the 'main' process terminates, because they would detect an end-of-file condition when trying to read from the pipe that connects the processes.
  • -N (really nuke): make s6-svscan stop s6-supervise child processes corresponding to inactive services, by sending each of them a SIGTERM signal, even if they are running on the log subdirectory of a service directory.
  • -t (terminate): make s6-svscan stop all s6-supervise child processes by sending each of them a SIGTERM signal, or a SIGHUP signal if they are running on the log subdirectory of a service directory, and then make s6-svscan start its finish procedure. Therefore, s6-svscanctl -t provides a way to tear down an s6 supervision tree. Equivalent to sending s6-svscan a SIGTERM signal, unless signal diversion is turned on.
  • -q (quit): make s6-svscan stop all s6-supervise child processes by sending each of them a SIGTERM signal, even if they are running on the log subdirectory of a service directory, and then make s6-svscan start its finish procedure. Equivalent to sending s6-svscan a SIGQUIT signal, unless signal diversion is turned on.
  • -h (hangup): make s6-svscan stop all s6-supervise child processes by sending each of them a SIGHUP signal, and then make s6-svscan start its finish procedure. Equivalent to sending s6-svscan a SIGHUP signal, unless signal diversion is turned on.
  • -b (abort): make s6-svscan start its finish procedure without stopping its s6-supervise child processes. Equivalent to sending s6-svscan a SIGABRT signal.

Other s6-svscanctl options are used to control s6-svscan's finish procedure. For further information about s6-svscan, and the full description of s6-svscanctl's functionality, please consult the HTML documentation in the package's /usr/share/doc subdirectory.

s6-log is the logger program provided by the s6 package. Just like daemontools' multilog program, it treats its arguments as a logging script, composed of a sequence of directives that specify what to do. Directives that start with . or / (daemontools-style automatically rotated logging directories, or logdirs) behave like their daemontools multilog counterparts, and so do directives that start with s, n, !, t, + and -, except that patterns in + and - directives are full POSIX extended regular expressions (i.e. those accepted by the grep -E command), and the processor specified in an ! directive invokes execlineb, the script parser and launcher from the execline package, with -P and -c options, instead of sh. Therefore, the processor's arguments can use execline syntax and, for example, an s6-log '!processor-script arg1 arg2' ./dirname command makes s6-log launch a processor with the equivalent of an execlineb -Pc 'processor-script arg1 arg2' command during a rotation of logdir dirname in s6-log's working directory. If an executable file named processor-script is found via PATH search, this will invoke it with arguments arg1 arg2 and feed it the contents of dirname/current on its standard input.

A T directive prepends each logged line with a timestamp in ISO 8601 format for combined date and time, representing local time according to the system's timezone, with a space (not a 'T') between the date and the time and two spaces after the time. For s6-log, t (timestamp in external TAI64N format) and T directives can appear in any place of the logging script; directives appearing before them apply to lines without the prepended timestamp, and directives appearing after them apply to lines with the prepended timestamp. s6-log can be forced to perform a rotation on a logdir by sending it a SIGALRM signal. For the full description of s6-log's functionality please consult the HTML documentation in the package's /usr/share/doc subdirectory.
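A typical place for such a logging script is the run file of a service directory's log subdirectory. The sketch below creates one with standard tools; the logdir path /var/log/test-daemon and the size/count values are assumptions, and the directive string is kept deliberately simple (rotation settings followed by a T timestamp directive and a logdir):

```shell
mkdir -p /tmp/svc-demo3/log

# Hypothetical logger run file: keep at most 10 archived files (n10) of
# up to 1 MB each (s1000000), prepend ISO 8601 timestamps (T), and write
# to the logdir /var/log/test-daemon.
cat > /tmp/svc-demo3/log/run <<'EOF'
#!/bin/sh
exec s6-log n10 s1000000 T /var/log/test-daemon
EOF
chmod +x /tmp/svc-demo3/log/run
```

When s6-supervise runs this servicedir, the 'main' process' standard output is piped into the logger, and s6-log exits on its own once it reads end-of-file on that pipe, as described for the s6-svscanctl -n option above.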

s6 also provides chain loading programs that can be used to modify a supervised process' execution state. s6-envdir, s6-envuidgid, s6-setlock, s6-setuidgid, s6-softlimit and s6-setsid are similar to daemontools' envdir, envuidgid, setlock, setuidgid, softlimit and pgrphack, respectively. Besides UID and GID, s6-envuidgid also sets environment variable GIDLIST to the supplementary group list (as a comma separated list of group IDs) of its effective user, obtained using the POSIX getgrent() call. s6-setuidgid can also accept an argument of the form uid:gid with a numeric user and group ID as an alternative to an account database username. s6-setlock can also take a shared lock on a file (calling Linux flock() with a LOCK_SH operation on Gentoo) instead of an exclusive lock by invoking it with an -r option, and can take a timed lock (using a helper program, s6lockd-helper) by invoking it with a -t option followed by a time value in milliseconds specifying the timeout. s6-setsid can also make the process the leader of a new (background) process group without creating a new session (using POSIX setpgid()) by invoking it with a -b option, and can make the process the leader of a new process group in the same session and then attach the session's controlling terminal to the process group to make it the foreground group (using POSIX tcsetpgrp()) by invoking it with an -f or -g option. In the latter case, the process will ignore the resulting SIGTTOU signal, so that it doesn't get stopped. 
There is also a generalized version of s6-setuidgid, named s6-applyuidgid: s6-applyuidgid -u uid sets the user ID of the process to uid, s6-applyuidgid -g gid sets the group ID of the process to gid, s6-applyuidgid -G gidlist sets the supplementary group list of the process to gidlist (using Linux setgroups() on Gentoo), which must be a comma-separated list of numeric group IDs, without spaces, and s6-applyuidgid -U sets the user ID, group ID and supplementary group list of the process to the values of environment variables UID, GID and GIDLIST, respectively. For the full description of all these programs' functionality please consult the HTML documentation in the package's /usr/share/doc subdirectory.
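The GIDLIST format that s6-envuidgid exports and s6-applyuidgid -G expects can be reproduced with standard tools. This sketch uses the id command (not part of s6) to build the comma-separated, space-free list of numeric supplementary group IDs for the current user:

```shell
# Build a comma-separated numeric group list, without spaces: the format
# expected by s6-applyuidgid -G and exported as GIDLIST by s6-envuidgid.
GIDLIST=$(id -G | tr ' ' ',')
echo "$GIDLIST"
```

With this in the environment alongside UID and GID, an s6-applyuidgid -U invocation would apply all three at once, as described above.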

s6-svc is s6's program for controlling supervised processes, and s6-svstat, the program for querying status information about them. s6-svc accepts a service directory pathname, and options that specify what to do. Unlike daemontools' svc, any pathname after the first one will be ignored. The -u, -d, -o and -x options behave just like daemontools svc's; in particular, s6-svc -o is actually defined as the equivalent of s6-svc -uO. The -O (capital 'o') option behaves like the -o (small 'o') option, except that if the supervised process is not running, it won't be started. Other s6-svc options allow reliably sending signals to a supervised process, and interacting with s6-supervise's notification features. In particular, s6-svc -a can be used to send a SIGALRM signal to a supervised s6-log process to force it to perform a rotation. As of version 2.5.1.0, s6-svc can send a SIGKILL signal to kill a supervised process that has been stopped using the -d option, but still hasn't died after a specified timeout. This will happen if there is a regular file named timeout-kill in the corresponding service directory, containing a nonzero time value in milliseconds (as an unsigned integer) that specifies the timeout. A timeout-kill file containing 0 or an invalid value is equivalent to an absent timeout-kill. And as of version 2.7.2.0, s6-svc's -d option can make s6-supervise send a custom signal to the supervised process to stop it (following it with a SIGCONT signal), and s6-svc also accepts an -r option (restart) to send the same custom signal but without stopping the process: the corresponding finish file, if any, and then the run file, will be executed after process termination, as usual, unless an s6-svc -O or s6-svc -o command has been used before s6-svc -r. A regular file named down-signal can be placed in the corresponding service directory, containing a signal name or number, to specify the custom signal. 
An absent down-signal file, or one with invalid content, is equivalent to one containing SIGTERM (making s6-svc -r equivalent to s6-svc -t in this case). The down-signal mechanism is intended for daemons that use a signal other than SIGTERM as their default 'stop' command.
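The down-signal and timeout-kill mechanisms described above are both configured by dropping plain files into the servicedir. This sketch sets them up for a hypothetical daemon that stops on SIGHUP rather than SIGTERM and should be forcibly killed if still alive 10 seconds after being asked to stop (the path is an assumption):

```shell
mkdir -p /tmp/svc-demo4

# Custom stop signal used by s6-svc -d and -r (s6 >= 2.7.2.0).
echo SIGHUP > /tmp/svc-demo4/down-signal

# Milliseconds to wait after s6-svc -d before sending SIGKILL
# (s6 >= 2.5.1.0); 0 or invalid content disables the mechanism.
echo 10000 > /tmp/svc-demo4/timeout-kill
```

s6-supervise rereads these files as needed, so they can be added to a live servicedir without restarting the supervisor.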

Warning
s6 versions 2.7.2.0 to 2.7.2.2 contain a bug that prevents s6-svc's -r option from being recognized, despite the restart feature being actually implemented. This has been fixed in version 2.8.0.0 and later.

s6-svstat accepts a service directory pathname and options. Unlike daemontools' svstat, any pathname after the first one will be ignored. Without options, or with only the -n option, s6-svstat prints a human-readable summary of all the available information on the service. In this case, it displays whether the supervised process is running ("run") or not ("down"), whether it is transitioning to the desired state or already there ("want up" or "want down"), how long it has been in the current state, and whether its current up or down status matches the presence or absence of a down file in the servicedir ("normally up" or "normally down"). It also shows if the supervised process is paused (because of a SIGSTOP signal). If the process is up, s6-svstat prints its process ID (PID), and, if it supports readiness notification, whether s6-supervise has already been notified ("ready") or not, and how much time has passed since the notification was received. If the process is down, s6-svstat prints its exit status (signal name, or signal number if the -n option was given, if the supervised process was killed by a signal, or exit code otherwise), whether the really down event has happened or not, and how much time has passed since it did. When s6-svstat is invoked with options other than -n, it outputs programmatically parsable information instead, as a series of space-separated values, one value per requested field. For example, s6-svstat -p, or equivalently s6-svstat -o pid, will only print the supervised process' PID if run is being executed (or -1 if it isn't), and s6-svstat -ue, or equivalently s6-svstat -u -o exitcode or s6-svstat -o up,exitcode, will only print whether the service is up or not ("true" or "false"), and the supervised process' exit code, or -1 if it is running or was killed by a signal.

s6 also provides an s6-svok program similar to daemontools' svok, that checks whether an s6-supervise process is currently running on a service directory specified as an argument. Its exit status is 0 if there is one, and 1 if there isn't. For the full description of s6-svc's and s6-svstat's functionality please consult the HTML documentation in the package's /usr/share/doc subdirectory.

As of version 2.5.0.0, if a service directory contains a finish file that exits with code 125 (indicating permanent failure), the supervised process won't be restarted, as if an s6-svc -O command naming the corresponding service directory had been used while the process was running. And as of version 2.7.1.0, s6-supervise keeps a record of supervised process terminations, the death tally, and new programs are available to use the recorded information. The s6-svdt program accepts a servicedir pathname, and prints the corresponding service's death tally. For each recorded process termination, one line is printed. Each line starts with a timestamp in external TAI64N format, and either contains the word "exitcode" followed by the supervised process' exit code if it exited, or the word "signal" followed by a signal name if it was killed by a signal. If an -s option is passed to s6-svdt, the numerical value of the signal will be printed instead of the signal name, and if an -n option followed by an unsigned integer value is passed to s6-svdt, only the last specified number of termination events are printed. The s6-svdt-clear program accepts a servicedir pathname, and clears the corresponding service's death tally. The maximum number of termination events recorded by s6-supervise can be customized by placing a regular file named max-death-tally in the corresponding service directory, containing an unsigned integer value that specifies this maximum number. This value cannot be greater than 4096. Every new termination event after the maximum number of recorded events is reached will discard the oldest one in the death tally. If max-death-tally is absent, a default maximum of 100 events is recorded.
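Like down-signal and timeout-kill, the death tally size is configured with a plain file in the servicedir. A minimal sketch, assuming a hypothetical path and a limit of 500 events (which stays under the 4096 maximum):

```shell
# Record up to 500 termination events instead of the default 100;
# values above 4096 are not accepted by s6-supervise.
mkdir -p /tmp/svc-demo5
echo 500 > /tmp/svc-demo5/max-death-tally
```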

s6-permafailon is a chain loading program that assumes that its working directory is a servicedir, and checks termination events in the corresponding death tally. If they match specified conditions, the program exits with code 125; otherwise, it executes the next program in the chain. This makes it suitable for use in finish files, to signal permanent failure to s6-supervise when certain conditions are met. It takes two unsigned integers as arguments, specifying a time value in seconds and an event count, followed by a comma-separated list of conditions. If the number of termination events with recorded times not earlier than the specified number of seconds in the past, that match any of the conditions, is greater than or equal to the specified count, s6-permafailon exits with code 125. A condition can be specified as an integer between 0 and 255, representing an exit code, as two such integers separated by a hyphen ('-'), representing a range of exit codes, as a (case insensitive) signal name, such as "SIGTERM", or as the (case insensitive) word "SIG" immediately followed by a signal number, such as "SIG6" for SIGABRT. If the supervised process exits with the specified code, or with a code in the specified range, the corresponding termination event matches the first or second kind of condition, respectively, and if the supervised process is killed by the specified signal, the corresponding termination event matches the last two kinds of conditions. For example, an s6-permafailon 30 5 100,sigabrt prog1 command exits with code 125 if the supervised process has exited with code 100, or has been killed by a SIGABRT signal, 5 or more times in the last 30 seconds, and executes program prog1 otherwise.
And an s6-permafailon 60 13 1-64,sig3 prog2 arg1 arg2 command exits with code 125 if the supervised process has exited with a code greater than or equal to 1 and less than or equal to 64, or has been killed by a SIGQUIT signal, 13 or more times in the last minute, and executes program prog2 with arguments arg1 arg2 otherwise.

For further information about s6-svdt, s6-svdt-clear and s6-permafailon please consult the HTML documentation in the package's /usr/share/doc subdirectory.

Example s6 scan directory with down, finish, and timeout-kill files, as well as a symbolic link to a supervise directory elsewhere, and execline scripts:

user $ls -l * .s6-svscan
.s6-svscan:
total 4
-rwxr-xr-x 1 user user 20 Mar 24 10:00 finish

test-service1:
total 8
-rwxr-xr-x 1 user user 52 Mar 24 10:00 finish
-rwxr-xr-x 1 user user 32 Mar 24 10:00 run
lrwxrwxrwx 1 user user 24 Mar 24 10:00 supervise -> ../../external-supervise

test-service2:
total 12
-rw-r--r-- 1 user user  0 Mar 24 10:00 down
-rwxr-xr-x 1 user user 99 Mar 24 10:00 finish
-rwxr-xr-x 1 user user 76 Mar 24 10:00 run
-rw-r--r-- 1 user user  6 Mar 24 10:00 timeout-finish

test-service3:
total 12
-rwxr-xr-x 1 user user 75 Mar 24 10:00 finish
-rwxr-xr-x 1 user user 39 Mar 24 10:00 run
-rw-r--r-- 1 user user  6 Mar 24 10:00 timeout-kill
FILE .s6-svscan/finish
#!/bin/execlineb -P

This file is used for s6-svscan's finish procedure.

FILE test-service1/run
#!/bin/execlineb -P
test-daemon

This file allows executing a hypothetical test-daemon program as a supervised process.

FILE test-service1/finish
#!/bin/execlineb -P
s6-permafailon 10 2 SIGINT exit

This makes test-service1 fail permanently if test-daemon is killed by a SIGINT signal 2 or more times in 10 seconds or less.

FILE test-service2/run
#!/bin/execlineb -P
foreground { echo Starting test-service2/run }
sleep 10
FILE test-service2/finish
#!/bin/execlineb -S0
foreground { echo Executing test-service2/finish with arguments $@ }
sleep 10

Since the test-service2/finish script runs for more than 5 seconds, a timeout-finish file is needed to prevent the process from being killed by s6-supervise before it completes its execution.

FILE test-service2/timeout-finish
20000
FILE test-service3/run
#!/bin/execlineb -P
test-daemon-sighup

This file allows executing a hypothetical test-daemon-sighup program as a supervised process, which is assumed to use signal SIGHUP as its 'stop' command, instead of SIGTERM.

FILE test-service3/finish
#!/bin/execlineb -S0
echo Executing test-service3/finish with arguments $@
FILE test-service3/timeout-kill
10000

This makes s6-supervise send test-daemon-sighup a SIGKILL signal if it is still alive after 10 seconds have elapsed since an s6-svc -d command was used to try to stop the daemon.

Resulting supervision tree when s6-svscan is run on this scandir as a background process in an interactive shell, assuming it is the working directory (i.e. launched with s6-svscan &):

user $ps xf -o pid,ppid,pgrp,euser,args
 PID  PPID  PGRP EUSER    COMMAND
...
1476  1461  1476 user     -bash
1753  1476  1753 user      \_ s6-svscan
1754  1753  1753 user          \_ s6-supervise test-service3
1757  1754  1757 user          |   \_ test-daemon-sighup
1755  1753  1753 user          \_ s6-supervise test-service1
1758  1755  1758 user          |   \_ test-daemon
1756  1753  1753 user          \_ s6-supervise test-service2
...
Important
Since processes in a supervision tree are created using the POSIX fork() call, all of them will inherit s6-svscan's environment, which, in the context of this example, is the user's login shell environment. If s6-svscan is launched in some other way (see later), the environment will likely be completely different. This must be taken into account when trying to debug a supervision tree with an interactive shell.

Status of all services reported by s6-svstat in human-readable format:

user $for i in *; do printf "$i: `s6-svstat $i`\n"; done
test-service1: up (pid 1758) 47 seconds
test-service2: down (exitcode 0) 47 seconds, ready 47 seconds
test-service3: up (pid 1757) 47 seconds

Output when only the service state, PID, exit code and killing signal information is requested:

user $for i in *; do printf "$i: `s6-svstat -upes $i`\n"; done
test-service1: true 1758 -1 NA
test-service2: false -1 0 NA
test-service3: true 1757 -1 NA

This s6-svstat invocation is equivalent to s6-svstat -o up,pid,exitcode,signal $i. The PID is displayed as "-1" for test-service2 because it is in down state.

supervise subdirectory contents:

user $ls -l */supervise ../external-supervise
lrwxrwxrwx 1 user user   24 Mar 24 10:00 test-service1/supervise -> ../../external-supervise

../external-supervise:
total 4
prw------- 1 user user  0 Mar 24 10:05 control
-rw-r--r-- 1 user user  0 Mar 24 10:05 death_tally
-rw-r--r-- 1 user user  0 Mar 24 10:05 lock
-rw-r--r-- 1 user user 35 Mar 24 10:05 status

test-service2/supervise:
total 4
prw------- 1 user user  0 Mar 24 10:05 control
-rw-r--r-- 1 user user  0 Mar 24 10:05 death_tally
-rw-r--r-- 1 user user  0 Mar 24 10:05 lock
-rw-r--r-- 1 user user 35 Mar 24 10:05 status

test-service3/supervise:
total 4
prw------- 1 user user  0 Mar 24 10:05 control
-rw-r--r-- 1 user user  0 Mar 24 10:05 death_tally
-rw-r--r-- 1 user user  0 Mar 24 10:05 lock
-rw-r--r-- 1 user user 35 Mar 24 10:05 status

Messages sent by test-service2/run to s6-svscan's standard output when manually started:

user $s6-svc -u test-service2
Starting test-service2/run
Executing test-service2/finish with arguments 0 0
Starting test-service2/run
Executing test-service2/finish with arguments 0 0
Starting test-service2/run
Executing test-service2/finish with arguments 0 0
...

Current death tally for test-service2:

user $s6-svdt test-service2
@400000005c9785c6127e81fc exitcode 0
@400000005c9785da1358c8b4 exitcode 0
@400000005c9785ee13c9e85b exitcode 0

The timestamps are in external TAI64N format; they can be displayed in human-readable format and local time using s6-tai64nlocal:

user $s6-svdt test-service2 | s6-tai64nlocal
2019-03-24 10:26:57.310280700 exitcode 0
2019-03-24 10:27:17.324585652 exitcode 0
2019-03-24 10:27:37.331999323 exitcode 0
user $s6-svstat test-service2
up (pid 2237) 5 seconds, normally down

After enough seconds have elapsed:

user $s6-svstat test-service2
down (exitcode 0) 6 seconds, want up

The output of s6-svdt, s6-svstat and test-service2/finish shows that test-service2/run exits each time with an exit code of 0. Reliably sending a SIGSTOP signal, and later a SIGTERM signal, to test-service2/run:

user $s6-svc -p test-service2
user $s6-svc -t test-service2
user $s6-svstat test-service2
up (pid 2312) 18 seconds, normally down, paused

The output of s6-svstat shows that test-service2/run is indeed stopped ("paused"), so SIGTERM doesn't have any effect yet. To resume the process, a SIGCONT signal is needed:

user $s6-svc -c test-service2
Executing test-service2/finish with arguments 256 15
Starting test-service2/run
Executing test-service2/finish with arguments 0 0
Starting test-service2/run
...

The output of test-service2/finish shows that after resuming execution, test-service2/run was killed by the SIGTERM signal that was awaiting delivery (signal 15), and since the process is supervised, s6-supervise restarts test-service2/run after test-service2/finish exits.

Messages sent by test-service2/run to s6-svscan's standard output when manually stopped:

user $s6-svc -d test-service2
Executing test-service2/finish with arguments 256 15

As shown by test-service2/finish, s6-supervise stopped test-service2/run by killing it with a SIGTERM signal (signal 15).

Sending two consecutive and sufficiently close SIGINT signals to test-daemon:

user $s6-svc -i test-service1
user $s6-svstat test-service1
up (pid 1799) 7 seconds
user $s6-svc -i test-service1
s6-permafailon: info: PERMANENT FAILURE triggered after 2 events involving signal 2 in the last 10 seconds

This shows that s6-permafailon's condition was triggered, so it exited with code 125. Because it was executed from test-service1/finish, this signals permanent failure to s6-supervise, so test-daemon is not restarted:

user $s6-svstat test-service1
down (signal SIGINT) 16 seconds, normally up, ready 16 seconds
user $s6-svdt test-service1 | s6-tai64nlocal
2019-03-24 10:39:42.918138981 signal SIGINT
2019-03-24 10:39:50.705347226 signal SIGINT

This shows test-daemon's two recorded termination events ("involving signal 2", i.e. SIGINT, as reported by s6-permafailon's message).

Manually stopping test-daemon-sighup:

user $s6-svc -d test-service3
user $s6-svstat test-service3
up (pid 1757) 137 seconds, want down
Executing test-service3/finish with arguments 256 9

The output of s6-svstat shows that test-daemon-sighup could not be stopped ("up" but also "want down") because it ignores SIGTERM. The service directory contains a timeout-kill file, so after waiting the specified 10 seconds, s6-supervise killed test-daemon-sighup with a SIGKILL signal (signal 9), as shown by test-service3/finish's message.

user $s6-svstat test-service3
down (signal SIGKILL) 14 seconds, normally up, ready 14 seconds

The output of s6-svstat confirms that test-daemon-sighup was killed by a SIGKILL signal. Output of s6-svstat when only the service state, PID, exit code and killing signal information is requested:

user $for i in *; do printf "$i: `s6-svstat -upes $i`\n"; done
test-service1: false -1 -1 SIGINT
test-service2: false -1 -1 SIGTERM
test-service3: false -1 -1 SIGKILL

This shows that no services are currently in up state ("false"), so their PIDs are displayed as "-1", and that the processes have been killed by signals, so their exit codes are displayed as "-1".

Creating a down-signal file in service directory test-service3, restarting test-daemon-sighup and then using an s6-svc -r command:

user $echo SIGHUP >test-service3/down-signal
user $s6-svc -u test-service3
user $s6-svc -r test-service3
test-daemon-sighup: Got SIGHUP, exiting...
Executing test-service3/finish with arguments 0 0
user $s6-svstat test-service3
up (pid 1760) 24 seconds

The output of s6-svstat and test-service3/finish shows that test-daemon-sighup exited normally with code 0, because s6-supervise sent it the configured 'stop' signal (SIGHUP, as shown by test-daemon-sighup's message), and that it was then restarted as usual. Stopping test-daemon-sighup with an s6-svc -d command:

user $s6-svc -d test-service3
test-daemon-sighup: Got SIGHUP, exiting...
Executing test-service3/finish with arguments 0 0
user $s6-svstat test-service3
down (exitcode 0) 84 seconds, normally up, ready 84 seconds

Again, as shown by the output of s6-svstat and test-service3/finish, test-daemon-sighup exited normally with code 0. Displaying its death tally:

user $s6-svdt test-service3 | s6-tai64nlocal
2019-03-24 10:52:12.326931580 signal SIGKILL
2019-03-24 10:53:56.611814401 exitcode 0
2019-03-24 10:54:32.035755511 exitcode 0

s6-svscan's finish procedure

When s6-svscan is asked to exit using s6-svscanctl, it tries to execute a file named finish, expected to be in the .s6-svscan control subdirectory of the scan directory. The program does this using the POSIX execve() call, so no new process will be created, and .s6-svscan/finish will have the same process ID as s6-svscan.
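The execve() semantics can be demonstrated with a plain shell sketch: exec replaces the process image without forking, so the replacement keeps the caller's process ID, which is why .s6-svscan/finish runs with s6-svscan's PID. The file names below are illustrative.

```shell
# A stand-in "finish" script that reports its own PID
cat > finish-demo.sh <<'EOF'
#!/bin/sh
echo "finish pid: $$"
EOF
chmod +x finish-demo.sh
# The caller prints its PID, then execs into the script: both PIDs match
sh -c 'echo "caller pid: $$"; exec ./finish-demo.sh' | tee pids.txt
```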

.s6-svscan/finish is invoked with a single argument that depends on how s6-svscanctl is invoked:

  • If s6-svscanctl is invoked with the -s option, .s6-svscan/finish will be invoked with a halt argument.
  • If s6-svscanctl is invoked with the -p option, .s6-svscan/finish will be invoked with a poweroff argument.
  • If s6-svscanctl is invoked with the -r option, .s6-svscan/finish will be invoked with a reboot argument.

This behaviour supports running s6-svscan as process 1. Just as run or finish files in a service directory, .s6-svscan/finish can have any file format that the kernel knows how to execute, but is usually an execline script. If s6-svscan is not running as process 1, the argument supplied to .s6-svscan/finish is usually meaningless and can be ignored. The file can be used just for cleanup in that case, and if no special cleanup is needed, it can be this minimal do-nothing execline script:

FILE .s6-svscan/finish - Minimal execline finish script
#!/bin/execlineb -P

If no -s, -p or -r option is passed to s6-svscanctl, or if s6-svscan receives a SIGABRT signal, or if s6-svscan receives a SIGTERM, SIGHUP or SIGQUIT signal and signal diversion is turned off, .s6-svscan/finish will be invoked with a 'reboot' argument.

If s6-svscan encounters an error situation it cannot handle, or if it is asked to exit and there is no .s6-svscan/finish file, it will try to execute a file named crash, also expected to be in the .s6-svscan control subdirectory. This is also done using execve(), so no new process will be created, and .s6-svscan/crash will have the same process ID as s6-svscan. If there is no .s6-svscan/crash file, s6-svscan will give up and exit with an exit code of 111.

s6-svscanctl can also be invoked in these abbreviated forms:

  • s6-svscanctl -0 (halt) is equivalent to s6-svscanctl -st.
  • s6-svscanctl -6 (reboot) is equivalent to s6-svscanctl -rt.
  • s6-svscanctl -7 (poweroff) is equivalent to s6-svscanctl -pt.
  • s6-svscanctl -8 (other) is equivalent to s6-svscanctl -0, but .s6-svscan/finish will be invoked with an 'other' argument instead of a 'halt' argument.
  • s6-svscanctl -i (interrupt) is equivalent to s6-svscanctl -6, and equivalent to sending s6-svscan a SIGINT signal, unless signal diversion is turned on.

Contents of the .s6-svscan subdirectory with example finish and crash files, once s6-svscan is running:

user $ls -l .s6-svscan
total 8
prw------- 1 user user  0 Jul 19 12:00 control
-rwxr-xr-x 1 user user 53 Jul 19 12:00 crash
-rwxr-xr-x 1 user user 72 Jul 19 12:00 finish
-rw-r--r-- 1 user user  0 Jul 19 12:00 lock
FILE .s6-svscan/finish
#!/bin/execlineb -S0
echo Executing .s6-svscan/finish with arguments $@
FILE .s6-svscan/crash
#!/bin/execlineb -S0
echo Executing .s6-svscan/crash

Messages sent by .s6-svscan/finish to s6-svscan's standard output as a result of different s6-svscanctl invocations:

user $s6-svscanctl -t .
Executing .s6-svscan/finish with arguments reboot
user $s6-svscanctl -st .
Executing .s6-svscan/finish with arguments halt
user $s6-svscanctl -7 .
Executing .s6-svscan/finish with arguments poweroff
user $s6-svscanctl -8 .
Executing .s6-svscan/finish with arguments other

Messages printed by s6-svscan on its standard error, and sent by .s6-svscan/crash to s6-svscan's standard output, as a result of invoking s6-svscanctl after deleting .s6-svscan/finish:

user $rm .s6-svscan/finish
user $s6-svscanctl -t .
s6-svscan: warning: unable to exec finish script .s6-svscan/finish: No such file or directory
s6-svscan: warning: executing into .s6-svscan/crash
Executing .s6-svscan/crash

s6-svscan's signal diversion feature

When s6-svscan is invoked with an -S option, or with neither an -s nor an -S option, and it receives a SIGINT, SIGHUP, SIGTERM or SIGQUIT signal, it behaves as if s6-svscanctl had been invoked with its scan directory pathname and an option that depends on the signal.

When s6-svscan is invoked with an -s option, signal diversion is turned on: if it receives any of the aforementioned signals, a SIGUSR1 signal, or a SIGUSR2 signal, s6-svscan tries to execute a file with the same name as the received signal, expected to be in the .s6-svscan control subdirectory of the scan directory (e.g. .s6-svscan/SIGTERM, .s6-svscan/SIGHUP, etc.). These files will be called diverted signal handlers, and are executed as a child process of s6-svscan. Just as run or finish files in a service directory, they can have any file format that the kernel knows how to execute, but are usually execline scripts. If the diverted signal handler corresponding to a received signal does not exist, the signal will have no effect. When signal diversion is turned on, s6-svscan can still be controlled using s6-svscanctl.

The best known use of this feature is to support the s6-rc service manager as an init system component when s6-svscan is running as process 1; see s6 and s6-rc-based init system.

Example .s6-svscan subdirectory with diverted signal handlers for SIGHUP, SIGTERM and SIGUSR1:

user $ls -l .s6-svscan
total 16
-rwxr-xr-x 1 user user 53 Jul 19 12:00 crash
-rwxr-xr-x 1 user user 72 Jul 19 12:00 finish
-rwxr-xr-x 1 user user 51 Jul 19 12:00 SIGHUP
-rwxr-xr-x 1 user user 52 Jul 19 12:00 SIGTERM
-rwxr-xr-x 1 user user 52 Jul 19 12:00 SIGUSR1
FILE .s6-svscan/SIGHUP
#!/bin/execlineb -P
echo s6-svscan received SIGHUP
FILE .s6-svscan/SIGTERM
#!/bin/execlineb -P
echo s6-svscan received SIGTERM
FILE .s6-svscan/SIGUSR1
#!/bin/execlineb -P
echo s6-svscan received SIGUSR1

Output of ps showing s6-svscan's process ID and arguments:

user $ps -o pid,args
 PID COMMAND
...
2047 s6-svscan -s
...

Messages printed to s6-svscan's standard output as a result of sending signals with the kill utility:

user $kill 2047
s6-svscan received SIGTERM
user $kill -HUP 2047
s6-svscan received SIGHUP
user $kill -USR1 2047
s6-svscan received SIGUSR1

Starting the supervision tree

From OpenRC

As of version 0.16, OpenRC provides a service script that can launch s6-svscan, also named s6-svscan. On Gentoo, the scan directory will be /run/openrc/s6-scan. This script exists to support the OpenRC-s6 integration feature, but can be used to just launch an s6 supervision tree when the machine boots by adding it to an OpenRC runlevel:

root #rc-update add s6-svscan default

Or it can also be started manually:

root #rc-service s6-svscan start
Note
The service script launches s6-svscan using OpenRC's start-stop-daemon program, so it will run unsupervised, and have its standard input, output and error redirected to /dev/null.

Because /run is a tmpfs, and therefore volatile, servicedir symlinks must be created in the scan directory each time the machine boots, before s6-svscan starts. The tmpfiles.d interface, which is supported by OpenRC using package opentmpfiles (sys-apps/opentmpfiles), can be used for this:

FILE /etc/tmpfiles.d/s6-svscan.conf
#Type Path Mode UID GID Age Argument
d /run/openrc/s6-scan
L /run/openrc/s6-scan/service1 - - - - /path/to/servicedir1
L /run/openrc/s6-scan/service2 - - - - /path/to/servicedir2
L /run/openrc/s6-scan/service3 - - - - /path/to/servicedir3
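The effect of those tmpfiles.d lines can be approximated with ordinary shell commands. This sketch uses a scratch directory in place of /run/openrc, and the servicedir targets are the placeholder paths from the example above:

```shell
# Create the scan directory and servicedir symlinks by hand (the real
# setup would use /run/openrc/s6-scan and real service directories)
rm -rf scan-demo && mkdir -p scan-demo/s6-scan
for n in 1 2 3; do
  ln -sfn "/path/to/servicedir$n" "scan-demo/s6-scan/service$n"
done
ls -l scan-demo/s6-scan
```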

As an alternative, OpenRC's local service could be used to start the supervision tree when entering OpenRC's default runlevel, by placing '.start' and '.stop' files in /etc/local.d (please read /etc/local.d/README for more details) that perform actions similar to those of the s6-svscan service script:

FILE /etc/local.d/s6-svscan.start
#!/bin/execlineb -P
# Remember to add --user if you don't want to run as root
start-stop-daemon --start --background --make-pidfile
   --pidfile /run/s6-svscan.pid
   --exec /bin/s6-svscan -- -S /path/to/scandir
FILE /etc/local.d/s6-svscan.stop
#!/bin/execlineb -P
start-stop-daemon --stop --retry 5 --pidfile /run/s6-svscan.pid

The -S option will explicitly disable signal diversion so that the SIGTERM signal that start-stop-daemon sends to s6-svscan will make it act as if an s6-svscanctl -rt command had been used.

As another alternative, OpenRC's local service could be used to start the supervision tree when entering OpenRC's default runlevel, with /service as the scan directory, using a '.start' file that calls the s6-svscanboot script provided as an example (see starting the supervision tree from sysvinit), instead of s6-svscan directly. This allows setting up a logger program to log messages sent by supervision tree processes to s6-svscan's standard output and error, provided a service directory for the logger exists in /service:

FILE /etc/local.d/s6-svscan.start
#!/bin/execlineb -P
# Remember to add --user if you don't want to run as root
# Remember to symlink /command to /bin
start-stop-daemon --start --background --make-pidfile
   --pidfile /run/s6-svscan.pid
   --exec /bin/s6-svscanboot
FILE /etc/local.d/s6-svscan.stop
#!/bin/execlineb -P
start-stop-daemon --stop --retry 5 --pidfile /run/s6-svscan.pid

From sysvinit

The s6 package provides a script called s6-svscanboot, which can be launched and supervised by sysvinit by adding a respawn line for it in /etc/inittab[3]. It is an execline script that launches an s6-svscan process, with its standard output and error redirected to /service/s6-svscan-log/fifo. This allows setting up a FIFO and a logger program to log messages sent by supervision tree processes to s6-svscan's standard output and error, with the same technique used by s6 and s6-rc-based init systems. s6-svscan's standard input will be redirected to /dev/null. The environment will be emptied and then set according to the contents of environment directory /service/.s6-svscan/env, if it exists, with an s6-envdir invocation. The scan directory will be /service.

s6-svscanboot is provided as an example; it is the examples/s6-svscanboot file in the package's /usr/share/doc subdirectory. Users that want this setup will need to copy (and possibly uncompress) the script to /bin, manually edit /etc/inittab, and then call telinit:

FILE /etc/inittab
SV:12345:respawn:/bin/s6-svscanboot
root #telinit q

This will make sysvinit launch and supervise s6-svscan when entering runlevels 1 to 5. Because s6 and execline programs used in the script and invoked using absolute pathnames are assumed to be in directory /command, a symlink to the correct path for Gentoo must be created:

root #ln -s bin /command

An s6 service directory for the s6-svscan logger can be created with the s6-linux-init-maker program from package s6-linux-init:

root #s6-envuidgid user s6-linux-init-maker -l /service -U temp
root #cp -a temp/run-image/{service/s6-svscan-log,uncaught-logs} /service

The logger will be an s6-log process that logs to directory /service/uncaught-logs, prepending messages with a timestamp in external TAI64N format. Username user should be replaced by a valid account's username, to allow s6-log to run as an unprivileged process, and temp will be a temporary directory created by s6-linux-init-maker in the working directory, which can be removed once the necessary subdirectories are copied to /service.

The logging chain

A supervision tree where all leaf processes have a logger can be arranged into what the software package's author calls the logging chain[4], which he considers to be technically superior to the traditional syslog-based centralized approach[5].

Since processes in a supervision tree are created using the POSIX fork() call, each of them will inherit s6-svscan's standard input, output and error. A logging chain arrangement is as follows:

  • Leaf processes should normally have a logger, so their standard output and error connect to their logger's standard input. Therefore, all their messages are collected and stored in dedicated, per-service logs by their logger. Some programs might need to be invoked with special options to make them send messages to their standard error, and redirection of standard error to standard output (i.e. 2>&1 in a shell script or fdmove -c 2 1 in an execline script) must be performed in the servicedir's run file.
  • Leaf processes with a controlling terminal are an exception: their standard input, output and error connect to the terminal.
  • s6-supervise, the loggers, and leaf processes that exceptionally don't have a logger for some reason, inherit their standard input, output and error from s6-svscan, so their messages are sent wherever the ones from s6-svscan are.
  • Leaf processes that still unavoidably report their messages using syslog() have them collected and logged by a (possibly supervised) syslog server.
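The standard error redirection mentioned in the first point can be sketched with a plain shell run-file-style script; here a file, combined.log, stands in for the pipe to the service's logger, and the script is a made-up example rather than a real service:

```shell
# A run-style script that merges stderr into stdout, so everything the
# daemon prints reaches the logger reading its stdout (shell equivalent
# of execline's "fdmove -c 2 1")
cat > run-demo.sh <<'EOF'
#!/bin/sh
exec 2>&1
echo "message on stdout"
echo "message on stderr" >&2
EOF
chmod +x run-demo.sh
./run-demo.sh > combined.log 2>/dev/null
cat combined.log   # both messages end up on the same stream
```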

s6 and s6-rc-based init systems are arranged in such a way that s6-svscan's messages are collected by a catch-all logger, and that logger's standard error is redirected to /dev/console.

The notification framework

Notification is a mechanism by which a process can become instantly aware that a certain event has happened, as opposed to the process actively and periodically checking whether it happened (which is called polling)[6]. The s6 package provides a general notification framework that doesn't rely on a long-lived process (e.g. a bus daemon), so that it can be integrated with its supervision suite. The notification framework is based instead on FIFO directories.
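The difference between the two approaches can be sketched with a plain FIFO: the listener blocks in a read and wakes as soon as the notifier writes, with no checking loop. All file names and the event message below are illustrative.

```shell
# Notification with a FIFO: the reader sleeps in the kernel until data
# arrives, instead of polling for it
rm -f event.fifo notified.out
mkfifo event.fifo
( sleep 1; echo happened > event.fifo ) &   # the notifier fires later
read msg < event.fifo                        # the listener blocks here
echo "notified: $msg" | tee notified.out
wait
```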

FIFO directories and related tools

A FIFO directory (or fifodir) is a directory in the filesystem associated with a notifier, a process in charge of notifying other processes about some set of events. As the name implies, the directory contains FIFOs, each of them associated with a listener, a process that wants to be notified about one or more events. A listener creates a FIFO in the fifodir and opens it for reading; this is called subscribing to the fifodir. When a certain event happens, the notifier writes to each FIFO in the fifodir. Written data is conventionally a single character encoding the identity of the event. Listeners wait for notifications using some blocking I/O call on the FIFO; unblocking and successfully reading data from it is their notification. A listener that no longer wants to receive notifications removes its FIFO from the fifodir; this is called unsubscribing.
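A miniature of this convention can be built with plain mkfifo instead of s6's tools (the directory and file names below are made up): one FIFO per listener, and the notifier writes the event character to every FIFO in the directory.

```shell
# Two listeners subscribe by creating a FIFO each; the notifier then
# writes the event character 'u' to every FIFO in the fifodir
rm -rf mini-fifodir a.event b.event
mkdir mini-fifodir
mkfifo mini-fifodir/listener-a mini-fifodir/listener-b
head -c1 mini-fifodir/listener-a > a.event &   # listener A waits
head -c1 mini-fifodir/listener-b > b.event &   # listener B waits
for f in mini-fifodir/*; do printf u > "$f"; done
wait
cat a.event b.event   # each listener received its own copy of 'u'
```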

FIFOs and FIFO directories need a special ownership and permission setup to work. The owner of a fifodir must be the notifier's effective user. A publicly accessible fifodir can be subscribed to by any user, and its permissions must be 1733 (i.e. the output of ls -l would display drwx-wx-wt). A restricted fifodir can be subscribed to only by members of the fifodir's group, and its permissions must be 3730 (i.e. the output of ls -l would display drwx-ws--T). The owner of a FIFO in the fifodir must be the corresponding listener's effective user, and its permissions must be 0622 (i.e. the output of ls -l would display prw--w--w-). Complete information about the FIFO directory internals is available here.
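The permission setup can be reproduced by hand with ordinary tools to see what ls displays; the directory and FIFO names below are illustrative (ownership is whatever user runs the commands).

```shell
# A publicly accessible fifodir (mode 1733) containing one listener FIFO
# (mode 0622), created without s6's helpers
rm -rf demo-fifodir
mkdir demo-fifodir
chmod 1733 demo-fifodir                  # drwx-wx-wt
mkfifo demo-fifodir/ftrig1:demo
chmod 0622 demo-fifodir/ftrig1:demo      # prw--w--w-
ls -ld demo-fifodir demo-fifodir/ftrig1:demo
```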

s6 provides an s6-mkfifodir program that creates a FIFO directory with correct ownership and permissions. It accepts the pathname of the fifodir. A restricted fifodir is created by specifying the -g option followed by a numeric group ID, which s6-mkfifodir's effective user must be a member of. s6-mkfifodir without a -g option creates a publicly accessible fifodir. A fifodir can be removed with an rm -r command. There is also an s6-cleanfifodir program that accepts the pathname of a fifodir and removes all FIFOs in it that don't have an active listener. Its effective user must be a member of the fifodir's group. In the normal case FIFOs are removed when the corresponding listener unsubscribes, so s6-cleanfifodir is a cleanup tool for cases when this fails (e.g. the listener was killed by a signal). For further information about s6-mkfifodir and s6-cleanfifodir please consult the HTML documentation in the package's /usr/share/doc subdirectory.

The s6-ftrig-notify program allows notifying all subscribers of a fifodir, so it can be used to create a notifier program. It accepts the pathname of a fifodir and a message that is written as-is to all FIFOs in the fifodir. Each character in the message is assumed to encode an event, and the character sequence should reflect the sequence of events. The s6-ftrig-wait program allows subscription to a fifodir and waiting for a notification, so it can be used to create a listener program. It accepts the pathname of a fifodir and a POSIX extended regular expression (i.e. those accepted by the grep -E command), creates a FIFO in the fifodir with correct ownership and permissions, and waits until it reads a sequence of characters that match the regular expression. Then it unsubscribes from the fifodir by removing the FIFO, prints the last character read from it to its standard output, and exits. For further information about s6-ftrig-notify and s6-ftrig-wait please consult the HTML documentation in the package's /usr/share/doc subdirectory.

Because performing an action that might trigger an event recognized by a notifier, and then subscribing to its fifodir to be notified of the event, is susceptible to a race that might lead to missing the notification, s6 provides two additional programs, s6-ftrig-listen and s6-ftrig-listen1. s6-ftrig-listen accepts options, a set of fifodir pathname and extended regular expression pairs, a program name and its arguments. It subscribes to each specified fifodir, runs the program as a child process with the supplied arguments, and waits for notifications. It makes sure that the program is executed only after there are listeners reading from their FIFOs.

s6-ftrig-listen expects its arguments to be in the format execline's execlineb program generates when parsing the block syntax, so the forward-compatible way to use it is in an execline script or execlineb -c command: the invocation can be written using the syntax s6-ftrig-listen { f1 re1 f2 re2 ... } prog args, where f1, f2, ... are the fifodir pathnames, re1, re2, ... are the regular expressions corresponding to f1, f2, ..., respectively, prog is the program name and args, the program's arguments. If s6-ftrig-listen is invoked with an -o option (or), it will unsubscribe from all fifodirs and exit when it reads a matching sequence of characters from any of the created FIFOs. If s6-ftrig-listen is invoked without an -o option, or with an explicit -a option (and), it will wait until it reads a matching sequence from every FIFO. The s6-ftrig-listen1 program is a single fifodir and regular expression version of s6-ftrig-listen that doesn't need execlineb-encoded arguments, and that prints the last character read from the created FIFO to its standard output. For further information about s6-ftrig-listen and s6-ftrig-listen1 please consult the HTML documentation in the package's /usr/share/doc subdirectory.

A timeout can be set for s6-ftrig-wait, s6-ftrig-listen and s6-ftrig-listen1 by specifying a -t option followed by a time value in milliseconds. The programs exit with an error status if they haven't been notified about the desired events after the specified time.

The fifodir and notification management code are implemented in the s6 package's library, libs6, and an internal helper program, s6-ftrigrd. The library exposes a public C language API that can be used by programs; for details about the API for notifiers see here, and for details about the API for listeners see here. s6-ftrigrd is launched by the library code.

Creating a publicly accessible fifodir named fifodir1 and a fifodir restricted to members of group user (assumed to have group ID 1000) named fifodir2:

user $s6-mkfifodir fifodir1
user $s6-mkfifodir -g 1000 fifodir2
user $ls -ld fifodir*
drwx-wx-wt 2 user user 4096 Aug  2 12:00 fifodir1
drwx-ws--T 2 user user 4096 Aug  2 12:00 fifodir2

Creating listeners that subscribe to fifodir1 and wait for event sequences 'message1' and 'message2', respectively, as background processes:

user $s6-ftrig-wait fifodir1 message1 &
user $s6-ftrig-wait -t 20000 fifodir1 message2 &
user $ls -l fifodir1
total 0
prw--w--w- 1 user user 0 Aug  2 21:44 ftrig1:@40000000598272220ea9fa39:-KnFNSkhmW1pQPY0
prw--w--w- 1 user user 0 Aug  2 21:46 ftrig1:@400000005982728b3a8d09c2:_UjWhNPn3Z0Q_VFQ

This shows that a FIFO has been created in the fifodir for each s6-ftrig-wait process, with names starting with 'ftrig1:'.

user $ps f -o pid,ppid,args
 PID  PPID COMMAND
...
2026  2023 \_ bash
2043  2026     \_ s6-ftrig-wait fifodir1 message1
2044  2043     |   \_ s6-ftrigrd
2051  2026     \_ s6-ftrig-wait -t 20000 fifodir1 message2
2052  2051         \_ s6-ftrigrd
...
s6-ftrig-wait: fatal: unable to match regexp on message2: Connection timed out

The output of ps shows that each s6-ftrig-wait process has spawned a child s6-ftrigrd helper, and because the one waiting for event sequence 'message2' has a timeout of 20 seconds ("-t 20000"), after that time has elapsed without getting the expected notifications it unsubscribes, and exits with an error status that is printed on the shell's terminal ("Connection timed out").

user $ls -l fifodir1
total 0
prw--w--w- 1 user user 0 Aug  2 21:44 ftrig1:@40000000598272220ea9fa39:-KnFNSkhmW1pQPY0

This shows that the s6-ftrig-wait process without a timeout is still running, and its FIFO is still there. Notifying all fifodir1 listeners about event sequence 'message1':

user $s6-ftrig-notify fifodir1 message1
1

The '1' printed on the shell's terminal after the s6-ftrig-notify invocation is the last event the s6-ftrig-wait process was notified about (i.e. the last character in string 'message1'); the process then exits because the notifications have matched its regular expression.

user $ls -l fifodir1
total 0

This shows that since all listeners have unsubscribed, the fifodir is empty.

FILE test-script - Example execline script for testing s6-ftrig-listen
#!/bin/execlineb -P
foreground {
   s6-ftrig-listen -o { fifodir1 message fifodir2 message }
   foreground { ls -l fifodir1 fifodir2 }
   foreground { ps f -o pid,ppid,args }
   s6-ftrig-notify fifodir1 message
}
echo s6-ftrig-listen exited

Executing the example script:

user $./test-script
fifodir1:
total 0
prw--w--w- 1 user user 0 Aug  2 22:28 ftrig1:@4000000059827c60124f916d:51Xhg7STswW-yFst

fifodir2:
total 0
prw--w--w- 1 user user 0 Aug  2 22:28 ftrig1:@4000000059827c601250c752:oXikN3Vko3JipuvU
 PID  PPID COMMAND
...
2176  2026 \_ foreground  s6-ftrig-listen ...
2177  2176     \_ s6-ftrig-listen -o  fifodir1 ...
2178  2177         \_ s6-ftrigrd
2179  2177         \_ foreground  ps ...
2181  2179             \_ ps f -o pid,ppid,args
...
s6-ftrig-listen exited

The output of ls shows that two listeners were created, one subscribed to fifodir1 and the other to fifodir2, and the output of ps shows that both are implemented by a single s6-ftrigrd process that is a child of s6-ftrig-listen. It also shows that s6-ftrig-listen has another child process, executing (at that time) the execline foreground program, which in turn has spawned the ps process. After that, foreground replaces itself with s6-ftrig-notify, which notifies all fifodir1 listeners about event sequence 'message'. Because s6-ftrig-listen was invoked with an -o option, and the fifodir1 listener got notifications that match its regular expression, s6-ftrig-listen exits at that point ("s6-ftrig-listen exited").

user $ls fifodir*
fifodir1:
total 0

fifodir2:
total 0

This shows that the listener subscribed to fifodir2 has unsubscribed and exited, even though it didn't get the expected notifications.

Modifying the test script to invoke s6-ftrig-listen with the -a option instead (i.e. as s6-ftrig-listen -a { fifodir1 message fifodir2 message }) and reexecuting it in the background:

user $./test-script &
fifodir1:
total 0
prw--w--w- 1 user user 0 Aug  2 22:56 ftrig1:@40000000598282e4210384d5:wikPBCD-Aw5Erijp

fifodir2:
total 0
prw--w--w- 1 user user 0 Aug  2 22:56 ftrig1:@40000000598282e42104bc57:Yop6JbMNBJo1r-uI
 PID  PPID COMMAND
...

The output of the script does not have a "s6-ftrig-listen exited" message, so it is still running:

user $ls -l fifodir*
fifodir1:
total 0

fifodir2:
total 0
prw--w--w- 1 user user 0 Aug  2 22:56 ftrig1:@40000000598282e42104bc57:Yop6JbMNBJo1r-uI

This confirms that the listener subscribed to fifodir2 is still running, waiting for events. Notifying all fifodir2 listeners about event sequence 'message':

user $s6-ftrig-notify fifodir2 message
s6-ftrig-listen exited

This shows that once the remaining listener has gotten notifications that match its regular expression, s6-ftrig-listen exits.

The process supervision suite's use of notification

The event subdirectory of an s6 service directory is a fifodir used by s6-supervise to notify interested listeners about its supervised process' state changes. That is, s6-supervise acts as the notifier associated with the event fifodir, and writes a single character to each FIFO in it when there is a state change:

  • At program startup, after creating event if it doesn't exist, s6-supervise writes an s character (start event).
  • Each time s6-supervise spawns a child process executing the run file, it writes a u character (up event).
  • If the supervised process supports readiness notification, s6-supervise writes a U character (up and ready event) when the child process notifies its readiness.
  • If the service directory contains a finish file and it exits with exit code 125 (permanent failure) when executed, s6-supervise writes an O character (once event; the character is a capital 'o').
  • Each time the supervised process stops running, s6-supervise writes a d character (down event).
  • If the service directory contains a finish file, s6-supervise writes a D character (really down) each time finish exits or is killed. Otherwise, s6-supervise writes the character right after the down event notification.
  • When s6-supervise is about to exit normally, it writes an x character (exit event) after the supervised process stops and it has notified listeners about the really down event.

s6 provides an s6-svwait program, which is a process supervision-specific notification tool. It accepts service directory pathnames and options that specify an event to wait for. At program startup, for each specified servicedir it checks the status file in its supervise control subdirectory to see if the corresponding supervised process is already in the state implied by the specified event, and if not, it subscribes to the event fifodir and waits for notifications from the corresponding s6-supervise process. A -u option specifies an up event, a -U option, an up and ready event, a -d option, a down event, and a -D option, a really down event. Options -a and -o work as for s6-ftrig-listen.

There is also an s6-svlisten program, which is a process supervision-specific version of s6-ftrig-listen. It accepts servicedir pathnames in the format execline's execlineb program generates when parsing the block syntax, a program name and its arguments, and options that specify an event to wait for. Therefore, the forward-compatible way to use it is in an execline script or execlineb -c command: the invocation can be written using the syntax s6-svlisten { s1 s2 ... } prog args, where s1, s2, ... are the servicedir pathnames, prog is the program name and args are the program's arguments. Options -u, -U, -d and -D work as for s6-svwait. Options -a and -o work as for s6-ftrig-listen. s6-svlisten also accepts an -r option (restart event) that makes it wait for a down event followed by an up event, and a -R option (restart and ready event) that makes it wait for a down event followed by an up and ready event. The s6-svlisten1 program is a single servicedir version of s6-svlisten that doesn't need execlineb-encoded arguments.

s6-svwait, s6-svlisten and s6-svlisten1 accept a -t option to specify a timeout in the same way as s6-ftrig-wait. For further information about these programs please consult the HTML documentation in the package's /usr/share/doc subdirectory.

Finally, the s6-svc program accepts a -w option that makes it wait for notifications from the s6-supervise process corresponding to the service directory specified as argument, after asking it to perform an action on its child process. An s6-svc -wu, s6-svc -wU, s6-svc -wd, s6-svc -wD, s6-svc -wr or s6-svc -wR command is equivalent to an s6-svlisten1 -u, s6-svlisten1 -U, s6-svlisten1 -d, s6-svlisten1 -D, s6-svlisten1 -r or s6-svlisten1 -R command, respectively, specifying the same servicedir, and s6-svc with the same arguments except for the -w option, as the spawned program. s6-svc also accepts a timeout specified with a -T option, that is translated to the s6-svlisten1 -t option.

See the service readiness notification section for usage examples.

Service readiness notification

When a process is supervised, it transitions to the 'up' state when its supervisor has successfully spawned a child process executing the run file. s6-supervise considers this an up event, and notifies all listeners subscribed to the corresponding event fifodir about it. But when the supervised process is executing a server program for example, it might not be ready to provide its service immediately after startup. Programs might do initialization work that could take some noticeable time before they are actually ready to serve, but it is impossible for the supervisor to know exactly how much. Because of this, and because the kind of initialization to do is program-specific, some sort of collaboration from the supervised process is needed to help the supervisor know when it is ready[7]. This is called readiness notification.

systemd has the concept of readiness notification, called start-up completion notification in its documentation. To support readiness notification under systemd, a program implements the $NOTIFY_SOCKET protocol, based on message passing over a datagram mode UNIX domain socket, bound to a pathname specified as the value of the NOTIFY_SOCKET environment variable. The protocol is implemented by libsystemd's sd_..._notify...() family of functions, but it is also covered by systemd's interface stability promise, so alternative implementations of it are possible. The program can perform start-up completion notification by linking to libsystemd and calling one of those functions. systemd uses start-up completion notification when a service unit file contains a 'Type=notify' directive.

To support readiness notification under s6, a program implements the s6 readiness notification protocol, which works like this:

  1. At program startup, the program expects to have a file descriptor open for writing, associated with a notification channel. The program chooses the file descriptor. For example, it can be specified as a program argument, or be a fixed, program-specific well-known number specified in the program's documentation.
  2. When all initialization work necessary to reach the program's definition of 'service ready state' has been completed, it writes a newline character to the notification channel.
  3. The program closes the notification channel after writing to it.

Therefore, a typical code snippet in the C language that implements the last two steps would be as follows:

CODE
/* requires <unistd.h>; notification_fd is an int object storing the notification channel's file descriptor */
write(notification_fd, "\n", 1);
close(notification_fd);

The code only relies on POSIX calls, so the program doesn't need to link to any specific library other than the libc to implement the readiness protocol. s6 uses readiness notification when a regular file named notification-fd is present in a service directory, containing an integer that specifies the program's chosen notification channel file descriptor. s6-supervise implements the notification channel as a pipe between the supervised process and itself; when it receives a newline character signalling the service's readiness, it considers that an up and ready event and notifies all listeners subscribed to the event fifodir about it. After that, s6-supervise no longer reads from the notification pipe, so it can be safely closed by the child process.
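
A self-contained, testable version of that daemon-side step might look like the following C sketch; the signal_readiness() helper and its negative-fd convention for "notification turned off" are illustrative assumptions, not part of s6 itself:

```c
#include <assert.h>
#include <unistd.h>

/* Sketch: after completing its initialization work, a daemon signals
 * readiness on notification_fd per the s6 protocol. A negative fd
 * means notification is turned off (e.g. no option like test-daemon's
 * hypothetical --s6 was given), in which case nothing is written. */
int signal_readiness(int notification_fd)
{
    if (notification_fd < 0) return 0;            /* notification off */
    if (write(notification_fd, "\n", 1) != 1) return -1;
    close(notification_fd);                       /* nothing else is ever sent */
    return 0;
}
```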

As of s6 version 2.7.2.0, s6-svscan itself supports the readiness notification protocol: a -d option followed by an unsigned integer value can be passed to it, specifying the notification channel's file descriptor. s6-svscan will use the notification channel to signal readiness when it has initialized all its necessary resources, is ready to perform the first scan of the supplied scan directory, and is ready to receive commands from s6-svscanctl and to react to signals. This can be useful when s6-svscan is launched as a supervised process that is part of a supervision tree, typically to create an s6 subtree, often running as an unprivileged user (so that all subtree processes do so as well). When the -d option is not used, s6-svscan does not signal readiness.

Note
Using s6-svscan's -d option signals shallow readiness: s6-svscan's readiness does not mean that all the supervision tree processes launched by it are themselves ready, or even started, or even that their corresponding s6-supervise parent processes have been started. Therefore, this option cannot be relied on if a test for deep readiness, meaning that all supervision tree processes have been started and are ready, is needed.

Example s6 scan directory containing services that support readiness notification:

user $s6-mkfifodir test-service1/event
user $ls -l *
test-service1:
total 12
-rw-r--r-- 1 user user    0 Jul 30 12:00 down
drwx-wx-wt 2 user user 4096 Jul 30 12:00 event
-rwxr-xr-x 1 user user   29 Jul 30 12:00 finish
-rwxr-xr-x 1 user user   32 Jul 30 12:00 run

test-service2:
total 8
-rw-r--r-- 1 user user  0 Jul 30 12:00 down
-rw-r--r-- 1 user user  2 Jul 30 12:00 notification-fd
-rwxr-xr-x 1 user user 39 Jul 30 12:00 run

test-service3:
total 16
-rw-r--r-- 1 user user  0 Jul 30 12:00 down
-rwxr-xr-x 1 user user 29 Jul 30 12:00 finish
-rw-r--r-- 1 user user  2 Jul 30 12:00 notification-fd
-rwxr-xr-x 1 user user 39 Jul 30 12:00 run
-rw-r--r-- 1 user user  6 Jul 30 12:00 timeout-finish
FILE test-service1/run
#!/bin/execlineb -P
test-daemon
FILE test-service1/finish
#!/bin/execlineb -P
exit 125
FILE test-service2/run
#!/bin/execlineb -P
test-daemon --s6=5
FILE test-service2/notification-fd
5
FILE test-service3/run
#!/bin/execlineb -P
test-daemon --s6=5
FILE test-service3/notification-fd
5
FILE test-service3/finish
#!/bin/execlineb -P
sleep 10
FILE test-service3/timeout-finish
20000

It is assumed that test-daemon is a program that supports an --s6 option to turn readiness notification on, specifying the notification channel's file descriptor (5), which is also stored in a notification-fd file. test-service1/finish exits with an exit code of 125, so that if the corresponding test-daemon process stops, it won't be restarted. The s6-mkfifodir invocation creates test-service1/event as a publicly accessible fifodir. Using s6-ftrig-listen1 on it to start the supervision tree and verify that s6-supervise notifies listeners about the start event:

user $s6-ftrig-listen1 test-service1/event s s6-svscan
s
user $ls -ld */event
drwx-wx-wt 2 user user 4096 Jul 30 12:22 test-service1/event
drwx-ws--T 2 user user 4096 Jul 30 12:22 test-service2/event
drwx-ws--T 2 user user 4096 Jul 30 12:22 test-service3/event

This shows that s6-supervise has created all missing event directories as restricted fifodirs, but uses the publicly accessible one created by s6-mkfifodir.

FILE test-scriptExample execline script for testing s6-svwait
#!/bin/execlineb -P
foreground { s6-svwait -u test-service1 }
echo s6-svwait exited

Executing the example script:

user $../test-script &
user $ps xf -o pid,ppid,args
 PID  PPID COMMAND
...
2166  2039 \_bash
2387  2166    \_ foreground  s6-svwait ...
2388  2387        \_ s6-svwait -u test-service1
2389  2388            \_ s6-ftrigrd
...
user $ls -l test-service1/event
total 0
prw--w--w- 1 user user 0 Jul 30 12:22 ftrig1:@40000000597df9d12c8328da:v84Zc_E_LyaqxlDh

This shows that the s6-svwait process has spawned a child s6-ftrigrd helper, and created a FIFO in test-service1/event so that it can be notified about the up event. Manually starting test-service1/run:

user $s6-svc -u test-service1
s6-svwait exited

The message printed by the test script to its standard output shows that the s6-svwait process got the expected notification, so it exited.

FILE test-scriptExample execline script for testing up and ready event notifications
#!/bin/execlineb -P
define -s services "test-service2 test-service3"
foreground {
   s6-svlisten -U { $services }
   foreground {
      forx svc { $services }
         importas svc svc
         foreground { s6-svc -wu -u $svc }
         pipeline { echo s6-svc -wu -u $svc exited } s6-tai64n
   }
   ps xf -o pid,ppid,args
}
pipeline { echo s6-svlisten -U exited } s6-tai64n

The script calls s6-svlisten to subscribe to fifodirs test-service2/event and test-service3/event and wait for up and ready events. Then it uses an s6-svc -wu -u command to manually start test-service2/run and test-service3/run, and wait for up events. Both run scripts invoke test-daemon with readiness notification on. A message timestamped using s6-tai64n is printed to the standard output when the listeners get their expected notifications. Executing the example script:

user $../test-script | s6-tai64nlocal
2017-07-30 19:45:38.458536857 s6-svc -wu -u test-service2 exited
2017-07-30 19:45:38.467353962 s6-svc -wu -u test-service3 exited
 PID  PPID COMMAND
2379  2378 \_ foreground  s6-svlisten  -U ...
2381  2379     \_ s6-svlisten -U ...
2382  2381         \_ s6-ftrigrd
2383  2381         \_ ps xf -o pid,ppid,args
2017-07-30 19:45:48.472237201 s6-svlisten -U exited

This shows that the s6-svc processes waiting for up events are notified first, so they exit, and that the s6-svlisten process waiting for up and ready events is notified 10 seconds later. The output of ps shows that when the s6-svc processes exited, the s6-svlisten process and its s6-ftrigrd child were still running.

user $for i in *; do printf "$i: `s6-svstat $i`\n"; done
test-service1: up (pid 2124) 42 seconds, normally down
test-service2: up (pid 2332) 29 seconds, normally down, ready 19 seconds
test-service3: up (pid 2338) 29 seconds, normally down, ready 19 seconds

This confirms that both test-daemon processes have notified readiness to their s6-supervise parent ("ready 19 seconds") 10 seconds after being started. Using s6-ftrig-listen1 on fifodir test-service1/event to verify that s6-supervise notifies listeners about a once event when test-daemon is killed with a SIGTERM, because of test-service1/finish's exit code:

user $s6-ftrig-listen1 test-service1/event O s6-svc -t test-service1
O
FILE test-scriptExample execline script for testing really down event notifications
#!/bin/execlineb -P
define -s services "test-service2 test-service3"
foreground {
   s6-svlisten -d { $services }
   forx svc { $services }
      importas svc svc
      foreground { s6-svc -wD -d $svc }
      pipeline { echo s6-svc -wD -d $svc exited } s6-tai64n
}
foreground {
   pipeline { echo s6-listen -d exited } s6-tai64n
}
ps xf -o pid,ppid,args

The script calls s6-svlisten to subscribe to fifodirs test-service2/event and test-service3/event and wait for down events. Then it uses an s6-svc -wD -d command to manually stop the test-daemon processes corresponding to test-service2 and test-service3, and wait for really down events. test-service3 has a finish script that sleeps for 10 seconds, so test-service2/event listeners should be notified earlier than test-service3/event listeners. A message timestamped using s6-tai64n is printed to the standard output when the listeners get their expected notifications. Executing the example script:

user $../test-script | s6-tai64nlocal
2017-07-30 22:23:17.063815232 s6-svc -wD -d test-service2 exited
2017-07-30 22:23:17.071855769 s6-listen -d exited
 PID  PPID COMMAND
2326     1 forx svc  test-service2  test-service3 ...
2333  2326  \_ foreground  s6-svc  -wD  ...
2334  2333      \_ s6-svlisten1 -D -- test-service3 s6-svc -d -- test-service3
2335  2334          \_ s6-ftrigrd
2017-07-30 22:23:27.078874158 s6-svc -wD -d test-service3 exited

This shows that the s6-svlisten process waiting for down events and the s6-svc process subscribed to test-service2/event and waiting for a really down event are notified first with almost no delay between them, so they exit, and that the s6-svc process subscribed to test-service3/event and waiting for a really down event is notified 10 seconds later. The output of ps shows that when the s6-svlisten process exited, an s6-svc process that had replaced itself with s6-svlisten1 (because of the -w option) and its s6-ftrigrd child were still running.

user $for i in *; do printf "$i: `s6-svstat $i`\n"; done
test-service1: down (signal SIGTERM) 83 seconds, ready 83 seconds
test-service2: down (exitcode 0) 31 seconds, ready 31 seconds
test-service3: down (exitcode 0) 31 seconds, ready 21 seconds

This confirms that the test-daemon process corresponding to test-service1 hasn't been restarted after test-service1/finish exited (83 seconds in down state and no 'wanted up'), and that the down and ready events for the test-daemon processes corresponding to test-service2 and test-service3 have a 10 seconds delay between them ("ready 21 seconds" compared to "ready 31 seconds"). Using s6-ftrig-listen1 on fifodir test-service2/event to stop the supervision tree and verify that s6-supervise notifies listeners about the exit event:

user $s6-ftrig-listen1 test-service2/event x s6-svscanctl -t .
x

s6-notifyoncheck

As of version 2.6.1.0, s6 provides the s6-notifyoncheck program, which can be used in combination with programs that don't support readiness notification, but can be polled for readiness somehow. In that case, s6-notifyoncheck can be invoked from a run file, use the available polling mechanisms, and signal readiness itself to s6-supervise using the s6 readiness notification protocol.

s6-notifyoncheck is a chain loading program that assumes its working directory is a servicedir, spawns a child process that polls for readiness, and then executes the next program in the chain. By default, the child process will try to execute a file named check in a subdirectory named data as a child process (i.e. the pathname of the file, relative to s6-notifyoncheck's working directory, is data/check). Just like run or finish, check can have any file format that the kernel knows how to execute, but is usually an execline or shell script. When executed, check is expected to poll the supervised process for readiness, and then exit with code 0 if the process was verified to be ready, or exit with a nonzero code otherwise. This is similar to runit's check file mechanism.

If a -c option is passed to s6-notifyoncheck, instead of looking for a check file, the child process will invoke execlineb, the script parser and launcher from the execline package, and pass it the -c option and the argument that follows it. For example, s6-notifyoncheck -c eargs prog arg1 arg2 will spawn a child process that uses an execlineb -c eargs command to poll for readiness (so eargs can have execline syntax), while it executes program prog with arguments arg1 arg2 (without creating a new process). This option is mainly useful if the program used to poll the supervised process is very simple and can be inlined as a simple command line, to avoid having to manage a whole script and a check file.

By default, s6-notifyoncheck expects to be able to read the file descriptor it should use for the notification channel from a notification-fd file in its working directory (i.e. the file used by s6-supervise), and uses a single POSIX fork() call to create the poller child process, so the next program in the chain must be able to reap it when it terminates (e.g. with a POSIX wait() call). If a -3 option followed by an unsigned integer value is passed to s6-notifyoncheck, it will use the specified value as the notification channel's file descriptor, ignoring notification-fd. And if a -d option is passed to s6-notifyoncheck, it will use two fork() calls instead of one (i.e. it will double fork), so the poller process will be reparented to process 1 (or to a local reaper), and the next program in the chain won't have to reap it. This is useful to avoid having a lingering zombie process if the next program in the chain does not reap child processes it doesn't know of.
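
The double fork performed with the -d option is the classic UNIX orphaning idiom. A minimal C sketch of the technique (not s6-notifyoncheck's actual code) is:

```c
/* Sketch of the double-fork idiom: the intermediate child exits
 * immediately, so the grandchild is reparented to process 1 (or a
 * local reaper), and the original parent only has to reap the
 * short-lived intermediate child. */
#include <assert.h>
#include <sys/wait.h>
#include <unistd.h>

pid_t double_fork(void (*work)(void))
{
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {                    /* intermediate child */
        pid_t grandchild = fork();
        if (grandchild == 0) {           /* grandchild: e.g. the poller */
            work();
            _exit(0);
        }
        _exit(grandchild < 0 ? 1 : 0);   /* exit at once: grandchild is orphaned */
    }
    int status;
    waitpid(child, &status, 0);          /* reap the intermediate child */
    return child;
}
```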

If the poll for readiness is successful, s6-notifyoncheck's poller process signals readiness using the notification channel. Unlike runit's sv command, which executes the check file only once, s6-notifyoncheck's poller process periodically retries execution of check or invocation of execlineb until a poll is successful, until a timeout period expires, or until a certain number of unsuccessful polls has been reached, depending on the options supplied to s6-notifyoncheck, and then exits. A retry is performed once every second by default, i.e. the default polling period is 1 second, but a different one can be specified by passing a -w option to s6-notifyoncheck, followed by a time value in milliseconds. Execution of the data/check or execlineb process can be time-limited by passing a -t option to s6-notifyoncheck, followed by a time value in milliseconds. If the process runs for longer than the specified time, s6-notifyoncheck's poller process sends it a SIGTERM signal to kill it, and then exits without signalling readiness. If the -t option is not used, s6-notifyoncheck's poller process will wait indefinitely for the data/check or execlineb process to exit.
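
The retry behaviour described above can be pictured with the following simplified C sketch; it uses system() and whole-second sleeps for brevity, whereas the real program execs the check file directly and works with millisecond-precision timers:

```c
/* Simplified sketch of a readiness polling loop in the spirit of
 * s6-notifyoncheck: run a check command once per period until it
 * succeeds or the attempt limit is reached, then signal readiness by
 * writing a newline to the notification fd. max_attempts == 0 means
 * retry indefinitely, like -n 0. */
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

int poll_until_ready(const char *check_cmd, int max_attempts, int notification_fd)
{
    for (int i = 0; max_attempts == 0 || i < max_attempts; i++) {
        if (system(check_cmd) == 0) {       /* poll succeeded: service is ready */
            write(notification_fd, "\n", 1);
            close(notification_fd);
            return 0;
        }
        sleep(1);                           /* default 1-second polling period */
    }
    return -1;                              /* gave up: stay "not ready" */
}
```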

For the full description of s6-notifyoncheck please consult the HTML documentation in the package's /usr/share/doc subdirectory.

Example scan directory with a servicedir for a program that can be polled for readiness:

user $ls -l * .s6-svscan
.s6-svscan:
total 4
-rwxr-xr-x 1 user user 20 Apr  7 12:00 finish

test-service:
total 16
drwxr-xr-x 2 user user 4096 Apr  7 12:00 data
-rw-r--r-- 1 user user    0 Apr  7 12:00 down
drwxr-xr-x 2 user user 4096 Apr  7 12:00 env
-rw-r--r-- 1 user user    2 Apr  7 12:00 notification-fd
-rwxr-xr-x 1 user user   92 Apr  7 12:00 run
user $ls -l test-service/{data,env}
test-service/data:
total 4
-rwxr-xr-x 1 user user 148 Apr  7 12:00 check

test-service/env:
total 4
-rw-r--r-- 1 user user 2 Apr  7 12:00 ATTEMPTS
FILE test-service/run
#!/bin/execlineb -P
s6-envdir env
importas A ATTEMPTS
s6-notifyoncheck -d -n $A
test-daemon

This file allows executing a hypothetical test-daemon program as a supervised process. s6-notifyoncheck is used to poll the process for readiness. A -d option is used to make s6-notifyoncheck double fork, because test-daemon is assumed to not reap child processes, and the -n option is used to make s6-notifyoncheck's poller process retry execution of data/check after each unsuccessful poll, until the number of retries equals the value of the ATTEMPTS environment variable. The environment of the run process is modified by the contents of the env environment directory using an s6-envdir invocation.

FILE test-service/data/check
#!/bin/execlineb -P
foreground { printf "Polling test-daemon: " }
ifte {
echo success
} {
foreground { echo failure }
exit 1
}
test-daemon-check

This file allows executing a hypothetical test-daemon-check program that is assumed to be able to poll test-daemon for readiness. Messages are printed to the check process' standard output to report the outcome.

FILE test-service/notification-fd
3

This file informs s6-supervise that run supports the s6 readiness notification protocol using file descriptor 3 for the notification channel, and will also be used by s6-notifyoncheck.

FILE test-service/env/ATTEMPTS
5

Environment directory env sets the value of environment variable ATTEMPTS to 5. This means that after 5 executions of test-daemon-check that result in a nonzero exit code (i.e. if test-daemon is determined to not be ready when polled), s6-notifyoncheck's poller process will exit without signalling readiness to test-daemon's s6-supervise parent.

Starting test-service and waiting for the supervised process to be ready, using s6-svc's -w option:

user $time s6-svc -uwU -T 12000 test-service
Polling test-daemon: failure
Polling test-daemon: failure
Polling test-daemon: failure
Polling test-daemon: failure
Polling test-daemon: failure
s6-svlisten1: fatal: timed out

real	0m12.015s
user	0m0.002s
sys	0m0.001s

This shows that test-daemon was unsuccessfully polled 5 times, and because of s6-notifyoncheck's -n option, it exited without signalling readiness to s6-supervise. Therefore, after the 12 seconds specified with the -T option, s6-svc timed out, as shown by s6-svlisten1's error message and the output of the time utility.

user $s6-svstat test-service
up (pid 4600) 16 seconds, normally down

This shows that test-daemon is actually running ("up"), but s6-supervise does not consider the process to be ready (there is no "ready" in s6-svstat's output). Modifying the env directory so that the value of environment variable ATTEMPTS is 0, restarting test-service and waiting for the supervised process to be ready:

user $echo 0 >test-service/env/ATTEMPTS
user $time s6-svc -rwU -T 12000 test-service
Polling test-daemon: failure
Polling test-daemon: failure
Polling test-daemon: failure
Polling test-daemon: failure
Polling test-daemon: failure
Polling test-daemon: failure
Polling test-daemon: failure
Polling test-daemon: failure
Polling test-daemon: failure
Polling test-daemon: failure
Polling test-daemon: success

real	0m10.116s
user	0m0.000s
sys	0m0.003s

Because of the new value of ATTEMPTS, s6-notifyoncheck was invoked with an -n 0 option, which tells it to keep executing data/check until there is a successful poll. This shows that test-daemon was polled for readiness once per second until the 11th attempt, which was successful. The output of the time utility shows that this happened after approximately 10 seconds, i.e. before s6-svc's timeout of 12 seconds, which must mean it received an up and ready notification from test-daemon's supervisor.

user $s6-svstat test-service
up (pid 4634) 16 seconds, normally down, ready 6 seconds

This confirms that test-daemon is up and ready. It became ready approximately 10 seconds after it was started.

The UNIX domain super-server and related tools

See here.

Suidless privilege gain tools

s6 provides two programs, s6-sudoc and s6-sudod, that can be used to implement controlled privilege gains without setuid programs. This is achieved by having s6-sudod run as a long-lived process with an effective user that has the required privileges, bound to a stream mode UNIX domain socket, and having s6-sudoc, which can run with an unprivileged effective user, ask the s6-sudod process over a connection to its socket to perform an action on its behalf.

s6-sudod is a program that must be spawned by a UCSPI server (like s6-ipcserverd) and accepts options and an argument sequence s1, s2, ... that can be empty. s6-sudoc is a program that must be spawned by a UCSPI client and accepts options and an argument sequence c1, c2, ... that can also be empty. s6-sudoc transmits its argument sequence over the connection to the server, which must be an s6-sudod process, along with its environment variables, unless it is invoked with an -e option. s6-sudod concatenates its argument sequence with the one received from the client, and passes it to a POSIX execve() call, which results in a program invocation. s6-sudoc also transmits its standard input, output and error file descriptors to s6-sudod using SCM_RIGHTS control messages (i.e. fd-passing), so the invoked program will run as a child process of s6-sudod, with s6-sudod's effective user, but its standard input, output and error descriptors will be a copy of s6-sudoc's. The program's environment will be s6-sudod's environment, except that every variable that is defined but has an empty value will be set to the value it has in s6-sudoc's environment, if it is also set there. s6-sudoc waits until s6-sudod's child process exits. If it is invoked with a -T option followed by a time value in milliseconds, it will close the connection and exit after the specified time has passed if s6-sudod's child is still running.
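
The descriptor transmission mentioned above relies on standard POSIX fd-passing. The following C sketch shows how a single file descriptor can be sent and received over a UNIX domain socket with an SCM_RIGHTS control message; it illustrates the mechanism, and is not s6's actual code, which transmits all three standard descriptors:

```c
/* Pass an open file descriptor between processes over a UNIX domain
 * socket, using SCM_RIGHTS ancillary data. The kernel installs a copy
 * of the descriptor in the receiving process. */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send file descriptor fd_to_send over UNIX socket sock. */
int send_fd(int sock, int fd_to_send)
{
    char dummy = 'F';                  /* at least 1 byte of real data */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;         /* this message carries descriptors */
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd_to_send, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns it, or -1 on error. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;
    if (recvmsg(sock, &msg, 0) != 1) return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```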

s6-sudo is a helper program that accepts options, a UNIX domain socket pathname and an s6-sudoc argument sequence, and invokes s6-ipcclient chained to s6-sudoc. The socket pathname is passed to s6-ipcclient, and the argument sequence, to s6-sudoc. s6-sudo options specify corresponding s6-ipcclient and s6-sudoc options. For the full description of s6-sudo's, s6-sudoc's and s6-sudod's functionality please consult the HTML documentation in the package's /usr/share/doc subdirectory.

Standard permission settings on s6-sudod's listening socket can be used to implement some access control, and credentials passing over a UNIX domain socket also allows finer-grained control. The s6-ipcserver-access program can be used to take advantage of credentials passing.

Important
If s6-sudoc is killed, or exits while s6-sudod's child process is still running, s6-sudod will send a SIGTERM followed by a SIGCONT signal to its child, and then exit 1. However, sending a SIGTERM to the child does not guarantee that it will die, and if it keeps running, it might still read from the file descriptor that was s6-sudoc's standard input, or write to the file descriptors that were s6-sudoc's standard output or error. This is a potential security risk. Administrators should audit their server programs to make sure this does not happen. More generally, anything using signals or terminals will not be handled transparently by the s6-sudoc + s6-sudod mechanism. The mechanism was designed to allow programs to gain privileges in specific situations: short-lived, simple, noninteractive processes. It was not designed to emulate the full suid functionality and will not go out of its way to do so. Also, s6-sudoc's argument sequence may be empty. In that case, the client is in complete control of the program executed as s6-sudod's child. This setup is permitted but very dangerous, and extreme attention should be paid to access control.
FILE test-scriptExample execline script to be executed by s6-sudod
#!/bin/execlineb -S0
pipeline { id -u } withstdinas -n localuser
importas localuser localuser
importas -D unavailable IPCREMOTEEUID IPCREMOTEEUID
importas -D unset VAR1 VAR1
importas -D unset VAR2 VAR2
importas -D unset VAR3 VAR3
foreground { echo Script run with effective user ID $localuser and arguments $@ }
echo IPCREMOTEEUID=$IPCREMOTEEUID VAR1=$VAR1 VAR2=$VAR2 VAR3=$VAR3

Testing the script by executing it directly:

user1 $VAR1="s6-sudoc value" VAR2="ignored variable" ./test-script arg1 arg2
Script run with effective user ID 1000 and arguments arg1 arg2
IPCREMOTEEUID=unavailable VAR1=s6-sudoc value VAR2=ignored variable VAR3=unset

The script is executed with effective user user1 (UID 1000), IPCREMOTEEUID and VAR3 are unset, and VAR1 and VAR2 are set to the specified values.

FILE s6-sudod-wrapperExample execline script to launch an s6-sudod process with access control
s6-ipcserver run-test-script
s6-ipcserver-access -v 2 -i rules
s6-sudod ./test-script arg1 arg2

s6-ipcserver-access's -v 2 option sets its verbosity level to 2. Contents of rules directory rules:

user1 $ls -l rules/*/*
rules/uid/1002:
total 4
-rw-r--r-- 1 user1 user1    0 Aug  4 12:00 allow
drwxr-xr-x 2 user1 user1 4096 Aug  4 12:00 env

rules/uid/default:
total 0
-rw-r--r-- 1 user1 user1 0 Aug  4 12:00 deny
user1 $ls -1 rules/uid/1002/env
VAR1
VAR3
FILE rules/uid/1002/env/VAR3
s6-sudod value

File rules/uid/1002/env/VAR1 contains an empty line, so the corresponding environment variable will be set, but empty. Launching the s6-sudod process:

user1 $execlineb -P s6-sudod-wrapper &
user1 $ls -l run-test-script
srwxrwxrwx 1 user1 user1 0 Aug  4 12:10 run-test-script

This shows that a UNIX domain socket named run-test-script was created in the working directory. Running s6-sudo with effective user user2 (UID 1001):

user2 $VAR1="s6-sudoc value" VAR2="ignored variable" s6-sudo run-test-script arg3 arg4
s6-ipcserver-access: info: deny pid 2125 uid 1001 gid 1001: Permission denied
s6-sudoc: fatal: connect to the s6-sudod server - check that you have appropriate permissions

s6-sudo run-test-script arg3 arg4 is equivalent to s6-ipcclient run-test-script s6-sudoc arg3 arg4, but shorter. This shows that the rules directory setup denied execution of test-script to user2 (UID 1001); it only allows it to the user with UID 1002. Modifying rules:

user1 $mv rules/uid/100{2,1}
user1 $ls -1 rules/*/*
rules/uid/1001:
allow
env

rules/uid/default:
deny

Retrying s6-sudo:

user2 $VAR1="s6-sudoc value" VAR2="ignored variable" s6-sudo run-test-script arg3 arg4
s6-ipcserver-access: info: allow pid 2148 uid 1001 gid 1001
Script run with effective user ID 1000 and arguments arg1 arg2 arg3 arg4
IPCREMOTEEUID=1001 VAR1=s6-sudoc value VAR2=unset VAR3=s6-sudod value

Comparing to the output of the script when run directly by user1, this shows that test-script's arguments are the concatenation of the ones supplied to s6-sudod in script s6-sudod-wrapper, arg1 and arg2, and the ones specified in the s6-sudo invocation, arg3 and arg4. Also, test-script's environment has s6-sudod's variables: IPCREMOTEEUID, inherited from s6-ipcserverd, and VAR3, inherited from s6-ipcserver-access, which in turn sets it based on environment directory rules/uid/1002/env. Because variable VAR1 is set by s6-ipcserver-access but empty, s6-sudod sets it to the value it has in s6-sudoc's environment. And because variable VAR2 is set in s6-sudoc's environment but not in s6-sudod's, it is also unset in test-script's environment.

The file descriptor holder and related tools

See here. A combination of these and other s6 tools allow the implementation of the mechanism that systemd calls socket activation, for services that want that.

s6-svscan as process 1

The s6-svscan program was also written to be robust enough, and to go out of its way to stay alive even in dire situations, so that it is suitable for running as process 1 during most of a machine's uptime. However, the duties of process 1 vary widely between the machine's boot sequence, its normal, stable 'up and running' state, and its shutdown sequence, and in the first and third cases they are heavily system-dependent, so they cannot be fulfilled by a program designed to be as portable as possible[8]. Because of that, auxiliary, system-dependent programs, named the stage1 init and the stage3 init, are used to run as process 1 during the boot sequence and the shutdown sequence, respectively, and s6-svscan is used the rest of the time. For details, see s6 and s6-rc-based init system.

To support its role as process 1, s6-svscan performs a reaper routine each time it receives a SIGCHLD signal, i.e. it uses a POSIX waitpid() call for each child process that becomes a zombie, both the ones it has spawned itself and the ones that were reparented to process 1 by the kernel because their parent process died. An s6-svscanctl -z command naming its scan directory can be used to force s6-svscan to perform its reaper routine.

OpenRC's s6 integration feature

Starting with version 0.16, OpenRC can launch supervised long-lived processes using the s6 package as a helper [9]. This is an alternative to 'classic' unsupervised long-lived processes launched using the start-stop-daemon program. It should be noted that service scripts that don't contain start() and stop() functions implicitly use start-stop-daemon.

OpenRC services that want to use s6 supervision need both a service script in /etc/init.d and an s6 service directory. The service script must contain a supervisor=s6 variable assignment to turn the feature on, and must have a 'need' dependency on the s6-svscan service in its depend() function, to make sure the s6-svscan program is launched. It must contain neither a start() function, nor a stop() function (but their _pre() and _post() variants are OK), nor a status() function:

  • OpenRC internally invokes s6-svc with a -u option when the service script is called with a 'start' argument, and can also call s6-svwait after s6-svc to wait for an event, by assigning s6-svwait options to the s6_svwait_options_start variable (e.g. in the service script or the service-specific configuration file in /etc/conf.d). For example, if the service supports readiness notification, s6_svwait_options_start="-U -t 5000" could be used to make OpenRC wait for the up and ready event with a 5-second timeout.
  • OpenRC internally invokes s6-svc with -d, -wD and -T options when the service script is called with a 'stop' argument, so it will wait for a really down event with a default timeout of 10 seconds. The timeout can be changed by assigning a time value in milliseconds to the s6_service_timeout_stop variable (e.g. in the service script or the service-specific configuration file in /etc/conf.d).
  • OpenRC internally invokes s6-svstat when the service script is called with a 'status' argument.

The s6 service directory can be placed anywhere in the filesystem, and have any name, as long as the service script (or the service-specific configuration file in /etc/conf.d) assigns the servicedir's absolute path to the s6_service_path variable. If s6_service_path is not assigned to, the s6 servicedir must have the same name as the OpenRC service script, and will be searched in /var/svc.d. The scan directory when using this feature is /run/openrc/s6-scan, and OpenRC will create a symlink to the service directory when the service is started.

Warning
OpenRC does not integrate as expected when s6-svscan is running as process 1, since OpenRC will launch another s6-svscan process with /run/openrc/s6-scan as its scan directory. So the result will be two independent supervision trees.

Example setup for a hypothetical supervised test-daemon process with a dedicated logger:

FILE /etc/init.d/test-serviceOpenRC service script
#!/sbin/openrc-run
description="A supervised test service with a logger"
supervisor=s6
s6_service_path=/home/user/test/svc-repo/test-service

depend() {
   need s6-svscan
}
FILE /etc/conf.d/test-serviceOpenRC service-specific configuration file
s6_svwait_options_start=-U
user $/sbin/rc-service test-service describe
* A supervised test service with a logger
* cgroup_cleanup: Kill all processes in the cgroup

The service directory:

user $ls -l /home/user/test/svc-repo/test-service /home/user/test/svc-repo/test-service/log
/home/user/test/svc-repo/test-service:
total 12
drwxr-xr-x 2 user user 4096 Aug  8 12:00 log
-rw-r--r-- 1 user user    2 Aug  8 12:00 notification-fd
-rwxr-xr-x 1 user user   86 Aug  8 12:00 run

/home/user/test/svc-repo/test-service/log:
total 4
-rwxr-xr-x 1 user user 65 Aug  8 12:00 run
FILE /home/user/test/svc-repo/test-service/run
#!/bin/execlineb -P
s6-softlimit -o 5
s6-setuidgid daemon
fdmove -c 2 1
/home/user/test/test-daemon --s6=5
FILE /home/user/test/svc-repo/test-service/notification-fd
5

This launches test-daemon with effective user daemon and the maximum number of open file descriptors set to 5. This is the same as if test-daemon itself performed a setrlimit(RLIMIT_NOFILE, &rl) call with rl.rlim_cur set to 5, provided that value does not exceed the corresponding hard limit.

The program supports an --s6 option that turns readiness notification on and specifies the notification file descriptor (5), and it also periodically prints to its standard error a message of the form 'Logged message #n', with an incrementing number n between 0 and 9. The redirection of test-daemon's standard error to standard output, using execline's fdmove program with the -c (copy) option, allows logging its messages with s6-log:

FILE /home/user/test/svc-repo/test-service/log/run
#!/bin/execlineb -P
s6-setuidgid user
s6-log t /home/user/test/logdir

An automatically rotated logging directory named logdir will be used, and messages will have a timestamp in external TAI64N format prepended to them.
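
The hypothetical test-daemon itself is not part of any package; the behavior described above (slow startup, a newline written to the notification fd to signal readiness, periodic messages on standard error) could be sketched in plain shell as follows. All names and timings here are illustrative only:

```shell
#!/bin/sh
# Illustrative sketch of the hypothetical test-daemon; --s6=5 is taken
# to mean "fd 5 is the notification fd opened by s6-supervise".
sleep 10                            # simulate slow initialization
echo >&5                            # readiness: a newline on the notification fd
exec 5>&-                           # close the notification fd, no longer needed
n=0
while :; do
    echo "Logged message #$n" >&2   # reaches s6-log via the fdmove in ./run
    n=$(( (n + 1) % 10 ))           # cycle n through 0..9
    sleep 5
done
```

Writing a newline to the notification fd and then closing it is the readiness protocol that s6-supervise expects when a notification-fd file is present in the service directory.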

Manually starting test-service:

root #time rc-service test-service start
* Creating s6 scan directory
* /run/openrc/s6-scan: creating directory
* Starting s6-svscan ...                    [ ok ]
* Starting test-service ...                 [ ok ]

real	0m11.681s
user	0m0.039s
sys	0m0.034s
root #rc-service test-service status
up (pid 2279) 33 seconds, ready 23 seconds

This shows that test-daemon took about 10 seconds to notify readiness to s6-supervise, and that the rc-service start command waited until the up and ready event, because of the s6-svwait -U option passed via s6_svwait_options_start in /etc/conf.d/test-service.
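
The readiness mechanism behind this wait is deliberately simple: the daemon writes a newline to its notification file descriptor when it considers itself ready, and s6-supervise, holding the other end, records the up-and-ready event that s6-svwait -U waits for. The idea can be illustrated without s6 at all, using a named pipe (all paths and timings here are throwaway examples):

```shell
#!/bin/sh
# Toy model of readiness notification: a background "daemon" signals
# readiness through a pipe; the foreground "supervisor" blocks on it.
fifo=$(mktemp -u)                     # hypothetical throwaway pipe path
mkfifo "$fifo"
( sleep 1; echo ready > "$fifo" ) &   # "daemon": notify after slow init
read line < "$fifo"                   # "supervisor": block until notified
echo "service is $line"
rm -f "$fifo"
```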

user $rc-status
Runlevel: default
...
Dynamic Runlevel: needed/wanted
...
s6-svscan                                   [  started  ]
...
Dynamic Runlevel: manual
test-service                                [  started  ]

The scan directory:

user $ls -la /run/openrc/s6-scan
total 0
drwxr-xr-x  3 root root  80 Aug  8 22:38 .
drwxrwxr-x 15 root root 360 Aug  8 22:38 ..
drwx------  2 root root  80 Aug  8 22:38 .s6-svscan
lrwxrwxrwx  1 root root  46 Aug  8 22:38 test-service -> /home/user/test/svc-repo/test-service

The supervision tree:

user $ps axf -o pid,ppid,pgrp,euser,args
 PID  PPID  PGRP EUSER    COMMAND
...
2517     1  2517 root     /bin/s6-svscan /run/openrc/s6-scan
2519  2517  2517 root      \_ s6-supervise test-service/log
2523  2519  2523 user      |   \_ s6-log t /home/user/test/logdir
2520  2517  2517 root      \_ s6-supervise test-service
2522  2520  2522 daemon        \_ /home/user/test/test-daemon --s6=5
...

Messages from the test-daemon process go to the logging directory:

user $ls -l /home/user/test/logdir
total 12
-rwxr--r-- 1 user user 352 Aug  8 22:39 @40000000598a67ec2d5d7180.s
-rwxr--r-- 1 user user 397 Aug  8 22:40 @40000000598a681919d6e581.s
-rwxr--r-- 1 user user 397 Aug  8 22:40 current
-rw-r--r-- 1 user user   0 Aug  8 22:38 lock
-rw-r--r-- 1 user user   0 Aug  8 22:38 state
user $cat /home/user/test/logdir/current | s6-tai64nlocal
2017-08-08 22:40:20.562745759 Logged message #1
2017-08-08 22:40:25.565816199 Logged message #2
2017-08-08 22:40:30.570600144 Logged message #3
2017-08-08 22:40:35.578765601 Logged message #4
2017-08-08 22:40:40.585146120 Logged message #5
2017-08-08 22:40:45.591282433 Logged message #6
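
The @-prefixed names and timestamps are TAI64N labels: an '@', 16 hexadecimal digits encoding 2^62 plus the TAI second count, then 8 hexadecimal digits of nanoseconds. s6-tai64nlocal performs the full conversion, including the TAI-to-UTC offset and the local time zone; ignoring those corrections, the raw fields of the first label above can be picked apart in bash:

```shell
#!/bin/bash
# Decode the raw fields of a TAI64N label. The offset and time zone
# handling that s6-tai64nlocal performs are deliberately omitted here.
label='@40000000598a67ec2d5d7180'
secs=$(( 16#${label:1:16} - 16#4000000000000000 ))   # TAI seconds since epoch
nsec=$(( 16#${label:17:8} ))                         # nanoseconds
echo "seconds=$secs nanoseconds=$nsec"
```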

Removal

Unmerge

root #emerge --ask --depclean --verbose sys-apps/s6

All scan directories, service directories, the /command symlink, etc. must be deleted manually if no longer wanted after removing the package. All modifications to sysvinit's /etc/inittab must also be reverted manually: the lines for s6-svscanboot must be deleted, and a telinit q command must be run afterwards to make sysvinit reload the file. Finally, if s6-svscan is running as process 1, an alternative init system must be installed in parallel, and the machine rebooted into it (possibly after reconfiguring the bootloader), before the package is removed; otherwise the machine will become unbootable.
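
For reference, a sysvinit respawn entry for s6-svscanboot typically looks like the following hypothetical line; the 'SV' tag, the runlevels and the script path are illustrative only and depend on how the entry was originally added:

```shell
# Hypothetical /etc/inittab entry to delete, following the usual
# s6-under-sysvinit setup pattern:
#
#   SV:12345:respawn:/path/to/s6-svscanboot
#
# After deleting it, 'telinit q' makes sysvinit reload /etc/inittab.
```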

See also

External resources

References
