runit

From Gentoo Wiki

Runit is a daemontools-inspired process supervision suite that also provides a program suitable for running as process 1. It can be used as an alternative to sysvinit or systemd, either by itself or in conjunction with OpenRC. It can also be used as a helper for supervising OpenRC services.

Installation

USE flags

USE flags for sys-process/runit ("A UNIX init scheme with service supervision")

  • static (global) - !!do not set this during bootstrap!! Causes binaries to be statically linked instead of dynamically

Emerge

root #emerge --ask sys-process/runit
Note
>=sys-process/runit-2.1.2 is currently in the testing branch. The above command will install runit version 2.1.1 for systems on the stable branch; users who want a more recent version will need to add the package to /etc/portage/package.accept_keywords (if using Portage). While it is generally not advised to mix packages from the stable and testing branches, this package only depends on the libc, so in this case it should be safe.
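A hypothetical keyword entry (the file name under package.accept_keywords/ is arbitrary) might look like this:

```shell
# FILE /etc/portage/package.accept_keywords/runit (hypothetical name)
# Accept the testing branch for this package only:
sys-process/runit
```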

Configuration

Environment variables

  • SVDIR - Directory sv will search for the services specified as arguments.
  • SVWAIT - Time sv will wait for a service to reach its desired state before timing out or killing it with a SIGKILL signal.
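As a sketch (the service name mydaemon is hypothetical), both variables can be set for a single sv invocation:

```shell
# Look for 'mydaemon' under /etc/sv instead of /service, and wait
# up to 10 seconds instead of the default 7 for the state change:
SVDIR=/etc/sv SVWAIT=10 sv restart mydaemon
```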

Files

  • /service - Directory sv will search for the services specified as arguments if SVDIR is empty or unset.
  • /etc/runit/1 - File runit will execute when the machine boots.
  • /etc/runit/2 - File runit will execute and supervise when /etc/runit/1 exits.
  • /etc/runit/3 - File runit will execute when the machine shuts down.
  • /etc/runit/ctrlaltdel - File runit will execute when receiving a SIGINT signal.
  • /etc/runit/stopit - Used by runit to decide whether it should initiate machine shutdown when receiving a SIGCONT signal or not.
  • /etc/runit/reboot - Used by runit to decide whether it should halt or reboot the machine.
  • /etc/runit/runsvdir/current - Symbolic link to runsvdir's scan directory when using <sys-process/runit-2.1.2.
  • /etc/runit/runsvdir/default - runsvdir's initial scan directory when using <sys-process/runit-2.1.2.
  • /etc/runit/runsvdir/all - Service directory repository when using <sys-process/runit-2.1.2.
  • /etc/service - runsvdir's scan directory when using >=sys-process/runit-2.1.2.
  • /etc/sv - Service directory repository when using >=sys-process/runit-2.1.2.
  • /run/openrc/sv - runsvdir's scan directory when using OpenRC's runit integration feature.
  • /var/service - Symbolic link to runsvdir's scan directory when using <sys-process/runit-2.1.2.

Service

OpenRC

See here.

Usage

Process supervision

For more in-depth information about the process supervision aspects of runit, see daemontools-encore. A summary follows.

runit program      daemontools program with similar functionality
runsv              supervise
runsvdir           svscan plus readproctitle functionality
svlogd             multilog
sv down            svc -d
sv up              svc -u
sv once            svc -o
sv exit            svc -dx
sv status          svstat
chpst -e           envdir
chpst -U           envuidgid
chpst -P           pgrphack
chpst -l           setlock -N (setlock's default behaviour)
chpst -L           setlock -n
chpst -u           setuidgid
chpst -m           softlimit -m
chpst -d           softlimit -d
chpst -o           softlimit -o
chpst -p           softlimit -p
chpst -f           softlimit -f
chpst -c           softlimit -c
chpst -r           softlimit -r


The program implementing the supervisor features in runit is runsv, and just like daemontools' supervise, it takes the pathname (absolute, or relative to the working directory) of a service directory (or servicedir) as an argument. A runit service directory must contain at least an executable file named run, and can contain an optional regular file named down, and an optional subdirectory or symbolic link to a directory named log, all of which work like their daemontools counterparts. The service directory can also contain an optional executable file named finish, which can be used to perform cleanup actions each time the supervised process stops, possibly depending on its exit status information. runsv calls finish with two arguments: the first one is run's exit code, or -1 if run didn't exit normally, and the second one is the least significant byte of the exit status as determined by POSIX waitpid(). For instance, the second argument is 0 if run exited normally, and the signal number if run was terminated by a signal. If run or finish exit immediately, runsv waits 1 second before starting finish or restarting run, so that it does not loop too quickly. A supervised process will run in its runsv parent's session; making it a session leader requires using the chpst program with a -P option inside run. If runsv receives a SIGTERM signal, it behaves as if an sv exit command naming the corresponding service directory had been used (see later).

Just like daemontools' supervise, runsv keeps control files in a subdirectory of the servicedir, named supervise, and if it finds a symbolic link to a directory with that name, runsv will follow it and use the linked-to directory for its control files. Unlike daemontools, runsv also keeps human-readable files in the supervise directory, named stat and pid, containing status information about the supervised process. For further information please consult the runsv man page.

The runsvdir program allows supervising a set of processes running in parallel using a scan directory (or scandir), just like daemontools' svscan, so it will be the supervision tree's root. It also checks, at least every 5 seconds, the scandir's time of last modification, inode and device, and performs a scan if any of them has changed, launching a runsv child process for each new servicedir it finds, or for an existing servicedir whose runsv process has exited, and sending a SIGTERM signal to every runsv child whose corresponding servicedir is no longer present. Unlike daemontools' svscan, runsvdir accepts a second argument after the scan directory's pathname, which must be at least seven characters long and works like daemontools readproctitle's last argument: it sets the number of characters of an automatically rotated log that runsvdir keeps in memory, which can be seen in the output of the ps utility. The first 5 characters will remain as specified in the argument; the rest will shift to the left as new messages are sent to runsvdir's standard error. runsvdir also writes a dot to the log every 15 minutes so that old messages expire. If a -P option is passed as an argument, runsvdir makes its runsv children leaders of new sessions using the POSIX setsid() call. For further information please consult the runsvdir man page.

svlogd is the logger program provided by the runit package. It supports automatically rotated logging directories (or logdirs) in the same way daemontools' multilog program does, but its user interface is quite different. Logging directory pathnames are supplied as arguments and don't need to start with a dot ('.') or slash ('/'). To prepend a timestamp in external TAI64N format to logged lines, svlogd must be invoked with a -t option. A -tt option prepends a UTC timestamp of the form YYYY-MM-DD_HH:MM:SS.xxxxx, and a -ttt option prepends a UTC timestamp of the form YYYY-MM-DDTHH:MM:SS.xxxxx. Other actions performed by svlogd on text lines read on its standard input can be specified in a file inside the logging directory, named config. Empty lines in config, or lines that start with '#', are ignored; every other line specifies a single action. Actions are carried out sequentially in line order. Actions starting with s, n, !, + and - behave like their daemontools multilog counterparts. Patterns in + and - actions have the same syntax as those from Bernstein daemontools' multilog, except that runit's svlogd also accepts a plus sign ('+') as a special character that matches the next character in the pattern one or more times, and that prepended timestamps are not considered for matching against the patterns. svlogd can be forced to perform a rotation if it receives a SIGALRM signal, and rereads the config files in the logdirs (after closing and reopening all logs) if it receives a SIGHUP signal. For the full description of svlogd's functionality please consult the respective man page.
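As an illustration, a hypothetical config file for a logdir at /var/log/test-service could combine some of the actions described above ('#' lines are ignored by svlogd, so the comments are valid config content):

```shell
# Hypothetical /var/log/test-service/config:
# rotate when the current log reaches about 1 MB
s1000000
# keep at most 10 rotated log files
n10
# drop lines that start with 'DEBUG: '
-DEBUG: *
```

A SIGHUP signal sent to svlogd (e.g. with sv hup on its servicedir) makes it reread this file.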

chpst is a chain loading program that can be used to modify a supervised process' execution state. It accepts a set of options that specify what to do; some of them work like daemontools' envdir, envuidgid, pgrphack, setlock, setuidgid and softlimit, and others are runit-specific. For example, chpst -n increments or decrements the nice value of the process (using POSIX nice()), chpst -/ changes the root directory before executing the next program in the chain (using Linux chroot() on Gentoo), and chpst -b newname executes the next program in the chain as if it was invoked with the name newname (i.e. performs argv[0] substitution). This is useful for programs that have different behaviours depending on the name they are invoked with. If chpst itself is invoked with the names envdir, envuidgid, pgrphack, setlock, setuidgid or softlimit, it behaves as those daemontools programs. For the full description of chpst's functionality please consult the respective man page.
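For example, a run file might chain several chpst options before executing a daemon (test-daemon and the env subdirectory are hypothetical; this is a sketch, not a recommended configuration):

```shell
#!/bin/sh
# Hypothetical run file: load environment variables from ./env
# (-e, like envdir), start the daemon in a new session (-P, like
# pgrphack), and raise its nice value by 10 (-n, runit-specific),
# before replacing this shell with test-daemon.
exec chpst -e ./env -P -n 10 test-daemon
```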

sv is runit's program for controlling supervised processes and querying status information about them. It accepts a subcommand and a set of service directory pathnames as arguments. Unless a pathname starts with a dot ('.') or slash ('/'), it is assumed to be relative to the directory specified as the value of the SVDIR environment variable, or to /service if SVDIR is empty or unset. The subcommand tells sv what to do. The up, down, once and exit subcommands behave like daemontools' svc -u, svc -d, svc -o and svc -dx commands, respectively. The status subcommand is similar to daemontools' svstat: it displays whether the supervised process is running ('run') or not ('down'), or if its finish file is currently running ('finish'), whether it is transitioning to the desired state or already there ('want up' or 'want down'), its process ID (PID) if it is up (or finish's PID if it is currently running), how long it has been in the current state, and whether its current up or down status matches the presence or absence of a down file in the servicedir ('normally up' or 'normally down'). It also shows whether the supervised process is paused (because of a SIGSTOP signal) or has been sent a SIGTERM signal and runsv is waiting for its effect. Other sv subcommands allow reliably sending signals to the supervised process. In particular, sv alarm can be used to send a SIGALRM signal to a supervised svlogd process to force it to perform a rotation, and sv hup can be used to send it a SIGHUP signal to make it reread the logging directories' config files.

sv also accepts a set of subcommands resembling LSB init script actions [1]:

  • The sv start, sv stop and sv shutdown commands behave like sv up, sv down and sv exit, respectively, except that they wait for their actions to be completed, and then print the process' status as if an sv status command had been used. The wait period's duration is the value of the SVWAIT environment variable (in seconds), or 7 seconds if SVWAIT is empty or unset. It can also be specified with a -w option passed as an argument to sv. The status line starts with 'ok:' if the supervised process reached the desired state during the wait period, and with 'timeout:' if it did not.
  • The sv force-stop and sv force-shutdown commands behave like sv stop and sv shutdown, respectively, except that if the supervised process didn't reach the desired state during the wait period, it will be sent a SIGKILL signal as if an sv kill command had been used. The status line will start with 'kill:' in that case.
  • The sv reload command behaves like sv hup (i.e. sends a SIGHUP signal to the supervised process), except that it prints the status line afterwards.
  • The sv try-restart command behaves like sv term followed by sv cont (i.e. sends a SIGTERM signal and then a SIGCONT signal), except that it waits for its actions to be completed, timing out after the wait period expires if they didn't, and prints the status line afterwards, just like sv start, sv stop and sv shutdown do.
  • The sv restart command behaves like sv term followed by sv cont followed by sv up, except that it waits for its actions to be completed, timing out after the wait period expires if they didn't, and prints the status line afterwards. That is, it behaves like sv try-restart, but with an extra sv up action.
  • The sv force-reload and sv force-restart commands behave like sv try-restart and sv restart, except that the supervised process will be sent a SIGKILL signal if it didn't reach the desired state during the wait period, just like sv force-stop and sv force-shutdown do.

sv's LSB init script action-like subcommands consider the effect of their actions complete based on the state runsv considers the supervised process to be in (as sv status would report it). This behaviour can be extended by including an executable file named check in the service directory. Subcommands that include in their actions the equivalent of an sv up, sv term or sv kill command will make sv execute the check file. The supervised process is considered to be up if runsv considers it to be up, and if check's exit code is 0. sv also supports a check subcommand, which performs no action on the supervised process, but makes sv execute the check file. If its exit code is 0, sv will print a status line starting with 'ok:'; otherwise, it will print a status line starting with 'timeout:' after the wait period expires.
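A minimal check file could test for some readiness condition that runsv itself cannot see; here, the existence of a UNIX socket (the path and daemon name are hypothetical):

```shell
#!/bin/sh
# Hypothetical check file: sv will consider the service up only
# once this exits 0, i.e. once the daemon has created its socket.
test -S /run/test-daemon/socket
```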

For the full description of sv's functionality please consult the respective man page.

Example runit scan directory with down and finish files, as well as a symbolic link to a supervise directory elsewhere:

user $ls -l *
test-service1:
total 4
-rwxr-xr-x 1 user user 28 Jun  3 12:00 run
lrwxrwxrwx 1 user user 24 Jun  3 12:00 supervise -> ../../external-supervise

test-service2:
total 4
-rwxr-xr-x 1 user user 28 Jun  3 12:00 run

test-service3:
total 8
-rw-r--r-- 1 user user  0 Jun  3 12:00 down
-rwxr-xr-x 1 user user 63 Jun  3 12:00 finish
-rwxr-xr-x 1 user user 56 Jun  3 12:00 run

test-service4:
total 8
-rw-r--r-- 1 user user  0 Jun  3 12:00 down
-rwxr-xr-x 1 user user 54 Jun  3 12:00 finish
-rwxr-xr-x 1 user user 33 Jun  3 12:00 run
FILE test-service1/run
#!/bin/sh
exec test-daemon1
FILE test-service2/run
#!/bin/sh
exec test-daemon2
FILE test-service3/run
#!/bin/sh
echo Starting test-service3/run
exec sleep 10
FILE test-service3/finish
#!/bin/sh
echo Executing test-service3/finish $@
exec sleep 10
FILE test-service4/run
#!/bin/sh
exec test-daemon-ignoreterm
FILE test-service4/finish
#!/bin/sh
exec echo Executing test-service4/finish $@

It is assumed test-daemon-ignoreterm is a program that ignores the SIGTERM signal.

Resulting supervision tree when runsvdir is run on this scandir as a background process in an interactive shell, assuming it is a subdirectory named scan in the working directory (i.e. launched with runsvdir scan &):

user $ps xf -o pid,ppid,pgrp,euser,args
  PID  PPID  PGRP EUSER    COMMAND
 ...
 1776  1763  1776 user     -bash
 2471  1776  2471 user      \_ runsvdir scan
 2472  2471  2471 user          \_ runsv test-service4
 2473  2471  2471 user          \_ runsv test-service3
 2474  2471  2471 user          \_ runsv test-service1
 2476  2474  2471 user          |   \_ test-daemon1
 2475  2471  2471 user          \_ runsv test-service2
 2477  2475  2471 user              \_ test-daemon2
 ...
Important
Since processes in a supervision tree are created using the POSIX fork() call, all of them will inherit runsvdir's environment, which, in the context of this example, is the user's login shell environment. If runsvdir is launched in some other way (see later), the environment will likely be completely different. This must be taken into account when trying to debug a supervision tree with an interactive shell.

supervise subdirectory contents:

user $ls -l */supervise
lrwxrwxrwx 1 user user   24 Jun  3 12:00 test-service1/supervise -> ../../external-supervise

test-service2/supervise:
total 12
prw------- 1 user user  0 Jun  3 12:05 control
-rw------- 1 user user  0 Jun  3 12:05 lock
prw------- 1 user user  0 Jun  3 12:05 ok
-rw-r--r-- 1 user user  5 Jun  3 12:05 pid
-rw-r--r-- 1 user user  4 Jun  3 12:05 stat
-rw-r--r-- 1 user user 20 Jun  3 12:05 status

test-service3/supervise:
total 8
prw------- 1 user user  0 Jun  3 12:05 control
-rw------- 1 user user  0 Jun  3 12:05 lock
prw------- 1 user user  0 Jun  3 12:05 ok
-rw-r--r-- 1 user user  0 Jun  3 12:05 pid
-rw-r--r-- 1 user user  5 Jun  3 12:05 stat
-rw-r--r-- 1 user user 20 Jun  3 12:05 status

test-service4/supervise:
total 8
prw------- 1 user user  0 Jun  3 12:05 control
-rw------- 1 user user  0 Jun  3 12:05 lock
prw------- 1 user user  0 Jun  3 12:05 ok
-rw-r--r-- 1 user user  0 Jun  3 12:05 pid
-rw-r--r-- 1 user user  5 Jun  3 12:05 stat
-rw-r--r-- 1 user user 20 Jun  3 12:05 status
user $ls -l ../external-supervise
total 12
prw------- 1 user user  0 Jun  3 12:05 control
-rw------- 1 user user  0 Jun  3 12:05 lock
prw------- 1 user user  0 Jun  3 12:05 ok
-rw-r--r-- 1 user user  5 Jun  3 12:05 pid
-rw-r--r-- 1 user user  4 Jun  3 12:05 stat
-rw-r--r-- 1 user user 20 Jun  3 12:05 status

Messages sent by test-service3/run to runsvdir's standard output when manually started:

user $sv up ./scan/test-service3
Starting test-service3/run
Executing test-service3/finish 0 0
Starting test-service3/run
Executing test-service3/finish 0 0
Starting test-service3/run
...
user $sv status ./scan/*
run: ./scan/test-service1: (pid 2518) 80s
run: ./scan/test-service2: (pid 2519) 80s
run: ./scan/test-service3: (pid 2534) 7s, normally down
down: ./scan/test-service4: 80s

After enough seconds have elapsed:

user $sv status ./scan/*
run: ./scan/test-service1: (pid 2518) 86s
run: ./scan/test-service2: (pid 2519) 86s
finish: ./scan/test-service3: (pid 2537) 13s, normally down
down: ./scan/test-service4: 86s

Reliably sending a SIGSTOP signal to test-service3/run:

user $sv pause ./scan/test-service3
user $sv status ./scan/test-service3
run: ./scan/test-service3: (pid 2689) 100s, normally down, paused

Reliably sending a SIGTERM signal afterwards:

user $sv term ./scan/test-service3
user $sv status ./scan/test-service3
run: ./scan/test-service3: (pid 2689) 139s, normally down, paused, got TERM

The signal doesn't have any effect yet because the supervised process is stopped. To resume it, a SIGCONT signal is needed:

user $sv cont ./scan/test-service3
Executing test-service3/finish -1 15
Starting test-service3/run
Executing test-service3/finish 0 0
Starting test-service3/run

Since the process is supervised, after being killed runsv executes test-service3/finish, and then restarts test-service3/run.

Messages sent by test-service3/run to runsvdir's standard output when manually stopped:

user $sv down ./scan/test-service3
Executing test-service3/finish -1 15

This shows that runsv stopped test-service3/run by killing it with a SIGTERM signal (signal 15).

Manually starting test-daemon-ignoreterm using sv's LSB-like interface:

user $sv start ./scan/test-service4
ok: run: ./scan/test-service4: (pid 2771) 1s, normally down

Manually stopping test-daemon2 and test-daemon-ignoreterm using sv's LSB-like interface:

user $sv stop ./scan/test-service2 ./scan/test-service4
ok: down: ./scan/test-service2: 1s, normally up
timeout: run: ./scan/test-service4: (pid 2771) 141s, normally down, want down, got TERM

This shows that test-daemon2 could be stopped ('ok:') but test-daemon-ignoreterm couldn't (because it ignores SIGTERM), so after the default 7-second wait period, sv gives up ('timeout:'). Forcibly stopping test-daemon-ignoreterm using sv's LSB-like interface:

user $sv force-stop ./scan/test-service4
kill: run: ./scan/test-service4: (pid 2771) 274s, normally down, want down, got TERM
Executing test-service4/finish -1 9

This shows that because test-daemon-ignoreterm didn't stop during the default 7-second wait period, sv sent it a SIGKILL signal (signal 9), so it is now stopped:

user $sv status ./scan/test-service4
down: ./scan/test-service4: 21s

Starting the supervision tree

From OpenRC

As of version 0.22, OpenRC provides a service script that can launch runsvdir with readproctitle-style logging, also named runsvdir. On Gentoo, the scan directory will be /run/openrc/sv. This script exists to support the OpenRC-runit integration feature, but can be used to just launch a runit supervision tree when the machine boots by adding it to an OpenRC runlevel using rc-update:

root #rc-update add runsvdir default

Or it can also be started manually:

root #rc-service runsvdir start
Note
The service script launches runsvdir using OpenRC's start-stop-daemon program, so it will run unsupervised. Also, its standard input and output will be redirected to /dev/null. Its standard error will be redirected to the readproctitle-style log, though.

Because the service script calls runsvdir using absolute path /usr/bin/runsvdir, a symlink to the correct path must be created if using >=sys-process/runit-2.1.2:

root #ln -s /bin/runsvdir /usr/bin/runsvdir

And because /run is a tmpfs, and therefore volatile, servicedir symlinks must be created in the scan directory each time the machine boots, before runsvdir starts. The tmpfiles.d interface, which is supported by OpenRC using package opentmpfiles, can be used for this:

FILE /etc/tmpfiles.d/runsvdir.conf
#Type Path Mode UID GID Age Argument
d /run/openrc/sv
L /run/openrc/sv/service1 - - - - /path/to/servicedir1
L /run/openrc/sv/service2 - - - - /path/to/servicedir2
L /run/openrc/sv/service3 - - - - /path/to/servicedir3
...

Alternatively, OpenRC's local service could be used to start the supervision tree when entering OpenRC's 'default' runlevel, by placing '.start' and '.stop' files in /etc/local.d (please read /etc/local.d/README for more details) that perform actions similar to those of the runsvdir service script:

FILE /etc/local.d/runsvdir.start
#!/bin/sh
# Remember to add --user if you don't want to run as root
# Remember to change /usr/bin/runsvdir to /bin/runsvdir if using >=sys-process/runit-2.1.2
start-stop-daemon --start --background --make-pidfile \
   --pidfile /run/runsvdir.pid \
   --exec /usr/bin/runsvdir -- -P /path/to/scandir readproctitle-like-log-argument
FILE /etc/local.d/runsvdir.stop
#!/bin/sh
start-stop-daemon --stop --retry SIGHUP/5 --pidfile /run/runsvdir.pid

The SIGHUP signal makes runsvdir send a SIGTERM signal to all its runsv children before exiting, which, in turn, makes them stop their supervised processes and exit. The SIGTERM signal that start-stop-daemon sends by default would just make runsvdir exit.

From sysvinit

Following upstream's suggestion [2], Gentoo's packaging of runit provides a /sbin/runsvdir-start symbolic link to /etc/runit/2, that allows runsvdir to be launched and supervised by sysvinit by adding a 'respawn' line for it in /etc/inittab. Used in this way, the supervision tree becomes rooted in process 1, which cannot die without crashing the machine.

Gentoo users wanting to use runsvdir-start in this way will need to manually edit /etc/inittab, and then call telinit:

FILE /etc/inittab
SV:12345:respawn:/sbin/runsvdir-start
root #telinit q

This will make sysvinit launch and supervise runsvdir when entering runlevels 1 to 5.

The logging chain

A supervision tree where all leaf processes have a logger can be arranged into what the author of s6 calls the logging chain [3], which he considers to be technically superior to the traditional syslog-based centralized approach [4].

Since processes in a supervision tree are created using the POSIX fork() call, each of them will inherit runsvdir's standard input, output and error. A logging chain arrangement using runit is as follows:

  • Leaf processes should normally have a logger, so their standard output and error connect to their logger's standard input. Therefore, all their messages are collected and stored in dedicated, per-service logs by their logger. Some programs might need to be invoked with special options to make them send messages to their standard error, and redirection of standard error to standard output (i.e. 2>&1 in a shell script) must be performed in the servicedir's run file.
  • Leaf processes with a controlling terminal are an exception: their standard input, output and error connect to the terminal.
  • runsv, the loggers, and leaf processes that exceptionally don't have a logger for some reason, inherit their standard input, output and error from runsvdir, so their messages are sent wherever runsvdir's are.
  • Leaf processes that still unavoidably report their messages using syslog() have them collected and logged by a (possibly supervised) syslog server.
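As a sketch of the first point, a servicedir for a hypothetical test-daemon could pair a run file that merges standard error into standard output with a log subdirectory running svlogd (the servicedir and logdir paths are also hypothetical):

```shell
# FILE mydaemon/run (hypothetical)
#!/bin/sh
# Merge stderr into stdout so the logger receives everything:
exec test-daemon 2>&1

# FILE mydaemon/log/run (hypothetical)
#!/bin/sh
# runsv connects mydaemon/run's output to this process' input:
exec svlogd -tt /var/log/test-daemon
```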

If runit is used as the init system, and runsvdir was invoked with no second argument, its standard input, output and error will be redirected to /dev/console. If runsvdir was invoked with a second argument, readproctitle-like logging is turned on and messages sent to runsvdir's standard error will go to the log and can be seen using ps.

Runit as the init system

Warning
While Gentoo does offer a runit package in its official repository, it is not completely supported as an init system. Gentoo users wanting to use runit as their machine's init system might need to use alternative ebuild repositories and/or do some local tweaking. See "External resources".

The runit package provides a program capable of running as process 1, also called runit, and a helper program, runit-init. If runit-init detects it is running as process 1, it replaces itself with runit using the POSIX execve() call. Therefore, to use runit as the system's init, an init=/sbin/runit-init parameter can be added to the kernel's command line using the bootloader's available mechanisms (e.g. a linux command in some 'Gentoo with runit' menu entry for GRUB2). It is possible to go back to sysvinit + OpenRC at any time by reverting the change.

When the machine starts booting (if an initramfs is being used, after it passes control to the 'main' init), runit executes the /etc/runit/1 file as a child process, in a foreground process group with /dev/console as the controlling terminal, and waits for it to finish. This file is usually a shell script, and is expected to perform all one-time initialization tasks needed to bring the machine to its stable, normal 'up and running' state. Gentoo's /etc/runit/1 file is quite minimal: it only calls the openrc program to enter OpenRC's 'sysinit' runlevel, and then its 'boot' runlevel, emulating Gentoo's sysvinit /etc/inittab setup.

Note
This setup means that any long-lived processes launched by a service script upon entering OpenRC's 'sysinit' and 'boot' runlevels won't be supervised.

When /etc/runit/1 exits, runit then executes the /etc/runit/2 file as a child process, makes it a session leader with the POSIX setsid() call, and supervises it: if /etc/runit/2 is killed by a signal or its exit code is 111, then runit will restart it, after sending a SIGKILL signal to every remaining process in its process group. Gentoo's /etc/runit/2 file is upstream's suggested one with minimal modifications. It is a shell script that uses the exec builtin utility to replace itself with runsvdir, so this creates a supervision tree rooted in process 1. The scan directory will be /var/service for <sys-process/runit-2.1.2, and /etc/service for >=sys-process/runit-2.1.2. The environment will be empty, except for the PATH variable, set to a known value in the script. runsvdir will use readproctitle-like logging, and, for >=sys-process/runit-2.1.2, is also passed the -P option.
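The shape of such an /etc/runit/2 script is roughly as follows (a sketch based on the description above, not Gentoo's exact file; the PATH value and the length of the log argument are illustrative):

```shell
#!/bin/sh
# Sketch of an /etc/runit/2-style stage 2 script: reset the
# environment to just PATH, then replace this shell with runsvdir
# (-P: runsv children get their own sessions), using the second
# argument as the readproctitle-like log buffer.
PATH=/bin:/usr/bin:/sbin:/usr/sbin
exec env - PATH=$PATH \
    runsvdir -P /etc/service 'log: ...........................................................'
```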

Gentoo's packaging of runit expects /etc/runit/runsvdir/all for <sys-process/runit-2.1.2, or /etc/sv for >=sys-process/runit-2.1.2, to be a repository of service directories. Services that need to be started when the machine boots require a symbolic link in the scan directory to the corresponding servicedir in that repository. Gentoo only provides service directories for 6 parallel supervised agetty processes (with their symlinks in the scan directory); this allows users to get to a text console login, like with Gentoo's sysvinit /etc/inittab setup. Service directories for anything else must be created by the administrator, either from scratch or taken from somewhere else (e.g. alternative ebuild repositories).

Runit doesn't directly support any runlevel-like concept, but if the machine contains a set of directories, each one with a scan directory structure, then it is possible to achieve behaviour similar to 'changing runlevels' if the scan directory argument of runsvdir is actually a symbolic link. The software package's author proposes [5] creating a symbolic link pointing to one of the aforementioned directories, which then becomes the current scan directory. Runit provides a runsvchdir program that can atomically modify this symlink, thereby changing the current scan directory; runsvdir's next periodic scan will then take care of starting and killing the appropriate runsv processes. For further details on runsvchdir, please consult the respective man page. Gentoo's packaging of runit version 2.1.1 supports this model: runsvdir's scan directory argument is the symbolic link /var/service, which points to /etc/runit/runsvdir/current. The latter is in turn a symlink that points to /etc/runit/runsvdir/default, but can be modified later using runsvchdir. Gentoo's packaging of more recent versions of runit does away with this runlevel-like setup.
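Under that 2.1.1 layout, switching the current scan directory would then be a single command (the prepared directory /etc/runit/runsvdir/single is hypothetical):

```shell
# Atomically repoint /etc/runit/runsvdir/current at 'single';
# runsvdir's next periodic scan then starts and stops runsv
# children to match the new scan directory's contents.
runsvchdir single
```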

If runit receives a SIGCONT signal, and the file /etc/runit/stopit exists and has the execute by owner permission set, it will kill /etc/runit/2 (first by sending it a SIGTERM signal and waiting, then by sending it a SIGKILL signal) and then execute the /etc/runit/3 file. This file is usually a shell script, and is expected to perform all tasks needed to shut the machine down. If /etc/runit/1 is killed by a signal or its exit code is 100, runit skips execution of /etc/runit/2 and executes /etc/runit/3. runit will also execute /etc/runit/3 if /etc/runit/2 exits (with an exit code other than 111). If /etc/runit/3 exits, runit will send a SIGKILL signal to all remaining processes, and then check the file /etc/runit/reboot to decide what to do next. If the file exists and has the execute by owner permission set, it reboots the machine. In any other case, it will power off the machine, or halt it if it can't power it off.

Gentoo's /etc/runit/3 file performs an sv shutdown for <sys-process/runit-2.1.2, or an sv force-shutdown for >=sys-process/runit-2.1.2, on every servicedir of runsvdir's scan directory, and then calls the openrc program to enter OpenRC's 'shutdown' or 'reboot' runlevels, depending on whether a poweroff or reboot operation was requested to runit via /etc/runit/reboot.

If runit receives a SIGINT signal (which is usually configured to happen when the key combination Ctrl+Alt+Del is pressed), and the file /etc/runit/ctrlaltdel exists and has the execute by owner permission set, it will execute it as a child process, and when it exits, behave as if it had received a SIGCONT signal. Gentoo's /etc/runit/ctrlaltdel prints a "System is going down in 14 seconds..." message using the wall utility, makes sure the file /etc/runit/stopit exists and has the execute by owner permission set, waits 14 seconds and then exits. The result is that SIGINT will either halt or reboot the machine after 14 seconds, depending on /etc/runit/reboot.

All runit's children run with their standard input, output and error initially redirected to /dev/console.

Reboot and shutdown

The runit-init program can be used to shut the machine down. Unless it is running as process 1, it accepts one argument, which can be either 0 or 6:

  • If it is 0, runit-init will create the /etc/runit/stopit and /etc/runit/reboot files if any of them does not exist, set the execute by owner permission for the former, unset it for the latter, and send a SIGCONT signal to process 1.
  • If it is 6, runit-init will create the /etc/runit/stopit and /etc/runit/reboot files if any of them does not exist, set the execute by owner permission for both of them, and send a SIGCONT signal to process 1.

Therefore, if process 1 is runit, then runit-init 0 will power off the machine, and runit-init 6 will reboot it.
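The file manipulation described above can be sketched in shell. In this sketch a temporary directory stands in for /etc/runit, and the SIGCONT goes to the current shell instead of process 1, so it is safe to run anywhere; the real runit-init operates on /etc/runit and signals process 1 directly.

```shell
#!/bin/sh
# Sketch of what "runit-init 0" (power off) does, per the description above.
# A temp directory stands in for /etc/runit; SIGCONT goes to this shell
# instead of to process 1, so the sketch has no effect on the system.
dir=$(mktemp -d)

touch "$dir/stopit" "$dir/reboot"
chmod u+x "$dir/stopit"   # owner-execute set: runit should initiate shutdown
chmod u-x "$dir/reboot"   # owner-execute unset: power off rather than reboot
kill -CONT $$             # stands in for signalling runit (process 1)

[ -x "$dir/stopit" ] && echo "stopit: executable"
[ -x "$dir/reboot" ] || echo "reboot: not executable"
rm -r "$dir"
```

For runit-init 6, the only difference is that the owner-execute bit on the reboot file is set instead of unset.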

This means that runit is not directly compatible with sysvinit's telinit, halt, poweroff, reboot, and shutdown commands. However, many programs (e.g. desktop environments) expect to be able to call programs with those names during operation, so if such functionality is needed, it is possible to use compatibility shell scripts:

FILE $PATH/shutdown
#!/bin/sh
runit-init 0
FILE $PATH/reboot
#!/bin/sh
runit-init 6

Runit and service management

Runit doesn't have service manager features, i.e. it does not provide mechanisms for specifying dependencies, service ordering constraints, etc., like OpenRC does using depend() functions in service scripts. If such features are needed, the desired behaviour must be explicitly enforced in the code of run files; the software package's author provides some tips on how to do that [6]. Sometimes, doing almost nothing might be enough: if run simply exits with an error status when a required condition is unmet, and the state the machine was in before run was executed is restored (perhaps with help from a finish file that analyzes the exit code), the supervisor will just keep restarting the service until, after some convergence period, all of its required conditions are met. The author of nosh calls this "the thundering herd solution" [7].
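The fail-fast idea can be sketched as follows. A plain flag file stands in for a real readiness check (a real run file might instead use, e.g., sv check on the dependency's servicedir); all names here are illustrative, and the two calls simulate two consecutive supervisor attempts.

```shell
#!/bin/sh
# Sketch: a run file exits nonzero while a required condition is unmet,
# so runsv keeps restarting it until the condition holds.
flag=$(mktemp -u)          # a path that does not exist yet

try_start() {
    if [ ! -e "$flag" ]; then
        echo "dependency not ready; exiting so runsv restarts us"
        return 1
    fi
    echo "dependency ready; a real run file would now exec the daemon"
}

try_start || true          # first supervisor attempt: condition unmet
touch "$flag"              # the required condition becomes true
try_start                  # next restart succeeds
rm -f "$flag"
```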

Nevertheless, OpenRC and runit do not interfere with each other, so it is possible to use OpenRC-managed services on a machine where the init system is runit. In particular, once the supervision tree rooted in process 1 is launched, it is still possible to manually start individual OpenRC services using rc-service, or even to enter OpenRC's 'default' runlevel:

root #openrc default

Services from OpenRC's 'default' runlevel can be started automatically on boot using the existing local service, by moving it to the 'boot' runlevel:

FILE /etc/local.d/rc-default.start Enter OpenRC's 'default' runlevel
#!/bin/sh
openrc default
root #rc-update del local default
root #rc-update add local boot

Alternatively, /etc/runit/1 can be modified to add the corresponding openrc invocation:

FILE /etc/runit/1
RUNLEVEL=S /sbin/openrc default

Note however that OpenRC services will not be supervised by runit.

Runit can also be used without OpenRC's service management, but this requires an alternative implementation of the functionality of its service scripts, especially those executed upon entering the 'sysinit', 'boot' and 'shutdown' runlevels, and replacing the Gentoo-provided /etc/runit/1 and /etc/runit/3 files with custom ones, since they call the openrc program. It can be useful to study the equivalents from runit-based distributions (e.g. see Void Linux's, in their void-runit package sources).

OpenRC's runit integration feature

Starting with version 0.22, OpenRC can launch supervised long-lived processes using the runit package as a helper [8]. This is an alternative to 'classic' unsupervised long-lived processes launched using the start-stop-daemon program. It should be noted that service scripts that don't contain start() and stop() functions implicitly use start-stop-daemon.

OpenRC services that want to use runit supervision need both a service script in /etc/init.d and a runit service directory. The service script must contain a supervisor=runit variable assignment to turn the feature on, and must have a 'need' dependency on the runsvdir service in its depend() function, to make sure the runsvdir program is launched (see here). It must not contain a start(), stop() or status() function (their _pre() and _post() variants are fine, though); OpenRC internally invokes sv when the service script is called with a 'start', 'stop' or 'status' argument.

The runit service directory can be placed anywhere in the filesystem and have any name, as long as the service script (or the service-specific configuration file in /etc/conf.d) assigns the servicedir's absolute path to the runit_service variable. If runit_service is not assigned, the runit servicedir must have the same name as the OpenRC service script, and will be searched for in the >=sys-process/runit-2.1.2 service directory repository, /etc/sv. The scan directory when using this feature is /run/openrc/sv; OpenRC will create a symlink to the service directory there when the service is started, and delete it when the service is stopped.
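For instance, instead of putting the runit_service assignment in the service script itself, it could live in a hypothetical service-specific configuration file (the file name matches the service script, and the path below reuses the servicedir from the example further down):

```
# /etc/conf.d/test-service (hypothetical)
runit_service="/home/user/test/svc-repo/test-service"
```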

Warning
OpenRC does not integrate as expected when runit is used as the init system, since there will be two runsvdir processes: the one supervised by runit with /etc/service or /var/service as the scan directory, and the unsupervised one launched by OpenRC with /run/openrc/sv as the scan directory. So the result will be two independent supervision trees.

Example setup for a hypothetical supervised test-daemon service, with and without a dedicated logger.

FILE /etc/init.d/test-service OpenRC service script
#!/sbin/openrc-run
description="A supervised test service"
supervisor=runit
runit_service=/home/user/test/svc-repo/test-service

depend() {
   need localmount runsvdir
}
user $/sbin/rc-service test-service describe
* A supervised test service
* cgroup_cleanup: Kill all processes in the cgroup
FILE /etc/init.d/test-service-logged OpenRC service script
#!/sbin/openrc-run
description="A supervised test service with a logger"
supervisor=runit
runit_service=/home/user/test/svc-repo/test-service-logged

depend() {
   need localmount runsvdir
}
user $/sbin/rc-service test-service-logged describe
* A supervised test service with a logger
* cgroup_cleanup: Kill all processes in the cgroup

The service directories:

user $ls -l /home/user/test/svc-repo/test-service* /home/user/test/svc-repo/test-service*/log
test-service:
total 4
-rwxr-xr-x 1 user user 96 Jun 17 12:00 run

test-service-logged:
total 8
drwxr-xr-x 2 user user 4096 Jun 17 12:00 log
-rwxr-xr-x 1 user user  101 Jun 17 12:00 run

test-service-logged/log:
total 4
-rwxr-xr-x 1 user user 62 Jun 17 12:00 run
FILE test-service/run
#!/bin/sh
exec \
chpst -o 5 \
chpst -u daemon \
/home/user/test/test-daemon

This launches the program test-daemon with effective user daemon and the maximum number of open file descriptors set to 5. This has the same effect as if test-daemon itself performed a setrlimit(RLIMIT_NOFILE, &rl) call with rl.rlim_cur set to 5, provided that value does not exceed the corresponding hard limit. The program also periodically sends a message of the form "Logged message #N" to its standard error.
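The same soft RLIMIT_NOFILE limit that chpst -o sets can also be expressed with the shell's ulimit builtin; a quick way to see the resulting value, run in a subshell so the lowered limit does not leak into the calling shell:

```shell
#!/bin/sh
# Set the soft RLIMIT_NOFILE limit to 5, as "chpst -o 5" would, and print it.
( ulimit -S -n 5; ulimit -S -n )
```

This prints 5, assuming the hard limit permits the new value.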

FILE test-service-logged/run
#!/bin/sh
exec \
chpst -o 5 \
chpst -u daemon \
/home/user/test/test-daemon 2>&1
FILE test-service-logged/log/run
#!/bin/sh
exec \
chpst -u user \
svlogd -tt /home/user/test/logdir

The redirection of test-daemon's standard error to standard output allows logging its messages using runit's svlogd. An automatically rotated logging directory named logdir will be used, and messages will have a UTC timestamp prepended to them.
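svlogd's rotation policy can be tuned with a config file inside the logging directory. For example, a /home/user/test/logdir/config containing the following (illustrative) values would rotate current at about 100000 bytes and keep at most 5 rotated log files:

```
s100000
n5
```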

Manually starting test-service-logged:

root #rc-service test-service-logged start
* /run/openrc/sv: creating directory
* Starting runsvdir ...                       [ ok ]
* Starting test-service-logged ...
* Failed to start test-service-logged         [ !! ]
* ERROR: test-service-logged failed to start
Warning
There's currently a bug in the implementation of service startup; OpenRC calls sv start immediately after creating the servicedir symlink in the scan directory, instead of waiting for runsvdir's next periodic scan. Because no runsv process has been launched yet, sv start will fail. However, if there is no down file in the service directory, after the next scan the service will go up regardless, when the corresponding runsv process is launched.
root #rc-service test-service-logged status
run: /run/openrc/sv/test-service-logged: (pid 1959) 24s; run: log: (pid 1958) 24s

Make OpenRC's notion of the service's state catch up:

root #rc-service test-service-logged start
* Starting test-service-logged ...            [ ok ]

The resulting supervision tree so far:

user $ps axf -o pid,ppid,pgrp,euser,args
  PID  PPID  PGRP EUSER    COMMAND
 ...
 1931     1  1931 root     /usr/bin/runsvdir -P /run/openrc/sv log: ...................................................
 2153  1931  2153 root      \_ runsv test-service-logged
 2154  2153  2153 user          \_ svlogd -tt /home/user/test/logdir
 2155  2153  2153 daemon        \_ /home/user/test/test-daemon
 ...

Messages from the test-daemon process with PID 2155 go to the logging directory:

user $ls -l /home/user/test/logdir
total 12
-rwxr--r-- 1 user user 441 Jun 17 12:19 @4000000059454877288a41fc.s
-rwxr--r-- 1 user user 264 Jun 17 12:19 @400000005945489513993834.s
-rw-r--r-- 1 user user 706 Jun 17 12:20 current
-rw------- 1 user user   0 Jun 17 12:04 lock
user $cat /home/user/test/logdir/current
2017-06-17_12:19:42.20404 Logged message #1
2017-06-17_12:19:47.20759 Logged message #2
2017-06-17_12:19:52.21598 Logged message #3
2017-06-17_12:19:57.21806 Logged message #4
2017-06-17_12:20:02.22180 Logged message #5
2017-06-17_12:20:07.22399 Logged message #6

Manually starting test-service:

root #rc-service test-service start
* Starting test-service ...
* Failed to start test-service                [ !! ]
* ERROR: test-service failed to start

Because of the same service startup bug, make OpenRC's notion of the service's state catch up:

root #rc-service test-service start
* Starting test-service ...                   [ ok ]
user $rc-status
Runlevel: default
...
Dynamic Runlevel: needed/wanted
runsvdir                                      [  started  ]
...
Dynamic Runlevel: manual
test-service-logged                           [  started  ]
test-service                                  [  started  ]

The scan directory:

user $ls -l /run/openrc/sv
total 0
lrwxrwxrwx 1 root root 46 Jun 17 12:25 test-service -> /home/user/test/svc-repo/test-service
lrwxrwxrwx 1 root root 53 Jun 17 12:12 test-service-logged -> /home/user/test/svc-repo/test-service-logged

Final supervision tree:

user $ps axf -o pid,ppid,pgrp,euser,args
  PID  PPID  PGRP EUSER    COMMAND
 ...
 1931     1  1931 root     /usr/bin/runsvdir -P /run/openrc/sv log: ged message #8 Logged message #9 Logged message #10
 2153  1931  2153 root      \_ runsv test-service-logged
 2154  2153  2153 user      |   \_ svlogd -tt /home/user/test/logdir
 2155  2153  2153 daemon    |   \_ /home/user/test/test-daemon
 2249  1931  2249 root      \_ runsv test-service
 2250  2249  2249 daemon        \_ /home/user/test/test-daemon
 ...

Since the test-daemon process with PID 2250 doesn't have a dedicated logger, its messages go to runsvdir's standard error, are logged readproctitle-style, and show up in ps' output (for process 1931 in this case).

Removal

Unmerge

root #emerge --ask --depclean sys-process/runit

Service directories, additional scan directories, the /usr/bin/runsvdir symlink to /bin/runsvdir, etc. must be manually deleted if they are no longer wanted after removing the package. All modifications to sysvinit's /etc/inittab must also be manually reverted: lines for runsvdir-start must be deleted, and a telinit q command must be run afterwards. And obviously, if runit is being used as the init system, an alternative one must be installed in parallel, and the machine rebooted to use it (possibly by reconfiguring the bootloader), before the package is removed; otherwise the machine will become unbootable.
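The inittab cleanup can be sketched as follows. A temporary copy stands in for /etc/inittab so the sketch is safe to run, and the runsvdir-start line shown is illustrative; the real cleanup would edit /etc/inittab as root and then run telinit q.

```shell
#!/bin/sh
# Delete any runsvdir-start line from a stand-in inittab (GNU sed -i).
tmp=$(mktemp)
printf '%s\n' 'id:3:initdefault:' \
    'SV:123456:respawn:/usr/bin/runsvdir-start' > "$tmp"
sed -i '/runsvdir-start/d' "$tmp"
cat "$tmp"               # only the initdefault line remains
rm -f "$tmp"
```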

See also

External resources

References

  1. Linux Standard Base Core Specification 5.0.0, Generic Part, Chapter 22, "System Initialization", 22.2, "Init Script Actions". Retrieved on June 4th, 2017.
  2. Using runit with sysvinit and inittab. Retrieved on May 28th, 2017.
  3. Laurent Bercot, the logging chain. Retrieved on May 1st, 2017.
  4. Laurent Bercot, on the syslog design. Retrieved on May 1st, 2017.
  5. runit - runlevels. Retrieved on June 10th, 2017.
  6. runit - service dependencies. Retrieved on June 10th, 2017.
  7. Jonathan de Boyne Pollard, the nosh Guide, Introduction, section "Differentiation from other systems", bullet "No daemontools-style thundering herds". Retrieved on June 10th, 2017.
  8. Using runit with OpenRC. Retrieved on June 15th, 2017.