OpenRC/supervise-daemon

From Gentoo Wiki
supervise-daemon

Supervise-daemon is OpenRC's daemon supervisor. It can start and stop system processes, and restart them when they terminate unexpectedly.

Introduction

OpenRC traditionally uses start-stop-daemon, often abbreviated to s-s-d, for starting and stopping programs. When s-s-d starts a process it saves the process's PID in a pidfile (typically under /run/), and backgrounds (daemonizes) the process it started. When the time comes to stop, kill, or signal the daemon process, s-s-d uses the saved pidfile to find the right process.
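The pidfile bookkeeping described above can be sketched in plain shell. This is only an illustration of the idea, not how s-s-d is implemented: the pidfile path, the function names, and the use of sleep as a stand-in daemon are all assumptions.

```shell
#!/bin/sh
# Illustrative only: mimic the pidfile bookkeeping that s-s-d performs.
# The pidfile path and "sleep" as a stand-in daemon are assumptions.
pidfile="${TMPDIR:-/tmp}/example-daemon.$$.pid"

start_daemon() {
    # Start the stand-in daemon in the background and record its PID.
    sleep 300 &
    echo "$!" > "${pidfile}"
}

stop_daemon() {
    # Find the process through the saved PID, signal it, and clean up.
    pid="$(cat "${pidfile}")"
    kill -TERM "${pid}" 2>/dev/null || true
    wait "${pid}" 2>/dev/null || true
    rm -f "${pidfile}"
}

start_daemon
saved_pid="$(cat "${pidfile}")"
stop_daemon
```

If the pidfile goes stale or is overwritten by another process, the stop step signals the wrong PID or nothing at all, which is why the pidfile handling matters so much in the recipes that follow.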

Supervising, on the other hand, usually keeps the started daemon as a child process of the supervisor. Backgrounding the daemon is therefore neither needed nor desired. Pidfiles are still used, however: they store the PID of the supervisor process so that it can receive start/stop signals.

Supervising a daemon process has a few major advantages:

  1. If and when a daemon process dies (crashes) then its supervisor will notice it and try to restart it.
  2. Any terminal output sent by the daemon process to stdout and stderr can be caught by the supervisor, and then sent to the system logger or to a file.
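Both advantages can be illustrated with a toy supervisor loop in shell. This is a sketch of the concept only and is not how supervise-daemon is implemented; supervise_sketch and its arguments are hypothetical names.

```shell
#!/bin/sh
# Illustrative only: a toy supervisor loop, not the supervise-daemon implementation.
# supervise_sketch and its arguments are hypothetical names.

# Usage: supervise_sketch MAX_RESTARTS COMMAND [ARGS...]
# Runs COMMAND in the foreground as a child and starts it again each time it
# exits, until it has been restarted MAX_RESTARTS times.
supervise_sketch() {
    max_restarts="$1"
    shift
    restarts=0
    while :; do
        # The child stays attached, so its stdout/stderr can be captured here.
        "$@" 2>&1 | while IFS= read -r line; do
            echo "[child] ${line}"
        done
        [ "${restarts}" -ge "${max_restarts}" ] && break
        restarts=$((restarts + 1))
        echo "[supervisor] child exited, restart ${restarts}"
    done
}

# A child that prints one line and then "crashes".
output="$(supervise_sketch 2 sh -c 'echo "hello from $$"; exit 1')"
echo "${output}"
```

The child runs three times in total (the first start plus two restarts), and every line it prints passes through the supervisor, which could just as well forward it to a logger or a file.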

This article explains how to bring services under the supervision of supervise-daemon.

General recipe

Standard Gentoo init files do not yet use process supervision with supervise-daemon; it is left to users to make these modifications to the init files.

The general recipe to bring a service under the control of supervise-daemon is to adapt its init file in /etc/init.d as follows:

  • always add supervisor="supervise-daemon"
  • remove any pidfile option from the daemon's own command line, because supervise-daemon needs the pidfile for itself, e.g.:
    • change command_args="-p ${pidfile} ${NTPD_OPTS}" into command_args="${NTPD_OPTS}" command_args_background="-p ${pidfile}"
  • make sure the daemon will run in foreground by applying one of the following methods:
    • if the daemon forks itself to the background (daemonizes itself) then pass the appropriate daemon command line option, e.g. command_args_foreground="--foreground". Note that the command line option to prevent daemonizing is different per service, and some services might not even provide this option.
    • if there is a command line parameter for the daemon process to make it daemonize itself, then move it to command_args_background
  • replace calls to start-stop-daemon, if needed, with calls to ${supervisor} as appropriate.
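Put together, the recipe above might look like the following sketch for a hypothetical daemon called exampled. The binary path, the --foreground and --pidfile flags, and the EXAMPLED_OPTS variable are assumptions made for illustration; a real daemon's man page decides the actual option names.

```shell
# Sketch of /etc/init.d/exampled for a hypothetical daemon "exampled".
# The binary path, the --foreground and --pidfile flags, and EXAMPLED_OPTS
# are assumptions; check the real daemon's man page.
supervisor="supervise-daemon"
command="/usr/sbin/exampled"
pidfile="/run/exampled.pid"

# Options that apply in every mode.
command_args="${EXAMPLED_OPTS:-}"
# Only used when the daemon is kept in the foreground under a supervisor.
command_args_foreground="--foreground"
# Only used when the daemon backgrounds itself (e.g. under start-stop-daemon).
command_args_background="--pidfile ${pidfile}"

extra_started_commands="reload"

reload() {
    ebegin "Reloading ${RC_SVCNAME}"
    # Signal through the supervisor instead of calling start-stop-daemon.
    ${supervisor} "${RC_SVCNAME}" --signal HUP --pidfile "${pidfile}"
    eend $?
}
```

Splitting the options across command_args, command_args_foreground and command_args_background keeps the script usable with either start-stop-daemon or supervise-daemon, so switching back only requires removing the supervisor line.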

Instructions per service

This chapter contains some examples of how to bring a service's daemon process under supervision.

acpid

Reviewing the man page of acpid reveals:

  • .. will run as background process ..
  • .. -f, --foreground .. keeps acpid in the foreground by not forking at startup, and makes it log to stderr instead of syslog.
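Since the option that prevents daemonizing differs per service, a small filter can help when reviewing a man page or help text. The helper below is a hypothetical sketch: the function name is made up, the keyword list is a guess based on the options that appear in this article, and the sample help text is invented for the demonstration.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): filter help or man page text read
# from stdin for lines that mention common "do not daemonize" keywords.
# The keyword list is a guess based on the options seen in this article.
find_foreground_opts() {
    grep -i -E 'foreground|no-?fork|no-?daemon|no-?background|not detach' || true
}

# Made-up help text, used only to demonstrate the filter.
sample_help='  -d, --debug        Increase debug level
  -f, --foreground   Run in the foreground
  -p, --pidfile      Use the specified PID file'

matches="$(printf '%s\n' "${sample_help}" | find_foreground_opts)"
echo "${matches}"
```

On a real system the same filter could be fed from a man page, for example with man acpid | find_foreground_opts. A hit is only a hint; the man page text still needs to be read to confirm what the option does.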

Edit /etc/init.d/acpid as follows to make acpid run under supervise-daemon:

  • add the supervisor definition
  • add the arguments to make acpid run in foreground
  • rewrite the s-s-d command to a supervise-daemon command
  • Add a pidfile for supervise-daemon to track the service

At the top of the file:

FILE /etc/init.d/acpid
supervisor="supervise-daemon"
command_args_foreground="--foreground"
pidfile="/run/acpid.pid"

At the end of the file:

FILE /etc/init.d/acpid
reload() {
        ebegin "Reloading acpid configuration"
#       start-stop-daemon --exec $command --signal HUP
        ${supervisor} ${RC_SVCNAME} --signal HUP --pidfile "${pidfile}"
        eend $?
}

Start up the service:

root #rc-service acpid start
acpid                  | * Starting acpid ...                           [ ok ]

Verify if acpid is now running under supervise-daemon:

root #ps -ef | grep acpid
root      7450     1  0 15:32 ?        00:00:00 supervise-daemon acpid --start /usr/sbin/acpid -- --foreground
root      7454  7450  0 15:32 ?        00:00:01 /usr/sbin/acpid --foreground

Check the logs as well:

root #tail /var/log/messages
Jun 10 09:10:27 [supervise-daemon] Supervisor command line: supervise-daemon acpid --start /usr/sbin/acpid -- --foreground
Jun 10 09:10:27 [supervise-daemon] Child command line: /usr/sbin/acpid --foreground

And when you create an acpid event, it will be logged:

root #tail /var/log/messages
Jun 10 09:15:08 [user] ACPI event unhandled: button/mute MUTE 00000080 00000000 K

Lastly, check if supervise-daemon will restart acpid when it terminates:

root #kill 7454
root #tail /var/log/messages
Jun 10 09:54:20 [supervise-daemon] /usr/sbin/acpid, pid 7454, exited with return code 0
Jun 10 09:54:20 [supervise-daemon] Child command line: /usr/sbin/acpid --foreground 
root #ps -ef | grep acpid
root      7450     1  0 15:32 ?        00:00:00 supervise-daemon acpid --start /usr/sbin/acpid -- --foreground
root      8931  7450  0 15:32 ?        00:00:01 /usr/sbin/acpid --foreground

Notice the different PID.
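The parent/child relationship seen in the ps output can also be checked with a small filter. The helper below is a hypothetical sketch that assumes the ps -ef column layout shown above (PID in the second column, PPID in the third).

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): read "ps -ef" output on stdin and
# print the PID of every process whose parent PID equals the given supervisor PID.
children_of() {
    awk -v ppid="$1" '$3 == ppid { print $2 }'
}

# Sample lines copied from the ps output above.
sample='root      7450     1  0 15:32 ?        00:00:00 supervise-daemon acpid --start /usr/sbin/acpid -- --foreground
root      8931  7450  0 15:32 ?        00:00:01 /usr/sbin/acpid --foreground'

child_pids="$(printf '%s\n' "${sample}" | children_of 7450)"
echo "${child_pids}"
```

Running ps -ef | children_of 7450 before and after killing the daemon should show a different child PID while the supervisor PID stays the same.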

avahi-daemon

net-dns/avahi contains a service avahi-daemon for service discovery. Its init file /etc/init.d/avahi-daemon does not make use of s-s-d, but calls the binary /usr/sbin/avahi-daemon directly and gives it the instruction to daemonize: /usr/sbin/avahi-daemon -D on startup. This can be simply adjusted by defining the command variable as command="/usr/sbin/avahi-daemon", and removing the start and stop functions from the file. Of course it is also needed to specify the supervisor.

Edit /etc/init.d/avahi-daemon as follows:

FILE /etc/init.d/avahi-daemon
#!/sbin/openrc-run
# Copyright 1999-2016 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2

extra_started_commands="reload"
command="/usr/sbin/avahi-daemon"
supervisor="supervise-daemon"

depend() {
        before netmount nfsmount
        use net
        need dbus
}

#start() {
#       ebegin "Starting avahi-daemon"
#       /usr/sbin/avahi-daemon -D
#       eend $?
#}
#
#stop() {
#       ebegin "Stopping avahi-daemon"
#       /usr/sbin/avahi-daemon -k
#       eend $?
#}

reload() {
       ebegin "Reloading avahi-daemon"
#      /usr/sbin/avahi-daemon -r
       ${command} -r
       eend $?
}

Start the service back up:

root #rc-service avahi-daemon start
avahi-daemon           | * Caching service dependencies ...                              [ ok ]
avahi-daemon           | * Starting avahi-daemon ...                                     [ ok ]

Verify that the service is now supervised:

root #ps -ef | grep avahi
root     27390     1  0 10:30 ?        00:00:00 supervise-daemon avahi-daemon --start /usr/sbin/avahi-daemon --
avahi    27392 27390  0 10:30 ?        00:00:00 avahi-daemon: running [e485.local]
avahi    27397 27392  0 10:30 ?        00:00:00 avahi-daemon: chroot helper

bluetoothd

net-wireless/bluez provides Bluetooth services. The recipe to bring it under supervise-daemon is as follows:

The existing service is written to easily switch supervisors, so it is enough to define the supervisor in conf.d.

Edit /etc/conf.d/bluetooth as follows:

FILE /etc/conf.d/bluetooth
supervisor="supervise-daemon"
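For services like this one, where a single conf.d line is enough, the change can be scripted. The helper below is a hypothetical sketch; it is demonstrated on a scratch file rather than on a real /etc/conf.d entry.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): append the supervisor setting to a
# conf.d file unless an uncommented supervisor= line is already present.
enable_supervisor() {
    conf_file="$1"
    if ! grep -q '^[[:space:]]*supervisor=' "${conf_file}" 2>/dev/null; then
        printf '%s\n' 'supervisor="supervise-daemon"' >> "${conf_file}"
    fi
}

# Demonstration on a scratch file rather than a real /etc/conf.d entry.
scratch="$(mktemp)"
enable_supervisor "${scratch}"
enable_supervisor "${scratch}"   # second call is a no-op
line_count="$(grep -c '^supervisor=' "${scratch}")"
```

The check makes the helper safe to run repeatedly. It deliberately leaves an existing supervisor= line alone, even if that line names a different supervisor.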

cupsd

CUPS is the well known printing system of Linux. Its daemon cupsd is provided by net-print/cups.

The existing service is written to easily switch supervisors. Define the supervisor in conf.d:

FILE /etc/conf.d/cupsd
supervisor="supervise-daemon"

cups-browsed

Cups-browsed is a daemon for browsing the Bonjour broadcasts of shared, remote CUPS printers.

The existing service is written to easily switch supervisors. Define the supervisor in conf.d:

FILE /etc/conf.d/cups-browsed
supervisor="supervise-daemon"

dbus-daemon

D-Bus is a message bus system, a simple way for applications to talk to one another. It can be brought under supervise-daemon like any other service.

Warning
When D-Bus is stopped the system or desktop may become unstable. It is best to restart the system after making this change.

Edit /etc/conf.d/dbus as follows:

FILE /etc/conf.d/dbus
supervisor="supervise-daemon"
command_args_foreground="--nofork --nopidfile"

dhcpcd

DHCPCD, the Dynamic Host Configuration Protocol Client Daemon, is a popular DHCP client capable of handling both IPv4 and IPv6 configurations.

Take the following steps to have it run under supervise-daemon:

  • pass the dhcpcd process option --nobackground to prevent it from backgrounding.

Edit /etc/conf.d/dhcpcd as follows:

FILE /etc/conf.d/dhcpcd
supervisor="supervise-daemon"
command_args_foreground="--nobackground"

dnsmasq

Dnsmasq, provided by package net-dns/dnsmasq, is a lightweight DHCP and caching DNS server. Dnsmasq's init file contains references to s-s-d. The daemon needs to be configured to run in foreground.

Edit /etc/init.d/dnsmasq as follows:

FILE /etc/init.d/dnsmasq
pidfile="/var/run/dnsmasq.pid"
command="/usr/sbin/dnsmasq"
command_args_background="-x ${pidfile}"
command_args="${DNSMASQ_OPTS}"
command_args_foreground="--keep-in-foreground"
supervisor="supervise-daemon"
retry="TERM/3/TERM/5"

depend() {
        provide dns
        need localmount net
        after bootmisc
        use logger
}

start_pre() {
        checkpath --owner dnsmasq:dnsmasq \
                --mode 0644 \
                --file /var/lib/misc/dnsmasq.leases
}

reload() {
        ebegin "Reloading ${RC_SVCNAME}"
#       start-stop-daemon --signal HUP --pidfile "${pidfile}"
        ${supervisor} ${RC_SVCNAME} --signal HUP --pidfile "${pidfile}"
        eend $?
}

rotate() {
        ebegin "Reopening ${RC_SVCNAME} log file"
#       start-stop-daemon --signal USR2 --pidfile "${pidfile}"
        ${supervisor} ${RC_SVCNAME} --signal USR2 --pidfile "${pidfile}"
        eend $?
}

fancontrol

Fancontrol is a bash script provided by sys-apps/lm-sensors to control the fan speed. It can be run as a service under supervision by editing /etc/conf.d/fancontrol.

The existing service is written to easily switch supervisors. Define the supervisor in conf.d:

FILE /etc/conf.d/fancontrol
supervisor="supervise-daemon"


fcron

sys-process/fcron is a cron daemon implementation. To get it supervised:

  • define the supervisor
  • tell the supervisor how to get the daemon to run in foreground
  • change the reload function to use supervise-daemon instead of s-s-d.

Edit /etc/init.d/fcron as follows:

FILE /etc/init.d/fcron
command="/usr/libexec/fcron"
command_args="-c \"${FCRON_CONFIGFILE}\" ${FCRON_OPTS}"
start_stop_daemon_args=${FCRON_SSDARGS:-"--wait 1000"}
pidfile="$(getconfig pidfile /run/fcron.pid)"
fcrontabs="$(getconfig fcrontabs /var/spool/fcron)"
fifofile="$(getconfig fifofile /run/fcron.fifo)"
required_files="${FCRON_CONFIGFILE}"
supervisor="supervise-daemon"
command_args_foreground="--foreground"

extra_started_commands="reload"

reload() {
#        start-stop-daemon --signal HUP --pidfile "${pidfile}"
         ${supervisor} "${SVCNAME}" --signal HUP --pidfile "${pidfile}"
}

gerbera

net-misc/gerbera provides a UPnP MediaServer.

The existing service is written to easily switch supervisors. Define the supervisor in conf.d:

FILE /etc/conf.d/gerbera
supervisor="supervise-daemon"


iwd

iwd (iNet wireless daemon) is provided by net-wireless/iwd and aims to replace net-wireless/wpa_supplicant.

The existing service is written to easily switch supervisors. Define the supervisor in conf.d:

FILE /etc/conf.d/iwd
supervisor="supervise-daemon"

metalog

app-admin/metalog comes with good init scripts that only require some basic modifications to become supervised, however there are a few quirks:

First, metalog will create its pidfile at a compiled-in default location even if the --pidfile option is not specified. If the pidfile variable happens to be set to this same path, then metalog will overwrite it with the wrong pid, and OpenRC won't be able to stop the daemon. This can be fixed by specifying a pidfile at some other path that won't do any harm e.g. /run/metametalog.pid.

Note: /dev/null should not be used as the pidfile because metalog will delete and re-create the file if it exists. As an alternative, the default pidfile could be left alone, and the pidfile location passed to supervise-daemon could be changed.

Second, under certain conditions it is possible for the metalog process to inherit the current tty as its stdout/stderr. Since metalog is programmed to always write the log messages to stdout/stderr in addition to the configured log files, this can result in a user's terminal filling up with system log messages. This can be fixed by telling supervise-daemon to redirect stdout and stderr.

FILE /etc/init.d/metalog
# Enable supervise-daemon
supervisor="supervise-daemon"

# Redirect stdout and stderr so log messages don't get routed to a tty
supervise_daemon_args="--stdout /dev/null --stderr /dev/null"

# Move daemonize and pidfile options out of the 'command_args' variable
command_args="${METALOG_OPTS}"

# Set pidfile depending on whether the daemon runs in foreground or background
command_args_foreground="--pidfile /run/metametalog.pid"
command_args_background="--daemonize --pidfile ${pidfile}"

Finally, ensure all calls to start-stop-daemon are replaced with calls to ${supervisor}:

FILE /etc/init.d/metalog
${supervisor} "${SVCNAME}" --signal x --pidfile "${pidfile}"

mpd

Package media-sound/mpd provides a music player daemon. It can be brought under supervision by:

  • providing the foreground parameter --no-daemon
  • declaring the supervisor
  • updating the reload function to use the supervisor instead of the PID.

Edit /etc/init.d/mpd as follows:

FILE /etc/init.d/mpd
extra_started_commands='reload'
command=/usr/bin/mpd
command_args=${CFGFILE}
required_files=${CFGFILE}
pidfile=$(get_config pid_file)
description="Music Player Daemon"
command_args_foreground="--no-daemon"
supervisor="supervise-daemon"

reload() {
        ebegin "Reloading ${RC_SVCNAME}"
#       start-stop-daemon --pidfile ${pidfile} --signal HUP
        ${supervisor} ${SVCNAME} --signal HUP --pidfile "${pidfile}"
        eend $?
}

ntpd

The network time protocol daemon ntpd, from package net-misc/ntp, can be brought under supervision by editing the init file to:

  • prevent forking to the background by adding "-n" to the command line of the ntpd daemon,
  • remove the s-s-d instruction,
  • define the supervisor.

Edit /etc/init.d/ntpd as follows:

FILE /etc/init.d/ntpd
pidfile="/var/run/ntpd.pid"
command="/usr/sbin/ntpd"
command_args_background="-p ${pidfile}"
command_args="${NTPD_OPTS}"
#start_stop_daemon_args="--pidfile ${pidfile}"
supervisor="supervise-daemon"
command_args_foreground="-n"

rngd

Rngd is the daemon belonging to sys-apps/rng-tools. It is meant to check and feed random data from a hardware device to the kernel random device.

Edit /etc/conf.d/rngd to bring it under supervision as follows:

FILE /etc/conf.d/rngd
supervisor="supervise-daemon"
command_args_foreground="--foreground"

sshd

sshd (Secure Shell Daemon) stays in the foreground when it is started with the -D option, which tells it not to detach and become a daemon. This is not a debug mode (that is -d), so it is suitable for normal operation under a supervisor.

Edit /etc/init.d/sshd at the beginning of the file as follows:

FILE /etc/init.d/sshd
pidfile="${SSHD_PIDFILE}"
command_args_background="-o PidFile=${pidfile}"
command_args="${SSHD_OPTS} -f ${SSHD_CONFIG}"
command_args_foreground="-D"
supervisor="supervise-daemon"

# Wait one second (length chosen arbitrarily) to see if sshd actually
# creates a PID file, or if it crashes for some reason like not being
# able to bind to the address in ListenAddress (bug 617596).
#: ${SSHD_SSD_OPTS:=--wait 1000}
start_stop_daemon_args="${SSHD_SSD_OPTS}"

All the way at the bottom of the file there is the reload function in which the s-s-d instruction should be changed to a supervise-daemon instruction:

FILE /etc/init.d/sshd
reload() {
        checkconfig || return $?
        ebegin "Reloading ${SVCNAME}"
#       start-stop-daemon --signal HUP --pidfile "${pidfile}"
        ${supervisor} ${SVCNAME} --signal HUP --pidfile "${pidfile}"
        eend $?
}

syslog-ng

app-admin/syslog-ng is an interesting case. Without any changes, syslog-ng running under s-s-d looks like this:

root #ps -ef | grep syslog-ng
root      7800     1  0 09:16 ?        00:00:00 supervising syslog-ng
root      7802  7800  4 09:16 ?        00:03:56 /usr/sbin/syslog-ng --cfgfile /etc/syslog-ng/syslog-ng.conf --control /run/syslog-ng.ctl --persist-file /var/lib/syslog-ng/syslog-ng.persist --pidfile /run/syslog-ng.pid

There is a process, in this case with PID 7800, named 'supervising syslog-ng' with a parent PID (PPID) of 1, which means its parent is the init process. There is also the process with PID 7802, which looks more like what we might expect, referencing the binary. This process's PPID is 7800, i.e. the supervising process.

The supervising syslog-ng process is actually also the same binary of syslog-ng:

root #pgrep -lf syslog
7800 syslog-ng
7802 syslog-ng

Syslog-ng appears to be supervising itself. What it does after startup is:

  • fork, to become a background daemon process (PID 7800) that is adopted by the init process (PID 1);
  • fork again to create the worker process (PID 7802) as its child process;
  • rename the parent process (PID 7800) to "supervising syslog-ng", and supervise its child (PID 7802).

If and when "supervising syslog-ng" detects that its worker process (PID 7802) has terminated, it will restart it. Syslog-ng calls this process mode "safe-background".

In order to get syslog-ng to work well under supervise-daemon it needs to run in the foreground, though. Two command-line options make that happen: --foreground and --process-mode=foreground.

With the standard init scripts syslog-ng writes its own pid file. This interferes with the operation of supervise-daemon, which needs the pidfile for itself. Edit /etc/init.d/syslog-ng to move the --pidfile option out of command_args and into command_args_background, so that it is only passed when syslog-ng daemonizes itself:

FILE /etc/init.d/syslog-ng
command_args_background="--pidfile \"${SYSLOG_NG_PIDFILE}\""
command_args="--cfgfile \"${SYSLOG_NG_CONFIGFILE}\" --control \"${SYSLOG_NG_CONTROLFILE}\" --persist-file \"${SYSLOG_NG_STATEFILE}\" ${SYSLOG_NG_OPTS}"
supervisor="supervise-daemon"
extra_commands="checkconfig"
extra_started_commands="reload"
pidfile="${SYSLOG_NG_PIDFILE}"
description="Syslog-ng is a syslog replacement with advanced filtering features."
description_checkconfig="Check the configuration file that will be used by \"start\""
description_reload="Reload the configuration without exiting"
required_files="${SYSLOG_NG_CONFIGFILE}"
required_dirs="${SYSLOG_NG_PIDFILE_DIR}"
command_args_foreground="--foreground --process-mode=foreground"
command_user="${SYSLOG_NG_USER}:${SYSLOG_NG_GROUP}"

At the end of the file, the reload function needs to be changed to use the supervisor instead of s-s-d:

FILE /etc/init.d/syslog-ng
reload() {
        checkconfig || return 1
        ebegin "Reloading configuration and re-opening log files"
#       start-stop-daemon --signal HUP --pidfile "${pidfile}"
        ${supervisor} ${RC_SVCNAME} --signal HUP --pidfile "${pidfile}"
        eend $?
}

tor

The net-vpn/tor service requires multiple edits, but each is trivial. It can be brought into the foreground by changing the logical value passed with --runasdaemon. The pidfile assignment is to be deleted. Then command_args must be split into command_args and command_args_background as the general recipe explains. Finally replace any mentions of s-s-d with supervise-daemon.

FILE /etc/init.d/tor
#!/sbin/openrc-run
supervisor="supervise-daemon"
command=/usr/bin/tor
#pidfile=/run/tor/tor.pid
command_args_background="-p ${pidfile}"
command_args="--hush --runasdaemon 0"
retry=${GRACEFUL_TIMEOUT:-60}
stopsig=INT
command_progress=yes
extra_commands="checkconfig"
extra_started_commands="reload"
description="Anonymizing overlay network for TCP"
description_checkconfig="Check for valid config file"
description_reload="Reload the configuration"

checkconfig() {
	${command} --verify-config --hush > /dev/null 2>&1
	if [ $? -ne 0 ] ; then
		eerror "Tor configuration (/etc/tor/torrc) is not valid."
		eerror "Example is in /etc/tor/torrc.sample"
		return 1
	fi
}

start_pre() {
	checkconfig || return 1
	checkpath -d -m 0755 -o tor:tor /run/tor
}

reload() {
	checkconfig || return 1
	ebegin "Reloading Tor configuration"
	#start-stop-daemon -s HUP --pidfile ${pidfile}
	${supervisor} ${RC_SVCNAME} --signal HUP --pidfile "${pidfile}"
	eend $?
}

vixie-cron

The cron implementation offered by sys-process/vixie-cron can be brought under supervision by editing /etc/conf.d/vixie-cron to pass it -n to make it run in foreground:

FILE /etc/conf.d/vixie-cron
supervisor="supervise-daemon"
command_args_foreground="-n"

Services which won't run under a supervisor

Unfortunately, not all services are easy to run under supervise-daemon or other supervisors. Not every daemon can run in the foreground, and for some it simply does not work. Sometimes there are alternatives available.

One shot services

Some of the scripts in /etc/init.d/ that are started by OpenRC do not start daemons. Instead they run a program that terminates when it has performed its function, or they apply a configuration setting.

Examples of such, with the action at boot / shutdown, are:

  • alsasound: loads / saves volume settings for audio
  • localmount: mounts / unmounts file systems as per /etc/fstab
  • loopback: creates the loopback interface
  • swap: activates / deactivates swap devices
  • urandom: loads random seed and initializes /dev/urandom / saves random seed
  • zram-init: creates / destroys zram devices.

There is no need to run such services under a supervisor.

netifrc

The default Gentoo networking scripts belonging to net-misc/netifrc call s-s-d under the hood. The scripts can start and stop services like dhcpcd, dhclient, pppd and wpa_supplicant when needed, and they use s-s-d for it.

Warning
Be careful not to mix and match different network management methods, because the results are unpredictable.

dcron

sys-process/dcron version 4.5-r1 crashes without an error message when it is run under a supervisor. Consider an alternative like sys-process/fcron.

libvirtd

Libvirtd, the server-side daemon component of libvirt, also crashes without an error message when it is run under supervise-daemon.

Tips 'n tricks

A system under openrc-init and supervise-daemon behaves a little differently. This chapter shows some of the differences and how to take advantage of them.

rc-status

rc-status shows the time when supervised services were started, and the number of restarts:

root #rc-status
Runlevel: default
 device-mapper                                                   [  started  ]
 syslog-ng                                           [  started 18:22:43 (0) ]
 dnsmasq                                             [  started 18:22:42 (0) ]
 sshd                                                [  started 00:00:27 (1) ]
 dbus                                                            [  started  ]
 alsasound                                                       [  started  ]
 bluetooth                                           [  started 18:09:29 (0) ]
 ntpd                                                [  started 17:06:48 (0) ]
 acpid                                               [  started 11:53:26 (0) ]
 avahi-daemon                                        [  started 11:51:00 (0) ]
 zram-init                                                       [  started  ]
 cupsd                                                           [  stopped  ]
 laptop_mode                                                     [  started  ]
 libvirtd                                                        [  started  ]
 libvirt-guests                                                  [  started  ]
 distccd                                                         [  started  ]
 busybox-httpd                                                   [  started  ]
 lxc-bridge                                                      [  started  ]
 fcron                                               [  started 18:22:42 (0) ]
 iwd                                                 [  started 18:22:42 (0) ]
 netmount                                                        [  started  ]
 local                                                           [  started  ]
 agetty.tty2                                         [  started 18:22:41 (0) ]
 agetty.tty3                                         [  started 18:22:41 (0) ]
 agetty.tty4                                         [  started 18:22:41 (0) ]
 agetty.tty5                                         [  started 18:22:41 (0) ]
 agetty.tty6                                         [  started 18:22:41 (0) ]
 agetty-autologin.tty1                               [  started 18:22:41 (0) ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed/wanted
 dhcpcd                                                          [  started  ]
 virtlogd                                                        [  started  ]
Dynamic Runlevel: manual

Note the line for sshd, which shows that it was recently restarted by its supervisor. With rc-status -S only supervised services are displayed.
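The restart counter can be picked out of this output with a small filter. The helper below is a hypothetical sketch that assumes the "[  started HH:MM:SS (N) ]" layout shown above and plain, uncoloured output; it is demonstrated on a few lines copied from the listing.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): read rc-status output on stdin and
# print "service restart_count" for supervised services restarted at least once.
# It assumes the "[  started HH:MM:SS (N) ]" layout shown above.
list_restarted() {
    sed -n 's/^ *\([^ ]*\) .*started .*(\([0-9]*\)) *\]$/\1 \2/p' |
        awk '$2 > 0'
}

# Sample lines copied from the rc-status output above.
sample=' syslog-ng                                           [  started 18:22:43 (0) ]
 sshd                                                [  started 00:00:27 (1) ]
 dbus                                                            [  started  ]'

restarted="$(printf '%s\n' "${sample}" | list_restarted)"
echo "${restarted}"
```

On a live system the filter could be fed with rc-status -S | list_restarted, which makes it easy to spot services that keep crashing.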


External resources