Process supervision

Process supervision is the ability to manage long-lived processes (daemons) and to restart them automatically when needed, whether after a crash or a misdirected signal. There are currently four well-known implementations sharing the same API: Daemontools, Daemontools-encore, Runit and S6. The latter two suites can also be used as an init (PID 1) replacement, although in the case of S6 this is left to the distribution or operating system implementation.

Rationale
There is a clear need for process management and supervision to ensure the availability of certain functionality in the operating system. Without the daemontools[-encore], runit and s6 supervision model, this is done with more or less dirty hacks that manage PID files of (child) processes in order to start and stop them when necessary. The current process management implementation in OpenRC, start-stop-daemon (ssd for short), uses this scheme and has some known flaws, such as false positive PID acquisition caused by a racy start-up.
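The false positive mentioned above can be illustrated with a minimal Python sketch. It is not how start-stop-daemon is implemented; the helper names (`pid_is_running`, `daemon_seems_alive`, `demo`) and the pidfile name are hypothetical, and a POSIX system is assumed. The point is only that a PID file check trusts a number, not a process.

```python
import os
import tempfile


def pid_is_running(pid: int) -> bool:
    """Return True if some process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The PID exists but belongs to another user.
        return True
    return True


def daemon_seems_alive(pidfile: str) -> bool:
    """PID-file style check: trusts whatever number is stored in the file."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False
    return pid_is_running(pid)


def demo() -> bool:
    # Simulate a stale PID file: the daemon is gone, but its old PID has been
    # recycled by an unrelated process (here, this very Python interpreter).
    with tempfile.TemporaryDirectory() as tmp:
        pidfile = os.path.join(tmp, "daemon.pid")
        with open(pidfile, "w") as fh:
            fh.write(f"{os.getpid()}\n")
        return daemon_seems_alive(pidfile)


if __name__ == "__main__":
    print("daemon reported alive:", demo())
```

The check reports the daemon as alive even though no daemon was ever started, which is the kind of mistake a supervisor holding its child directly cannot make.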

This is where process supervision comes in: the supervisor normally has a direct feedback link with the child process. The daemontools family of supervisors starts child processes in the foreground, instead of in the background as ssd does. When a (badly behaved) daemon insists on backgrounding itself, a foreground hack utility, usually named fghack, is used to achieve this feat.
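The foreground model can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of any of the suites above; the function name `supervise` and its `max_restarts` and `delay` parameters are assumptions made to keep the example finite and testable.

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, delay=0.1):
    """Run argv in the foreground and restart it each time it exits.

    Returns the list of exit statuses observed. A real supervisor loops
    forever; max_restarts only keeps this sketch finite.
    """
    statuses = []
    for _ in range(max_restarts):
        child = subprocess.Popen(argv)   # the child stays our direct child
        statuses.append(child.wait())    # blocks until the child really exits
        time.sleep(delay)                # avoid a tight respawn loop
    return statuses


if __name__ == "__main__":
    # Hypothetical "daemon" that crashes immediately with status 1.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashing))
```

Because the supervisor is the parent, it learns of the exit through `wait()` and needs no PID file. A daemon that forks into the background would defeat this, which is why a helper such as fghack is needed for those.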

However, supervision advocates tend to push for complete system supervision, meaning that every daemon in the system is supervised. Does this really fit every use case? Is it safe to supervise every daemon in every environment, even a server-oriented one? Supervision advocates would say yes. The process supervisor overhead seems to be a non-issue because of its small resource footprint.

See the end of the article for an OpenRC supervision backend, which is still being worked on.

Daemontools
Daemontools

Daemontools-encore
See Daemontools-encore main article for more info.

Runit
See Runit main article for more info.

S6
See S6 main article for more info.

Supervisor
The Python folks' take on supervision, following in daemontools' footsteps.

Supervisor

Supervision Scripts
A supervision scripts framework, inspired by the original supervision-scripts by Avery Payne (see External Resources), is available. This framework aims to provide a KISS supervision suite that works out of the box with an almost empty service directory and log directory: symlinking sv/SERVICE/run to sv/.opt/run and sv/SERVICE/log/run to sv/.opt/run-log is enough in most cases. Similarly, getty works out of the box, be it agetty, mingetty or fgetty (the latter requires either editing the default configuration or adding a sv/SERVICE/OPTIONS configuration file).
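The symlink layout described above can be sketched as follows. Only the paths sv/SERVICE/run, sv/SERVICE/log/run, sv/.opt/run and sv/.opt/run-log come from the text; the function name `add_service`, the service name `example`, the use of relative link targets and the placeholder script contents are assumptions for illustration.

```python
import os
import tempfile


def add_service(sv_root: str, name: str) -> str:
    """Create sv/NAME with run and log/run symlinked to the shared scripts."""
    service = os.path.join(sv_root, name)
    os.makedirs(os.path.join(service, "log"), exist_ok=True)
    # Relative targets keep the whole tree relocatable.
    os.symlink(os.path.join("..", ".opt", "run"),
               os.path.join(service, "run"))
    os.symlink(os.path.join("..", "..", ".opt", "run-log"),
               os.path.join(service, "log", "run"))
    return service


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sv_root = os.path.join(tmp, "sv")
        os.makedirs(os.path.join(sv_root, ".opt"))
        # Placeholder shared scripts; the real ones come from the framework.
        for script in ("run", "run-log"):
            with open(os.path.join(sv_root, ".opt", script), "w") as fh:
                fh.write("#!/bin/sh\n")
        service = add_service(sv_root, "example")
        print(os.readlink(os.path.join(service, "run")))
```

Every service then shares one run script and one log script, so a new service costs a directory and two symlinks.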

All in all, it is easy to use and avoids the useless disk seeks of the environment directory approach, which multiply as (environment variable files) x (local sv/SERVICE/env and global sv/.env environment directories) x (number of services). A single OPTIONS configuration file per service is preferred here, with a default configuration file (sv/.opt/SVC_OPTIONS).

OpenRC-friendly Runit stage 1, 2 and 3 files, along with a ctrlaltdel file, are available as well, in addition to the compatibility mode enabled by setting RC_OPTS=Yes in the OPTIONS configuration file. Some experimental S6 stage 1, 2 and 3 files are available too.

Tired of (re-)writing the same (./run) script over and over again?! Check this out! Do not want to wear out the (system boot) disk for no good reason?! Try it out!

Finally, there is an environment variables file, sv/.opt/SVC_BACKEND, which contains supervision backend commands and signals that can be used to write a generic supervisor backend for OpenRC. Generic here means generic across the daemontools family API (runit and s6 included). See the following sub-section for more info on an OpenRC supervisor backend.

OpenRC Supervision Backend
There is an experimental Runit backend for OpenRC on BGO; see External Resources. The major blocker is starting a service race-free and in a timely manner while being able to report the success or failure of that service, which does not fit well with the scanned service directory (/service/) model. This is true at least for Runit. Putting a down file in the service directory (/service/SERVICE/down) might help: the service would start in the down state and could then be sent a start, or rather up, command. Still, this has to be tested.
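As a rough sketch of that untested idea: runit and daemontools honour a file named `down` in the service directory, so a service can be staged with that marker before the scanner sees it and only started later on request. The function names `stage_service_down` and `up_command`, the separate staging directory and the final symlink step are assumptions; the command is only built, never executed, and nothing here solves the success or failure reporting problem.

```python
import os
import tempfile


def stage_service_down(staging_dir, scan_dir, name, run_script):
    """Prepare a service that the scanner will pick up but not start.

    The directory is fully built under staging_dir, including the `down`
    marker, and only then linked into scan_dir in a single step.
    """
    service = os.path.join(staging_dir, name)
    os.makedirs(service)
    run_path = os.path.join(service, "run")
    with open(run_path, "w") as fh:
        fh.write(run_script)
    os.chmod(run_path, 0o755)
    # An empty `down` file asks the supervisor not to start ./run by itself.
    open(os.path.join(service, "down"), "w").close()
    link = os.path.join(scan_dir, name)
    os.symlink(service, link)
    return link


def up_command(link):
    """Command a backend could run later to request a start (not run here)."""
    return ["sv", "up", link]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        staging = os.path.join(tmp, "sv")
        scan = os.path.join(tmp, "service")
        os.makedirs(staging)
        os.makedirs(scan)
        link = stage_service_down(staging, scan, "example",
                                  "#!/bin/sh\nexec sleep 60\n")
        print(up_command(link))
```

Building the directory elsewhere and linking it in last avoids the window in which the scanner could see the service before its `down` file exists.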

OpenRC Supervision Service
This is easier to do than the previous supervisor backend, very easy in fact. Simple local start/stop scripts for OpenRC can be written quickly for this purpose. Alternatively, a more elaborate init service script could be written and added to the boot or default runlevel.

See the Supervision Scripts Framework under External Resources for such a (stage-2-like) script, and/or the Runit article for a local init service script variant.

Local Service
See Runit for an example.

System Service
A generic system service can be written in a few seconds and then used with any supervision backend, along the lines of the following.

A short configuration file is necessary to select a supervision backend to use.
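As a rough illustration of backend selection, not the OpenRC service script itself, the following Python sketch reads a shell-style configuration line and maps it to the control commands of each suite. The key name `SV_BACKEND`, the function names and the /service/example path are hypothetical; `svc -u`/`svc -d`, `sv up`/`sv down` and `s6-svc -u`/`s6-svc -d` are the stock control commands of daemontools, runit and s6.

```python
import shlex

# Start/stop commands per backend, as provided by each suite's control tool.
BACKENDS = {
    "daemontools": {"up": ["svc", "-u"], "down": ["svc", "-d"]},
    "runit": {"up": ["sv", "up"], "down": ["sv", "down"]},
    "s6": {"up": ["s6-svc", "-u"], "down": ["s6-svc", "-d"]},
}


def read_backend(config_text, key="SV_BACKEND"):
    """Parse KEY=value lines (shell style) and return the selected backend."""
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            parts = shlex.split(value)      # strips optional quotes
            backend = parts[0] if parts else ""
            if backend not in BACKENDS:
                raise ValueError(f"unknown supervision backend: {backend!r}")
            return backend
    raise KeyError(key)


def control_command(backend, action, service_dir):
    """Build the argv that would bring service_dir up or down."""
    return BACKENDS[backend][action] + [service_dir]


if __name__ == "__main__":
    config = '# hypothetical configuration file\nSV_BACKEND="runit"\n'
    backend = read_backend(config)
    print(control_command(backend, "up", "/service/example"))
```

Keeping the per-backend differences in one table means the service logic itself stays identical whichever suite is selected.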

See Runit for another implementation.

External Resources

 * OpenRC friendly Supervision Scripts Framework
 * Avery Payne's Supervision-Scripts Collection
 * Init & Supervision Forums Thread
 * Runit Supervisor Backend for OpenRC
 * Runit Forums Thread with Supervision discussion
 * Socket Activation Requirement?
 * Why Process Supervision (at all?)