Process supervision

Process supervision is the ability to manage long-lived processes (daemons) and to restart them automatically when needed, whether after a crash or a misdirected signal. There are currently four well-known implementations using the same API: Daemontools, Daemontools-encore, Runit and S6. The latter two suites can also be used as an init (PID 1) replacement, although in the case of S6 this is left to the distribution or operating system implementation.

Rationale
There is a clear need for process management and supervision to ensure the availability of certain functionality in the operating system. Without the daemontools[-encore], runit and s6 supervision model, this is done with more or less dirty hacks that involve managing PID files of (child) processes in order to start and stop them when necessary. The current process management implementation in OpenRC, start-stop-daemon (ssd for short), uses this scheme and has some known flaws, such as false positive PID acquisition caused by a racy start-up.
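The weakness of the PID file scheme can be illustrated with a short Python sketch. It is not how start-stop-daemon is implemented; the function names and the PID file path are hypothetical, and a POSIX system is assumed. The check only proves that some process owns the recorded PID, not that it is the daemon that wrote the file.

```python
import os


def pid_is_alive(pid):
    """Return True if some process currently owns this PID (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 probes for existence without sending anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def daemon_seems_running(pidfile):
    """PID-file check: read the recorded PID and probe it.

    The answer can be a false positive: if the daemon died and the kernel
    reused its PID, an unrelated process makes this return True.
    """
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False  # missing or unreadable PID file
    return pid_is_alive(pid)
```

Writing the PID of any unrelated live process into the file is enough to make daemon_seems_running() report a running daemon, which is the kind of false positive described above.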

This is where process supervision comes in: a supervisor normally has a direct feedback link with its child process. The daemontools family of supervisors starts child processes in the foreground, instead of in the background as ssd does. For (badly behaved) daemons that insist on backgrounding themselves, a foreground hack utility, usually named fghack, is used to achieve this feat.
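The foreground model can be sketched in a few lines of Python. This is a simplified illustration, not the code of any of the suites named above: the supervise function and its max_restarts and delay parameters are hypothetical, and a real supervisor loops forever rather than stopping after a fixed number of restarts. Because the supervisor is the direct parent of the child, it learns about the child's exit from wait() and needs no PID file.

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, delay=0.1):
    """Run argv in the foreground and restart it each time it exits.

    Returns the list of observed exit codes; a negative value means the
    child was killed by a signal. The restart cap exists only so that the
    sketch terminates.
    """
    exit_codes = []
    restarts = 0
    while True:
        child = subprocess.Popen(argv)   # child stays attached to its parent
        exit_codes.append(child.wait())  # blocks until the child exits
        if restarts >= max_restarts:
            break
        restarts += 1
        time.sleep(delay)                # brief pause before restarting
    return exit_codes


if __name__ == "__main__":
    # A stand-in "daemon" that exits immediately with status 3.
    failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(supervise(failing, max_restarts=2))
```

A daemon that forks into the background defeats this loop, because wait() returns as soon as the original process exits; that is the situation fghack is meant to work around.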

However, supervision advocates tend to push for complete system supervision, meaning that every daemon in the system is supervised. Does this really fit every use case? Is it safe to supervise every daemon in every environment, even a server-oriented one? Supervision advocates would say yes, and the overhead of the process supervisor seems to be a non-issue because of its small resource footprint.

See the end of the article for an OpenRC supervision backend, which is still being worked on.

Daemontools
Daemontools

Daemontools-encore
See Daemontools-encore main article for more info.

Runit
See Runit main article for more info.

S6
See S6 main article for more info.

Supervisor
Supervisor

OpenRC backend(s)
There is an experimental Runit backend for OpenRC on BGO; see external resources. The major blocker is starting a service race-free and in a timely manner while being able to report the success or failure of said service, which does not fit well with the scanned service directory (/service/) model. This is true at least for Runit. Putting a down file in the service directory (/service/SERVICE/down) might help: the service would start in the down state and could then be sent a start, or rather up, command. Still, this has to be tested.
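The idea could be sketched as follows in Python. This is an untested sketch under assumptions, not the backend from the bug report: the helper names are hypothetical, runit's sv tool is assumed to be installed, and the file name down is the one runit's runsv checks before starting ./run. The down file has to exist before the service directory is made visible to the scanner. The sketch uses sv start rather than sv up because start waits for the service and reports the outcome; whether this actually removes the race is the open question.

```python
import os
import subprocess


def prepare_down_service(service_dir):
    """Create an empty 'down' file so runsv does not start ./run by itself."""
    down_path = os.path.join(service_dir, "down")
    with open(down_path, "a"):
        pass  # only the existence of the file matters
    return down_path


def start_command(service_dir, timeout=7):
    """Build the sv call that starts the service and waits up to timeout seconds."""
    return ["sv", "-w", str(timeout), "start", service_dir]


def start_and_report(service_dir, timeout=7):
    """Start the service and return True if sv exits with status 0.

    Assumption: the exit status of sv is a good enough success signal
    for an init script.
    """
    return subprocess.run(start_command(service_dir, timeout)).returncode == 0
```

An init script would call prepare_down_service() once when installing the service directory, then start_and_report() from its start function.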