Bespoke services: site/supervisord

Recently, I’ve been experimenting with supervisor, a Python-based process restarter for Unix and Linux. Lincoln Loop has offered instructions on running supervisor under upstart, an approach that applies to several current Linux distributions. On OpenSolaris and related systems, the service management facility, smf(5), can keep your supervisord instances online. Below is a simple manifest that starts (and restarts) supervisord once a small set of services becomes available.

If you don’t provide a supervisord.conf in one of the standard locations, enabling this service instance sends it straight to the maintenance state, because the start method fails repeatedly. You can use svcs -x to diagnose this:

$ svcs -x
svc:/site/supervisord:default (supervisor process control system)
 State: maintenance since Sat Sep 04 16:34:14 2010
Reason: Start method failed repeatedly, last exited with status 2.
   See: http://sun.com/msg/SMF-8000-KS
   See: utmpd(1M)
   See: utmpx(4)
   See: /var/svc/log/site-supervisord:default.log
Impact: This service is not running.

The log file will contain messages like the following, repeated for each failed start attempt:

[ Sep  4 16:34:13 Enabled. ]
[ Sep  4 16:34:13 Rereading configuration. ]
[ Sep  4 16:34:13 Executing start method ("/usr/bin/supervisord"). ]
Error: No config file found at default paths (/usr/etc/supervisord.conf, /usr/supervisord.conf, supervisord.conf, etc/supervisord.conf, /etc/supervisord.conf); use the -c option to specify a config file at a different path
For help, use /usr/bin/supervisord -h
[ Sep  4 16:34:13 Method "start" exited with status 2. ]
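Before you enable the instance, you can check whether supervisord will find a configuration file at all. The sketch below mirrors the search order printed in the error message above. The list is copied from that message, not from supervisor's source, so it may differ for other versions or install prefixes. Relative entries are resolved against a working directory that you pass in, which stands in for the start method's working directory; whether the two match under your manifest is an assumption to verify. The function name `find_supervisord_conf` is hypothetical.

```python
import os
import sys

# Search order copied from the supervisord error message above
# (supervisord installed as /usr/bin/supervisord); may differ elsewhere.
DEFAULT_PATHS = [
    "/usr/etc/supervisord.conf",
    "/usr/supervisord.conf",
    "supervisord.conf",
    "etc/supervisord.conf",
    "/etc/supervisord.conf",
]


def find_supervisord_conf(paths=DEFAULT_PATHS, cwd=None):
    """Return the first existing config path, or None if none exists.

    Relative entries are resolved against cwd (default: the current
    directory), as a stand-in for the start method's working directory.
    """
    base = cwd if cwd is not None else os.getcwd()
    for path in paths:
        candidate = path if os.path.isabs(path) else os.path.join(base, path)
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    found = find_supervisord_conf()
    if found is None:
        print("no supervisord.conf found; the service would go to maintenance",
              file=sys.stderr)
        sys.exit(1)
    print("supervisord would use", found)
```

If this prints nothing useful, either place a configuration file at one of the listed paths or pass one explicitly with supervisord's -c option in the manifest's start method.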

It’s worth noting that all of the programs run by a single instance of supervisord will be in the same process contract. By default, a core dump or an externally sent fatal signal in any one of them can therefore cause svc.startd to restart the whole service instance, and with it every program under that supervisord. If you know the fault characteristics of your programs, you may wish to run multiple instances of supervisord and group programs with “sympathetic” failure modes and frequencies together. Depending on the programs you are running, you may also need to ignore core dumps and external signals. On recent systems, /var/svc/manifest/network/http-apache22.xml contains an example of a startd property group that does so through its ignore_error property. Alternatively, you could modify your supervisord configuration to start each program in its own contract using ctrun(1).
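As a rough illustration of splitting programs across supervisord instances, the sketch below writes one configuration file per group of programs with similar failure behaviour. Each file could then be run by its own SMF instance with `supervisord -c`. The group names, program commands, and file locations are hypothetical. The optional `/usr/bin/ctrun -l child` prefix, for the one-contract-per-program alternative, is an assumption about ctrun(1) usage that you should check against the man page on your system.

```python
import configparser
import os

# Hypothetical grouping: programs expected to fail in similar ways (and at
# similar rates) share one supervisord, and therefore one process contract.
GROUPS = {
    "web": {"app1": "/opt/app1/bin/serve", "app2": "/opt/app2/bin/serve"},
    "batch": {"worker": "/opt/worker/bin/run --queue default"},
}


def build_config(group, programs, rundir, use_ctrun=False):
    """Return a ConfigParser holding one supervisord.conf for a group."""
    # No interpolation: supervisord does its own %(...)s expansion.
    config = configparser.ConfigParser(interpolation=None)
    # Each instance needs its own pid and log file so they don't collide.
    config["supervisord"] = {
        "pidfile": os.path.join(rundir, f"supervisord-{group}.pid"),
        "logfile": os.path.join(rundir, f"supervisord-{group}.log"),
    }
    for name, command in sorted(programs.items()):
        if use_ctrun:
            # Assumed ctrun(1) usage: run the program in a new contract and
            # stay in the foreground until it exits, so supervisord can track it.
            command = "/usr/bin/ctrun -l child " + command
        config[f"program:{name}"] = {"command": command}
    return config


def write_configs(groups, confdir, rundir, use_ctrun=False):
    """Write one supervisord-<group>.conf per group; return their paths."""
    paths = []
    for group, programs in sorted(groups.items()):
        path = os.path.join(confdir, f"supervisord-{group}.conf")
        with open(path, "w") as handle:
            build_config(group, programs, rundir, use_ctrun).write(handle)
        paths.append(path)
    return paths
```

With this split, a core dump in app1 would at worst restart app1 and app2, not worker. Each group would map to its own instance of the site/supervisord service, with the start method pointing at that group's file.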

Exercises

  1. We should really provide a property group that holds the key invocation settings, notably the configuration file path, as properties. I’ve omitted it here because the method token expansion described in smf_method(5) has no handling for unset property values. (*)
  2. Extend supervisord to understand process contracts. This would include writing a Python module that interacts with the contract filesystem, ctfs(7FS). (***)
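For exercise 1, one possible workaround for the token-expansion limitation is a small start-method wrapper that looks the property up itself and passes -c only when a value is set. The sketch below assumes a hypothetical config/file property. It reads that property with svcprop(1), using the SMF_FMRI environment variable that smf_method(5) provides to methods. Building the argument list is kept separate from the lookup so that it can be tested away from SMF.

```python
import os
import subprocess

SUPERVISORD = "/usr/bin/supervisord"
# Hypothetical property holding the configuration file path.
CONF_PROPERTY = "config/file"


def read_property(fmri, prop):
    """Return a property's value via svcprop(1), or None if it is unset."""
    result = subprocess.run(
        ["/usr/bin/svcprop", "-p", prop, fmri],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return None  # property (or its property group) does not exist
    value = result.stdout.strip()
    # Assumption: svcprop prints an empty astring value as "".
    if value in ("", '""'):
        return None
    return value


def build_argv(conf_path):
    """Build the supervisord command line, adding -c only when set."""
    argv = [SUPERVISORD]
    if conf_path:
        argv += ["-c", conf_path]
    return argv


def main():
    fmri = os.environ["SMF_FMRI"]  # set for method processes by svc.startd
    argv = build_argv(read_property(fmri, CONF_PROPERTY))
    # Replace this wrapper with supervisord, which daemonizes as usual.
    os.execv(argv[0], argv)


if __name__ == "__main__":
    main()
```

The manifest's start method would then name this wrapper instead of /usr/bin/supervisord.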