smf(5) design admissions

In a Usenet thread on comp.unix.solaris, and in an earlier comment series on the “Comparing Linux to System VR4” story on Slashdot last week, some valid smf(5) criticisms were raised. I thought I would reply to them here and acknowledge their impact on some practices.

The strongest recent critic has been Tim Hogard, and I hope he doesn’t mind if I collect his concerns to structure my response. Mr Hogard is a member of the class of exceptional administrators who have designed their systems’ behaviours (by removing unneeded software and writing precise configurations for what remains, among other good practices). I’ve always known that this group of experts would be most directly affected by the changes smf(5) brings, and have tried to listen carefully to their concerns and modify the implementation to accommodate them. (Such people exist within Sun and in the Beta and Express communities.)

**“On my system the binary file contains the last time when the service started. That means a file in /etc gets written with every boot.”** There are actually two binary files that contain the information the facility acts on:

  • /etc/svc/repository.db, which is persistent and contains the service definitions, dependencies, and so on, and
  • /etc/svc/volatile/svc_nonpersist.db, which is non-persistent and contains service execution information, such as process IDs, contract IDs, and state transition times.

(/etc/svc/volatile is a tmpfs filesystem mounted by the kernel prior to startup being turned over to the progeny of init(1M). Its contents do not persist across OS instantiations.)

I believe the concern here is that changes to /etc/svc/repository.db cannot be detected using filesystem fingerprints. This is in fact a bug:

  6221934 *svc.configd* must respect repository filesystem metadata

The current implementation performs an idempotent test transaction to verify writability (in addition to other integrity checks), which defeats metadata-based change detection. In the meantime, one can construct a hash- or checksum-based fingerprint indirectly, using the output of the archive subcommand of svccfg(1M), which dumps the repository as an XML document. An example using cksum(1):

    $ svccfg archive > /tmp/a
    $ svcadm disable manifest-import
    $ svccfg archive > /tmp/b
    $ svcadm enable manifest-import
    $ svccfg archive > /tmp/c
    $ cksum /tmp/a /tmp/b /tmp/c
    1649351107      205437  /tmp/a
    2659093841      205438  /tmp/b
    1649351107      205437  /tmp/c
    $ # use diff to examine repository differences globally…
But we’ll get that bug fixed shortly.
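The workaround’s underlying idea can be sketched portably: fingerprint a canonical dump of the repository and compare fingerprints across points in time. This sketch uses hard-coded stand-ins for svccfg archive output; only the comparison logic is the point.

```python
import hashlib

def fingerprint(archive: bytes) -> str:
    """Return a stable fingerprint for a serialized repository dump."""
    return hashlib.sha256(archive).hexdigest()

# Stand-ins for three successive `svccfg archive` dumps; as in the
# transcript above, the middle dump differs because a service was disabled.
dump_a = b"<service-bundle>manifest-import enabled</service-bundle>"
dump_b = b"<service-bundle>manifest-import disabled</service-bundle>"
dump_c = b"<service-bundle>manifest-import enabled</service-bundle>"

print(fingerprint(dump_a) == fingerprint(dump_b))  # False: repository changed
print(fingerprint(dump_a) == fingerprint(dump_c))  # True: change was reverted
```

Identical dumps fingerprint identically, so an unchanged repository can be verified without a byte-by-byte comparison.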

**“I’ve found some very interesting things to do to the new system that can mess up a box in a way that the current tool set won’t even let you see what is wrong. It appears to be a script kiddies back door dream.”** I am less certain I understood this point. There are no hidden services in the repository, although you can certainly corrupt the repository deliberately. (Separate backup repositories are made automatically at stages of boot; a recovery utility is provided in /lib/svc/bin.) Because each process is a member of a process contract owned ultimately by svc.startd(1M), you can use svcs -p if you trust the repository and ptree -c if you don’t. (If you don’t trust the kernel any longer, then we’re onto intrusion containment and system reinstallation from trusted media.)
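That cross-check, comparing what a possibly tampered repository reports against what the kernel reports, can be sketched in a few lines; the PID sets below are hypothetical stand-ins for svcs -p and ptree -c output.

```python
# Hypothetical PID sets: `reported` stands in for svcs -p output (which
# trusts the repository); `observed` stands in for ptree -c output (which
# trusts only the kernel's process-contract view).
reported = {101, 102, 103}
observed = {101, 102, 103, 999}

# Any process the kernel sees but the repository does not account for is
# suspicious and warrants investigation.
unaccounted = observed - reported
print(sorted(unaccounted))  # [999]
```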

**“And yes I’m saying complexity is bad and we should stick with what works (and it isn’t DOS).”** Unfortunately, what worked once no longer works given today’s requirements. The Predictive Self-Healing initiative at Sun, and equivalents elsewhere, are responses to a change in availability requirements: both hardware and software components have the potential for failure, and the operating system needs to acknowledge this reality and provide abstractions and capabilities to manage its implications. It’s my belief that smf(5) introduces a minimal amount of complexity in exchange for new and meaningful descriptive objects and, in fact, the simplification of some previously difficult operations.

This reply hasn’t explained why there’s a transactional database (which, as an implementation detail, is currently SQLite), why hierarchical restart is a different problem from parallel startup, or why we converted the current system to use the facility rather than providing only the framework. I’m happy to expand on any of these, or on the points above.