libuutil and designing for debuggability

Going into Solaris 10, I knew we were planning to develop a troupe of new daemons; we ultimately ended up with svc.startd(1M), svc.configd(1M), and a new implementation of inetd(1M). I wanted to make sure we made some progress on daemon implementation practice, and bounced some ideas around with the afternoon coffee group and also with Mike, and probably some others—I wander around a bit.

We anticipated that most of the daemons would be multithreaded, and it became apparent that they would all present large, complicated images for postmortem debugging.[1] To reduce the time to acquire familiarity with each of these daemons, we worked out three common requirements:

  • include Compact C Type Format (CTF) information with each daemon,
  • use libumem(3LIB) for memory allocation, and
  • use standard, debuggable, MT-safe implementations of data structures.

The problem was, of course, that there wasn’t a library with such data structures in Solaris at the time.[2][3] So we began to design libuutil, which combines a number of established utility functions used in authoring Solaris commands with these new “good” implementations of useful data structures.

The library in question was named in sympathy with libumem(3LIB)—libuutil for “userland utility functions”. libuutil provides both a doubly linked list implementation and an AVL tree implementation. The list implementation is mostly located in lib/libuutil/common/uu_list.c; we’ll use that to explore the debugging assistance we designed in.

The model used is that each program is likely to keep multiple lists of a common structure, and to have several such structures. This led us to create an interface that is expressed in terms of pools of lists. So, for each structure, you create a list pool using uu_list_pool_create(); then, for each list of that structure, you create a list in the respective pool using uu_list_create().
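In code, the pattern is brief. Here’s a minimal sketch, assuming a hypothetical wait_info_t structure (the structure, its fields, the pool name, and the append-via-uu_list_insert_before() idiom are illustrative, and error handling is elided):

#include <libuutil.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Each element embeds a uu_list_node_t; the pool records the node's
 * offset within the structure, so the library can find the linkage.
 */
typedef struct wait_info {
	uu_list_node_t	wi_node;	/* linkage, located via nodeoffset */
	int		wi_fd;
} wait_info_t;

static uu_list_pool_t *wi_pool;
static uu_list_t *wi_list;

void
wait_init(void)
{
	/* One pool per structure type ... */
	wi_pool = uu_list_pool_create("wait_info", sizeof (wait_info_t),
	    offsetof(wait_info_t, wi_node), NULL, UU_LIST_POOL_DEBUG);

	/* ... and one list (of possibly many) in that pool. */
	wi_list = uu_list_create(wi_pool, NULL, 0);
}

void
wait_add(int fd)
{
	wait_info_t *wi = malloc(sizeof (wait_info_t));

	wi->wi_fd = fd;
	uu_list_node_init(wi, &wi->wi_node, wi_pool);

	/* A NULL target inserts before the end-of-list marker: an append. */
	(void) uu_list_insert_before(wi_list, NULL, wi);
}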

That sounds complicated, but it’s for a good reason: at each call to uu_list_pool_create(), we register the newly created pool on a global list, headed by the “null pool”, uu_null_lpool:

uu_list_pool_t *
uu_list_pool_create(const char *name, size_t objsize,
    size_t nodeoffset, uu_compare_fn_t *compare_func, uint32_t flags)
{
	uu_list_pool_t *pp, *next, *prev;

	/* validate name, allocate storage, initialize members */

	(void) pthread_mutex_init(&pp->ulp_lock, NULL);

	pp->ulp_null_list.ul_next = &pp->ulp_null_list;
	pp->ulp_null_list.ul_prev = &pp->ulp_null_list;

	(void) pthread_mutex_lock(&uu_lpool_list_lock);
	pp->ulp_next = next = &uu_null_lpool;
	pp->ulp_prev = prev = next->ulp_prev;
	next->ulp_prev = pp;
	prev->ulp_next = pp;
	(void) pthread_mutex_unlock(&uu_lpool_list_lock);

	return (pp);
}

with similar code being used to connect each list to its pool on calls to uu_list_create().

So now we have an address space where each list pool is linked in a list, and each list in a pool is linked to a list headed at that pool. This leads us to the second part, which is to use the encoded information in a debugger. The typical debugger for kernel work in Solaris is mdb(1), the modular debugger. It’s been shipping with Solaris since 5.8, and has a rich set of extensions for kernel debugging. For userland, the modules are rarer: libumem is probably the best known.[4]

The source code for the libuutil module (or “dmod”) is located at cmd/mdb/common/modules/libuutil/libuutil.c; the function that provides the dcmd itself, uutil_listpool, is just a wrapper around the walker for uu_list_pool_t structures. The pertinent portion is the initialization function, uutil_listpool_walk_init():

int
uutil_listpool_walk_init(mdb_walk_state_t *wsp)
{
	uu_list_pool_t null_lpool;
	uutil_listpool_walk_t *ulpw;
	GElf_Sym sym;

	bzero(&null_lpool, sizeof (uu_list_pool_t));

	if (mdb_lookup_by_obj("libuutil.so.1", "uu_null_lpool", &sym) == -1) {
		mdb_warn("failed to find 'uu_null_lpool'\n");
		return (WALK_ERR);
	}

	if (mdb_vread(&null_lpool, sym.st_size,
	    (uintptr_t)sym.st_value) == -1) {
		mdb_warn("failed to read data from 'uu_null_lpool' address\n");
		return (WALK_ERR);
	}

	ulpw = mdb_alloc(sizeof (uutil_listpool_walk_t), UM_SLEEP);

	ulpw->ulpw_final = (uintptr_t)null_lpool.ulp_prev;
	ulpw->ulpw_current = (uintptr_t)null_lpool.ulp_next;

	wsp->walk_data = ulpw;

	return (WALK_NEXT);
}

which safely pulls out the value of the uu_null_lpool head element, along with the relevant pieces we’ll need to walk the list.
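The step and finalization functions aren’t shown above; here’s a plausible sketch of them, following the standard mdb walker pattern (visit the pool at the current address, hand it to the callback, and advance via its ulp_next pointer until the final pool recorded at init time has been visited). The error message text is illustrative, and the sketch assumes at least one pool has been registered:

int
uutil_listpool_walk_step(mdb_walk_state_t *wsp)
{
	uutil_listpool_walk_t *ulpw = wsp->walk_data;
	uu_list_pool_t pool;
	int status;

	/* Read the pool at the current position into local storage. */
	if (mdb_vread(&pool, sizeof (uu_list_pool_t),
	    ulpw->ulpw_current) == -1) {
		mdb_warn("failed to read uu_list_pool_t at %p",
		    ulpw->ulpw_current);
		return (WALK_ERR);
	}

	status = wsp->walk_callback(ulpw->ulpw_current, &pool,
	    wsp->walk_cbdata);

	/* Stop after visiting the last registered pool. */
	if (ulpw->ulpw_current == ulpw->ulpw_final)
		return (WALK_DONE);

	ulpw->ulpw_current = (uintptr_t)pool.ulp_next;

	return (status);
}

void
uutil_listpool_walk_fini(mdb_walk_state_t *wsp)
{
	mdb_free(wsp->walk_data, sizeof (uutil_listpool_walk_t));
}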

This means that, for any program linked with libuutil, we can attach with mdb(1) and display its list pools:

$ mdb -p `pgrep -z global startd`

Loading modules: [ svc.startd ld.so.1 libumem.so.1 libnvpair.so.1 libsysevent.so.1 libuutil.so.1 libc.so.1 ]

> ::uu_list_pool
ADDR     NAME                           COMPARE  FLAGS
080dcf08 wait_info                      00000000 D
080dce08 SUNW,libscf_datael             00000000 D
080dcd08 SUNW,libscf_iter               00000000 D
080dcc08 SUNW,libscf_transaction_entity c2b0476c D
080dc808 dict                           0805749c D
080dc908 timeouts                       0806ffab D
080dca08 restarter_protocol_events      00000000 D
080dcb08 restarter_instances            0806ccd7 D
080dc708 restarter_instance_queue       00000000 D
080dc608 contract_list                  00000000 D
080dc508 graph_protocol_events          00000000 D
080dc408 graph_edges                    00000000 D
080dc308 graph_vertices                 08059844 D

and then drill down into constituent lists of interest.

Additional walkers are also provided, such that the lists and list nodes can be visited from the command line or programmatically. As an example, the ::vertex dcmd from the svc.startd module uses the walkers to display the various service graph nodes in a quasi-readable format.[5]
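Programmatic use from another dmod amounts to a single mdb_pwalk() call. Here’s a sketch, assuming the list-node walker is registered under the name "uu_list_node" (that name, the callback, and the dcmd wrapper are illustrative; check the dmod’s registration table for the actual walker names):

#include <sys/mdb_modapi.h>

/* Callback invoked once per element the walker visits. */
static int
print_node(uintptr_t addr, const void *data, void *arg)
{
	mdb_printf("%p\n", addr);
	return (WALK_NEXT);
}

/*
 * Sketch of a dcmd body: walk the uu_list_t at 'addr', invoking
 * print_node on each element.
 */
static int
print_list(uintptr_t addr, uint_t flags, int argc, const mdb_arg_t *argv)
{
	if (mdb_pwalk("uu_list_node", print_node, NULL, addr) == -1) {
		mdb_warn("failed to walk list at %p", addr);
		return (DCMD_ERR);
	}

	return (DCMD_OK);
}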

So, by providing extra structured information in the library and support to consume that information in the debugger, we end up with a set of data structures that, if used, leads to more debuggable programs. More work up front for less later: welcome to OpenSolaris.

Footnotes

1. By postmortem debugging, I’m referring to the operation of debugging a failed application after its failure, from a core file or other memory image captured as soon after that failure as possible. Suitability for postmortem debugging is a standard expectation for software design in Solaris, as it reduces the time to diagnose and fix software failures. In particular, multiple engineers can debug a core file in parallel; this can be contrasted with the cost of setting up a duplicate installation and trying to reproduce the failure, let alone expecting the customer to risk further downtime experimenting with “try this” scenarios.

2. Please remember that we were making these decisions three years ago, and that this choice had to fit the then-applicable constraints on the product.

3. In contrast, the kernel has had a generic, modular hash table since 5.8/2000 (uts/common/os/modhash.c), a generic AVL tree since 5.9/2002 (common/avl/avl.c), and a generic list implementation early in 5.10/2005 (uts/common/os/list.c). Of course, the kernel has used the slab allocator (uts/common/os/kmem.c) since 5.4/1994.

4. A quick listing in /usr/lib/mdb/proc/ will display the other modules valid in the process target: beyond libumem and libuutil, there’s support for the linker, libc, name-value pairs, system event, and the two main smf(5) daemons.

5. As an example, here’s the output of ::vertex on my current system, for those services related to my VNC server (and the service itself):

> ::vertex ! grep vnc
0x85d3380 212 I 1 svc:/application/vncserver:sch
0x85d3320 213 s - svc:/application/vncserver
0x85d3200 214 R - svc:/application/vncserver:sch>milestone
0x85d3260 215 R - svc:/application/vncserver:sch>autofs
0x85d32c0 216 R - svc:/application/vncserver:sch>nis
