Why predictable?

I titled my blog “Predictable” out of a genuine interest in making systems more predictable, rather than out of any particular cynicism about how things (any things) turn out. We have a fairly well-established vocabulary for desirable system attributes: reliability, availability, serviceability, and performance. Predictability is a courtier to all of these, but doesn’t really dominate any of them; it’s almost a separate quality.

We’ve been working on resource management for a while now, with an eye to allowing the construction of more predictable servers. Most of this work has involved adding mechanisms to the OS, either to close off resource denial-of-service opportunities or to reserve or fairly schedule resources among competing services, so that the system can respond fairly smoothly until it reaches some level of overcommitment. Now that we’ve got the basic mechanisms (and more are coming), we can start to examine how to layer automation on top of them and push out the boundary where true overcommitment occurs, without administrator intervention.

The current edition of Solaris Express contains the first of these features, dynamic resource pools, which you can use in combination with zones so that your consolidated systems smoothly react and reassign processors based on system load and relative importance. (You can also use the fair share scheduler with zones, if you don’t need to reserve or cap some absolute amount of processing capability for each zone.)

We now have a system that can respond elastically across a wider range of load scenarios, without compromising some minimal expectations regarding quality of service with respect to one resource. Our solution, however, can handle multiple resources, and in the future we’ll see how that “predictability product” allows you to run your systems closer to maximal utilization.

There are lots of problems in this space, connected with specific operating system resources (and the notion of a resource itself), how resources map from one layer in a stack to the next, how end-to-end scenarios play out within a participating host, … But it’s clear that predictability engineering is a piece of operating systems development, and, if you can get some time to think about it, it’s a lot of fun.