Gates, projects, and developers in ON

February 28, 2006 | Stephen | Process.

If you’re following the tools-discuss alias on opensolaris.org, then you’re aware that the ON consolidation uses a distributed source code management tool to track changes to the ON source (which includes the kernel as well as many device drivers, libraries, and commands). I thought I would review the current relationships between code repositories in typical ON development scenarios. This is partly to practice using tool-neutral terminology to describe how code changes move from a developer into the main repository for the development release, and partly to reveal some of the assumptions behind the distributed source code management requirements and evaluation.

To stay neutral, the various invocations of a fictional ‘scm’ tool below are hypothetical; different SCMs might treat some of these command sequences as a more monolithic operation or might provide even more primitive operations.

For reference as we discuss these relationships, here’s a simple diagram, in which the arrows indicate the flow of code changes. The two shaded repositories, which we’ll call “gates”, are the first to receive changes from their respective children. (The unshaded repositories receive their changes from a gate or via a child pulling from a gate.)

Let’s consider first an individual developer addressing one or more bugs in ON. He or she will need a copy of the source to work on, which we choose to store in /path/to/fix-0.

$ scm clone /path/to/onnv-clone /path/to/fix-0

For ON, this clone (or initial pull or initial bringover or initial checkout) operation involves the creation of a little over 34 000 files and nearly 7 000 directories.

Although we use file system paths in these examples, most of the candidate tools can perform their operations over the network, via `ssh`, `http`, or a native protocol.

The developer then makes his or her changes to the files requiring correction. While editing, compiling, testing, and devising a fix, the developer may make intermediate commits of the current work and pull new changes made by the other ON committers (let’s say a thousand of them). This cycle looks like

$ scm commit
$ scm pull
$ scm merge

with onnv-clone as the parent for each pull.
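As a rough illustration of this cycle, the toy model below treats a repository as an ordered list of changeset labels plus a parent. It is only a sketch in Python, not a description of how any candidate tool stores history; the `Repo` class, its methods, and the changeset labels are hypothetical names chosen to mirror the fictional `scm` commands.

```python
class Repo:
    """Toy repository: an ordered history of changeset labels and a parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # A clone starts with a copy of the parent's history.
        self.history = list(parent.history) if parent else []
        self.incoming = []  # pulled but not yet merged

    def commit(self, label):
        """Record a local changeset; nothing leaves this repository."""
        self.history.append(label)

    def pull(self):
        """Fetch parent changesets not yet seen; return how many are pending."""
        seen = set(self.history) | set(self.incoming)
        self.incoming += [c for c in self.parent.history if c not in seen]
        return len(self.incoming)

    def merge(self):
        """Fold pulled changesets into local history, leaving a merge record."""
        if not self.incoming:
            return
        self.history += self.incoming
        self.history.append("merge(" + ",".join(self.incoming) + ")")
        self.incoming = []


onnv_clone = Repo("onnv-clone")
onnv_clone.commit("base")
fix0 = Repo("fix-0", parent=onnv_clone)  # scm clone

fix0.commit("wip-1")                     # scm commit
onnv_clone.commit("other-dev-change")    # someone else's work arrives
fix0.pull()                              # scm pull
fix0.merge()                             # scm merge

print(fix0.history)
```

Note how the child's history now carries a work-in-progress commit and a merge record alongside the change that came from the parent; these are the entries that become a concern at integration time.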

Finally, the developer is ready for final integration. Their code has been tested and reviewed, and an integration request has been approved. Their final sequence is

$ scm reparent /path/to/onnv-gate
$ scm sanitize
$ scm commit
$ scm pull
$ scm merge

repeated until they are completely up to date with respect to onnv-gate, at which point they can publish their changes

$ scm push
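The “until up to date” condition matters because other developers keep integrating while you merge and retest. The sketch below models that race under simple assumptions: the gate is a list of changeset labels, and `arrivals` is a made-up iterator standing in for whatever lands in the gate during each round. The function name `integrate` and all labels are hypothetical.

```python
def integrate(gate, merged, change, arrivals):
    """Repeat pull/merge until nothing new arrives, then push.

    gate     -- list of changeset labels in the gate (mutated in place)
    merged   -- gate changesets already merged into the local repository
    change   -- label for the sanitized local changeset
    arrivals -- iterator of lists: changesets that other developers land in
                the gate while we rebuild and retest after each merge
    Returns the number of pull/merge rounds that were needed.
    """
    rounds = 0
    while True:
        incoming = [c for c in gate if c not in merged]  # scm pull
        if not incoming:
            break                                        # up to date
        merged.extend(incoming)                          # scm merge
        rounds += 1
        # Others may integrate while we are busy with this round.
        gate.extend(next(arrivals, []))
    gate.append(change)                                  # scm push
    return rounds


gate = ["a", "b"]
rounds = integrate(gate, ["a"], "fix-0", iter([["c"], []]))
print(rounds, gate)
```

In this toy the final check and the push happen back to back; a real tool would have to enforce that atomically, for example by refusing a push from a repository that is not up to date.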

The “sanitize” operation removes all of the intermediate commits made in the developer’s repository—from merges made along the way—so that the main gate only sees the important portion of the history of their work: what specific defects, via bug IDs, are addressed by this changeset.

`sanitize` isn’t typically implemented in source code management tools; it is a local addition that might be seen as controversial in some circles. ON isn’t a pull- or patch-driven integration process, so if we didn’t sanitize, we would have commit logs with the merge records resulting from 1 000 developers merging at different rates.
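To make the effect concrete, here is a minimal sketch of what a sanitize step might do to a history, assuming changesets are plain labels and the parent's history is known. This is not the actual ON implementation; the function signature, the `fix:` label format, and the bug IDs are placeholders invented for the example.

```python
def sanitize(local_history, parent_history, bug_ids):
    """Collapse local-only changesets into a single one named by its bug IDs."""
    parent_seen = set(parent_history)
    # Keep the changesets the parent already has.
    kept = [c for c in local_history if c in parent_seen]
    # Everything else (work-in-progress commits, merge records) is local-only.
    local_only = [c for c in local_history if c not in parent_seen]
    if not local_only:
        return kept
    # Replace the local-only entries with one changeset listing the bugs fixed.
    return kept + ["fix: " + " ".join(sorted(bug_ids))]


local = ["base", "wip-1", "other-dev-change", "merge(other-dev-change)", "wip-2"]
parent = ["base", "other-dev-change"]
clean = sanitize(local, parent, ["1234568", "1234567"])  # placeholder bug IDs
print(clean)
```

The gate then receives one changeset for the fix rather than the two work-in-progress commits and the merge record.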

At this point, it’s probably worth commenting on the existence of the onnv-clone repository, which is provided as a read-only child of the main development repository, onnv-gate. All of the operations in our scenario would have been sensible if we had used onnv-gate as our parent repository (and omitted our reparent operation). So why have a ‘clone’ at all? There are two reasons, one technical and one social.

The technical reason is to make sure that contention for the gate’s repository locking is limited to developers integrating code. Integrating into ON carries with it a substantial burden of testing, watching for other integrations, rebuilding, and so on: getting blocked by an initial pull of the gate repository taking the various repository locks is a difficult scenario we want to avoid.

The social reason is that the clone represents a steadying rhythm for developers: unless you are preparing to integrate that day, you know that the pull/merge you do from the clone is, on a best-effort basis, safe. Small build mistakes have been fixed and large ones undone before they can affect your development repositories.

So a clone is a useful technique when a DSCM’s locking might be overwhelmed by the number of child repositories needing to be kept up-to-date.

Our second scenario, where a few developers are pursuing a project that involves a large batch of coordinated changes to ON, is an example where a clone repository isn’t necessary, as the three developers are unlikely to contend on the repository lock. But the role of project-gate is similar: to provide a repository which is regularly built and whose binary products are regularly tested (and may even be shared with potential users or act as parents to other projects’ gates), so that the developers can make confident forward progress.

One or more of the developers will, on some schedule, do pulls and merges from onnv-clone and integrate them into project-gate. This “resync” can be done from the project gate itself or, preferably (though not illustrated), from an additional child. In the latter case, this sequence looks like

$ scm parent
/path/to/project-gate
$ scm pull /path/to/onnv-clone
$ scm merge
$ # test build and corrections, possibly with a commit/pull/merge
$ scm push

where we have pulled from onnv-clone to push to our project repository.
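The distinctive part of the resync is that the pull source and the push target differ. A small sketch of that idea, assuming a toy repository whose `pull` accepts an explicit source and whose `push` always goes to the parent; the `Repo` class and the changeset labels are hypothetical, and pull and merge are folded into one step for brevity.

```python
class Repo:
    """Toy repository with a default parent and an overridable pull source."""

    def __init__(self, name, history=(), parent=None):
        self.name = name
        self.history = list(history)
        self.parent = parent

    def pull(self, source=None):
        """Pull from an explicit source, or from the parent by default."""
        source = source or self.parent
        new = [c for c in source.history if c not in self.history]
        self.history += new  # pull and merge folded together
        return new

    def push(self):
        """Publish changesets the parent lacks; always targets the parent."""
        new = [c for c in self.history if c not in self.parent.history]
        self.parent.history += new
        return new


onnv_clone = Repo("onnv-clone", ["on-1", "on-2", "on-3"])
project_gate = Repo("project-gate", ["on-1", "proj-1"])
resync = Repo("resync", project_gate.history, parent=project_gate)

pulled = resync.pull(onnv_clone)  # scm pull /path/to/onnv-clone
pushed = resync.push()            # scm push (to project-gate)
print(project_gate.history)
```

Doing this from a separate child keeps any half-finished merge out of project-gate until the test build passes.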

When the development team is ready to integrate their changes, their invocations from the project-gate repository are similar to their colleague’s from fix-0 above:

$ scm reparent /path/to/onnv-gate
$ scm sanitize
$ scm commit
$ scm pull
$ scm merge
$ scm push

For a project, this action is usually followed by the consumption of one or more celebratory beverages, while the most nervous member watches for mail from one or more gatekeepers regarding build nits or more serious omissions. We’ll have to work on collaboration tools so that this team experience can be shared across the network.

