A not-quite-isochronous upgrade

In the midst of many pieces of work, a new machine showed up on my doorstep. It’s pretty sweet, and I wanted to get it running the latest Solaris Nevada bits. So I repartitioned the disk to make room for an additional boot environment (BE), updated the Live Upgrade support, upgraded (to Build 44), and got back to work.

But this week I saw something pretty strange. I was working on making some software more portable, updating some Makefiles and changing some of the source. On a rebuild, my gcc compilation failed with:

collect2: ld terminated with signal 9 [Killed]
ld.so.1: ld: fatal: libld.so.4: version `SUNWprivate_4.2' not found (required by file /usr/ccs/bin/ld)
ld.so.1: ld: fatal: libld.so.4: open failed: No such file or directory
ld.so.1: ld: fatal: relocation error: file /usr/ccs/bin/ld: symbol ld32_main: referenced symbol not found

After checking with a few neighbours about any recent linker fixes and reviewing package installation times, we finally looked inside the binaries themselves, using mcs -p to print their comment sections. The version from the install server showed Build 44:

$ mcs -p reloc/usr/ccs/bin/ld
reloc/usr/ccs/bin/ld:
@(#)SunOS 5.11 snv_44 October 2007

But the version installed on the machine was from the future:

$ mcs -p /usr/ccs/bin/ld
/usr/ccs/bin/ld:
@(#)SunOS 5.11 snv_45 October 2007

What were we running on this box?

$ cat /etc/motd
Sun Microsystems Inc.   SunOS 5.11      snv_44  October 2007
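
With more than one suspect binary, the comparison above is easy to automate. The following is a minimal Python sketch, assuming you have already captured the `mcs -p` output for each file of interest and that every build tag has the `snv_NN` form seen here; `build_of` and `find_mismatches` are hypothetical helpers, not part of any Solaris tool.

```python
import re
from collections import defaultdict

# Matches a Nevada build tag such as "snv_44", as seen in the mcs -p output
# and /etc/motd above. Assumes every build tag uses this format.
BUILD_TAG = re.compile(r"\bsnv_(\d+)\b")


def build_of(text):
    """Return the first snv_NN build number found in text, or None."""
    match = BUILD_TAG.search(text)
    return int(match.group(1)) if match else None


def find_mismatches(expected_build, comments):
    """Group files whose comment section names a different build.

    comments maps a file path to its captured `mcs -p` output.
    Returns {build_number: [paths]} for every build other than expected_build;
    a file with no recognizable tag is grouped under None.
    """
    strays = defaultdict(list)
    for path, text in sorted(comments.items()):
        build = build_of(text)
        if build != expected_build:
            strays[build].append(path)
    return dict(strays)


if __name__ == "__main__":
    # The outputs shown above, captured as strings.
    motd = "Sun Microsystems Inc.   SunOS 5.11      snv_44  October 2007"
    captured = {
        "reloc/usr/ccs/bin/ld": "reloc/usr/ccs/bin/ld:\n@(#)SunOS 5.11 snv_44 October 2007\n",
        "/usr/ccs/bin/ld": "/usr/ccs/bin/ld:\n@(#)SunOS 5.11 snv_45 October 2007\n",
    }
    print(find_mismatches(build_of(motd), captured))  # {45: ['/usr/ccs/bin/ld']}
```

Grouping by build rather than returning a flat list makes a mixed install stand out at a glance: everything from the "future" lands under one key.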

I suppose if this were an Encyclopedia Brown mystery, you would now flip to the back of the book for the solution. Unfortunately, I haven’t given you quite enough information: you also need to know how our install servers share new images.

As you might expect, there is a large set of install servers available across Sun. Some are run by IT operations and some live in development labs, and each pulls new images across on its own schedule. The most recent build is linked to a directory called “latest”, so that one can reinstall a system every two weeks and have it running the most recently assembled version of Solaris.

Generally, fetching the newest image and updating the links happens automatically, outside business hours. But sometimes someone in a development lab wants an image early, perhaps to upgrade a collection of test machines for verification. And, it turns out, luupgrade(1M) carries on just fine across such a rename, as long as it isn’t actually accessing the image’s filesystem at the moment of the changeover. So my Live Upgrade installed a swathe of packages from Build 44 followed by a smattering from Build 45.

One more Live Upgrade (using the numbered path, rather than latest) to Build 45, and it’s back to porting.
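
Naming the build directly can be scripted. The Python sketch below assumes `latest` is a symbolic link to a numbered image directory; the `44`/`45` layout and the helper names are hypothetical, not a description of how Sun's install servers are actually organized. The idea is to resolve the link once before starting, hand the resolved path to luupgrade as the image location, and, if you want a guard, re-check between steps that the name still refers to the same directory.

```python
import os
import tempfile


def pin_image(path):
    """Resolve a moving name such as .../latest to the directory it points at
    right now (assumes a symlink layout), and record that directory's identity."""
    real = os.path.realpath(path)
    st = os.stat(real)
    return real, (st.st_dev, st.st_ino)


def image_changed(path, identity):
    """Return True if path now names a different directory than when pinned.

    Comparing device and inode numbers catches a retargeted symlink and also
    a new directory renamed into place under the same name."""
    st = os.stat(path)  # follows symlinks to the directory itself
    return (st.st_dev, st.st_ino) != identity


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical install-server layout: numbered images plus "latest".
        for build in ("44", "45"):
            os.mkdir(os.path.join(root, build))
        latest = os.path.join(root, "latest")
        os.symlink("44", latest)
        pinned, ident = pin_image(latest)
        # Simulate the lab refreshing "latest" partway through an upgrade.
        os.remove(latest)
        os.symlink("45", latest)
        print(os.path.basename(pinned), image_changed(latest, ident))  # 44 True
        print(image_changed(pinned, ident))  # False
```

Pinning alone is enough to avoid the mixed install; the identity check only matters if something must keep referring to the moving name.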

This problem isn’t going to be seen very often, particularly if you don’t have a multi-version install server setup like ours, but the underlying issue seems worth documenting: Live Upgrade doesn’t detect an image change during the operation. I’m not sure it should, but the underlying package operations clearly don’t account for versioned dependencies either. Nothing stopped an ld that requires version SUNWprivate_4.2 of libld.so.4 from being installed next to a libld.so.4 that doesn’t provide it.
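
The kind of check that was missing can be sketched in a few lines of Python. This is a hypothetical illustration, not what the Solaris packaging tools do. It assumes you have already collected, for each binary, the library versions it requires and, for each library, the version names it defines (on Solaris, pvs(1) reports both). It treats version names as plain labels and ignores inheritance between versions.

```python
def unmet_versions(required, provided):
    """Report versioned dependencies that no installed library satisfies.

    required: {object_path: {library_soname: {version_name, ...}}}
    provided: {library_soname: {version_name, ...}}
    Returns a sorted list of (object_path, soname, version) tuples.
    A library missing from `provided` counts as defining no versions.
    """
    missing = []
    for obj, deps in required.items():
        for soname, versions in deps.items():
            have = provided.get(soname, set())
            for version in versions - have:
                missing.append((obj, soname, version))
    return sorted(missing)


if __name__ == "__main__":
    # The requirement comes from the ld.so.1 error above.
    required = {"/usr/ccs/bin/ld": {"libld.so.4": {"SUNWprivate_4.2"}}}
    # Hypothetical: the real list of versions in the installed libld.so.4
    # is not shown in this post.
    provided = {"libld.so.4": {"SUNWprivate_4.1"}}
    for obj, soname, version in unmet_versions(required, provided):
        print(f"{obj}: {soname} lacks version {version}")
```

Run before committing a new boot environment, a check like this would have flagged the Build 45 ld instead of leaving it for gcc to discover.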
