adirent.[ch]: Adding d_type to struct dirent on OpenSolaris

An occasional porting problem you may encounter when compiling programs for OpenSolaris is the absence of d_type in the directory entry structure returned by readdir(3C). I hit this issue when experimenting with mu as a search solution for my accumulated email.

The following trivial program triggers the failure:

#include <sys/types.h>
#include <dirent.h>
#include <err.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
        DIR *d;
        struct dirent *e;

        if ((d = opendir("/")) == NULL)
                err(1, "opendir failed");

        for (e = readdir(d); e != NULL; e = readdir(d)) {
                if (e->d_type != DT_UNKNOWN)
                        (void) printf("recognized filetype for '%s'\n",
                            e->d_name);
        }

        (void) closedir(d);

        return (0);
}

When we attempt to compile this program with gcc, we get something like:

$ gcc a.c
a.c: In function `main':
a.c:16: error: structure has no member named `d_type'
a.c:16: error: `DT_UNKNOWN' undeclared (first use in this function)
a.c:16: error: (Each undeclared identifier is reported only once
a.c:16: error: for each function it appears in.)

Studio cc will give similar output:

$ /opt/SunStudioExpress/bin/cc a.c
"a.c", line 16: undefined struct/union member: d_type
"a.c", line 16: undefined symbol: DT_UNKNOWN
cc: acomp failed for a.c

The addition of d_type to struct dirent came first for the BSD Unixes and was later added to Linux. Because it’s not easy to add members to well-known structures and preserve binary compatibility, OpenSolaris and Solaris lack this field, as well as the DT_* constant definitions. (If d_type were to become part of the Unix standards, Solaris would likely have to introduce a second family of opendir()/readdir()/closedir() functions and a second version of the structure, similar to how large files were introduced for 32-bit programs.)

Because the failure occurs at compile time, our workaround has to change either the program’s source code or its build environment; preloading a library on its own comes too late. It’s probably possible to pair a header of replacement definitions with a shared object loaded via LD_PRELOAD, but it seems easier to provide a C wrapper around readdir(3C) and an alternate struct dirent. We develop this approach in the next section.

DIRENT and READDIR

The approach we take is:

  1. Introduce DIRENT and READDIR via adirent.h.
  2. Change the source program such that each call to readdir() is replaced by READDIR() and each use of struct dirent is replaced by DIRENT. In each file so modified, add a #include <adirent.h>.
  3. Compile adirent.c via gcc -I. -O2 -c adirent.c or equivalent.
  4. Add adirent.o to the link line for each binary that includes one of the files modified in step 2.
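Neither file is reproduced in this post, so the listing below is only a minimal sketch of what the two pieces might contain, collapsed into one listing for brevity. DIRENT, READDIR, struct adirent and the static buffer follow the description here. The function name adirent_readdir(), the constant ADIRENT_NAME_MAX, the field layout, and the numeric DT_* values (copied from the BSD and Linux headers) are assumptions made for illustration.

```c
#include <sys/types.h>
#include <dirent.h>
#include <string.h>

/* ---- would live in adirent.h ---- */

/* Values used by the BSD and Linux headers; defined only if missing. */
#ifndef DT_UNKNOWN
#define DT_UNKNOWN      0
#define DT_DIR          4
#define DT_REG          8
#define DT_LNK          10
#endif

/* Assumed upper bound on a file name, excluding the terminator. */
#define ADIRENT_NAME_MAX        255

/* Alternate directory entry: the native fields we need, plus d_type. */
struct adirent {
        ino_t           d_ino;
        unsigned char   d_type;
        char            d_name[ADIRENT_NAME_MAX + 1];
};

#define DIRENT          struct adirent
#define READDIR(d)      adirent_readdir(d)

struct adirent *adirent_readdir(DIR *);

/* ---- would live in adirent.c ---- */

/* Single static buffer, overwritten on every call (not thread-safe). */
static struct adirent ad;

struct adirent *
adirent_readdir(DIR *d)
{
        struct dirent *e;

        if ((e = readdir(d)) == NULL)
                return (NULL);

        ad.d_ino = e->d_ino;
        ad.d_type = DT_UNKNOWN;         /* always legal; caller must stat */
        (void) strncpy(ad.d_name, e->d_name, ADIRENT_NAME_MAX);
        ad.d_name[ADIRENT_NAME_MAX] = '\0';

        return (&ad);
}
```

Like readdir(3C) itself, this sketch returns a pointer to storage that the next call overwrites, so callers that keep an entry around need to copy it first.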

If we apply these steps to our example above, we get:

#include <sys/types.h>
#include <adirent.h>
#include <err.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
        DIR *d;
        DIRENT *e;

        if ((d = opendir("/")) == NULL)
                err(1, "opendir failed");

        for (e = READDIR(d); e != NULL; e = READDIR(d)) {
                if (e->d_type != DT_UNKNOWN)
                        (void) printf("recognized filetype for '%s'\n",
                            e->d_name);
        }

        (void) closedir(d);

        return (0);
}

with the result that compilation and execution now work:

$ gcc -O2 -I. -c adirent.c
$ gcc -I. a.c adirent.o
$ ./a.out

The program prints nothing, because the shim reports DT_UNKNOWN for every entry. The shim function and its definitions should be sufficient to port most programs around this incompatibility, but there are some additional comments worth making.

Performance. Many programs check whether d_type is DT_REG or DT_DIR in order to skip a stat(2) call, and fall back to stat(2) only when they see DT_UNKNOWN. Because the shim returns DT_UNKNOWN for every entry, it forces those programs onto their “slow” path every time. The actual impact will be program- and situation-dependent; it didn’t seem to affect my mail indexing.
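To make the two paths concrete, here is a rough sketch of the pattern such programs tend to follow. The helper name entry_is_dir() is hypothetical and not taken from mu or any other program; the DT_* fallback values are the ones used by the BSD and Linux headers.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>

/* Defined only where the system headers do not provide them. */
#ifndef DT_UNKNOWN
#define DT_UNKNOWN      0
#define DT_DIR          4
#define DT_REG          8
#endif

/*
 * Hypothetical helper: returns 1 if the entry is a directory, 0 if it
 * is not, and -1 if the type was unknown and lstat(2) failed.
 */
int
entry_is_dir(const char *path, unsigned char d_type)
{
        struct stat st;

        if (d_type == DT_DIR)
                return (1);             /* fast path: no system call */
        if (d_type != DT_UNKNOWN)
                return (0);             /* fast path: known, not a dir */

        /* Slow path: the only one taken when d_type is DT_UNKNOWN. */
        if (lstat(path, &st) != 0)
                return (-1);

        return (S_ISDIR(st.st_mode) ? 1 : 0);
}
```

With the shim in place, every entry costs one lstat(2) in a caller written this way. The sketch uses lstat(2) rather than stat(2) so that a symbolic link is classified the same way on both paths.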

Multithreaded programs. The current implementation does not protect the static structure defined in adirent.c. Programs with multiple threads performing readdir(3C) calls through READDIR() will get unexpected results. It should be relatively straightforward to dynamically allocate one struct adirent for each thread coming through READDIR() for the first time.
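One way the per-thread allocation could look is sketched below using POSIX thread-specific data. This is an untested-on-Solaris illustration, not part of the downloads: the function name adirent_readdir_mt(), the constant ADIRENT_NAME_MAX and the field layout of struct adirent are assumptions, and the sketch further assumes that each thread reads from its own DIR *.

```c
#include <sys/types.h>
#include <dirent.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#ifndef DT_UNKNOWN
#define DT_UNKNOWN      0
#endif

#define ADIRENT_NAME_MAX        255     /* assumed name length limit */

struct adirent {
        ino_t           d_ino;
        unsigned char   d_type;
        char            d_name[ADIRENT_NAME_MAX + 1];
};

static pthread_key_t    adirent_key;
static pthread_once_t   adirent_once = PTHREAD_ONCE_INIT;
static int              adirent_key_ok;

/* Runs once per process; free() releases a thread's buffer at exit. */
static void
adirent_key_init(void)
{
        adirent_key_ok = (pthread_key_create(&adirent_key, free) == 0);
}

/* Hypothetical thread-aware variant of the READDIR() wrapper. */
struct adirent *
adirent_readdir_mt(DIR *d)
{
        struct adirent *ad;
        struct dirent *e;

        (void) pthread_once(&adirent_once, adirent_key_init);
        if (!adirent_key_ok)
                return (NULL);

        /* First call from this thread: allocate its private buffer. */
        if ((ad = pthread_getspecific(adirent_key)) == NULL) {
                if ((ad = calloc(1, sizeof (*ad))) == NULL)
                        return (NULL);
                if (pthread_setspecific(adirent_key, ad) != 0) {
                        free(ad);
                        return (NULL);
                }
        }

        if ((e = readdir(d)) == NULL)
                return (NULL);

        ad->d_ino = e->d_ino;
        ad->d_type = DT_UNKNOWN;
        (void) strncpy(ad->d_name, e->d_name, ADIRENT_NAME_MAX);
        ad->d_name[ADIRENT_NAME_MAX] = '\0';

        return (ad);
}
```

A caller cannot tell an allocation failure from end-of-directory in this sketch, since both return NULL; a fuller version would probably set errno to distinguish them.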

Downloads

I suppose these should be in a repository on Bitbucket or GitHub. For now, they’re just simple downloads:

Acknowledgments

I discussed this problem with Dan, who in particular noted that DT_UNKNOWN was always a legal return value for d_type. Bart looked over my shoulder and spied at least one error during the debugging phase.