adirent.[ch]: Adding d_type to struct dirent on OpenSolaris
An occasional porting problem you may encounter when compiling programs for OpenSolaris is the absence of d_type in the directory entry structure returned by readdir(3C). I hit this issue when experimenting with mu as a search solution for my accumulated email.
A trivial example of the failure you might see would be caused by the following program:
#include <sys/types.h>
#include <dirent.h>
#include <err.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
    DIR *d;
    struct dirent *e;

    if ((d = opendir("/")) == NULL)
        err(1, "opendir failed");

    for (e = readdir(d); e != NULL; e = readdir(d)) {
        if (e->d_type != DT_UNKNOWN)
            (void) printf("recognized filetype for '%s'\n",
                e->d_name);
    }

    (void) closedir(d);

    return (0);
}
When we attempt to compile this program with gcc, we get something like
$ gcc a.c
a.c: In function `main':
a.c:16: error: structure has no member named `d_type'
a.c:16: error: `DT_UNKNOWN' undeclared (first use in this function)
a.c:16: error: (Each undeclared identifier is reported only once
a.c:16: error: for each function it appears in.)
Studio cc will give similar output:
$ /opt/SunStudioExpress/bin/cc a.c
"a.c", line 16: undefined struct/union member: d_type
"a.c", line 16: undefined symbol: DT_UNKNOWN
cc: acomp failed for a.c
The addition of d_type to struct dirent came first for the BSD Unixes and was later added to Linux. Because it’s not easy to add members to well-known structures and preserve binary compatibility, OpenSolaris and Solaris lack this field, as well as the DT_* constant definitions. (If d_type were to become part of the Unix standards, Solaris would likely have to introduce a second family of opendir()/readdir()/closedir() functions and a second version of the structure, similar to how large files were introduced for 32-bit programs.)
Because we fail at compilation time, our workaround has to modify either the program’s source code or its build environment. (Preloading is too late.) It’s probably possible to combine a few definitions and a shared object that we include via LD_PRELOAD, but it seems easier to just provide a C wrapper around readdir(3C) and an alternate struct dirent. We develop this approach in the next section.
DIRENT and READDIR
The approach we take is:
1. Introduce DIRENT and READDIR via adirent.h.
2. Change the source program such that each call to readdir() is replaced by READDIR() and each use of struct dirent is replaced by DIRENT. In each file so modified, add a #include <adirent.h>.
3. Compile adirent.c via gcc -I. -O2 -c adirent.c or equivalent.
4. Add adirent.o to the link line for each binary that includes one of the files modified in step 2.
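To make the steps concrete, here is a minimal sketch of what adirent.h and adirent.c might contain, folded into a single listing. It is not the code from the downloads: the function name adirent_readdir, the ADIRENT_NAME_MAX limit, the exact field layout of struct adirent, and the numeric DT_* values (copied from the BSD and Linux conventions) are all assumptions made for illustration. The only behavior taken from this article is that the wrapper copies each entry into a static structure and always reports DT_UNKNOWN.

```c
/*
 * Hypothetical sketch of adirent.h and adirent.c in one listing.
 * Names other than DIRENT, READDIR and struct adirent are assumptions.
 */
#include <sys/types.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Guarded so the sketch also compiles on systems that define DT_*. */
#ifndef DT_UNKNOWN
#define DT_UNKNOWN 0
#define DT_DIR     4
#define DT_REG     8
#endif

#define ADIRENT_NAME_MAX 255    /* assumed upper bound on d_name length */

/* Alternate directory entry: the native fields we need, plus d_type. */
struct adirent {
    ino_t         d_ino;
    unsigned char d_type;
    char          d_name[ADIRENT_NAME_MAX + 1];
};

#define DIRENT        struct adirent
#define READDIR(dirp) adirent_readdir(dirp)

/*
 * Wrap readdir(): copy the native entry into a static struct adirent
 * and report DT_UNKNOWN, which is always a legal value for d_type.
 * The static buffer means this is not safe for concurrent callers.
 */
DIRENT *
adirent_readdir(DIR *dirp)
{
    static struct adirent entry;
    struct dirent *e = readdir(dirp);

    if (e == NULL)
        return (NULL);

    entry.d_ino = e->d_ino;
    entry.d_type = DT_UNKNOWN;
    (void) strncpy(entry.d_name, e->d_name, ADIRENT_NAME_MAX);
    entry.d_name[ADIRENT_NAME_MAX] = '\0';

    return (&entry);
}
```

In a real port the declarations would live in adirent.h and the function body in adirent.c, so that step 3 produces the adirent.o used in step 4.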
If we apply these steps to our example above, we get
#include <sys/types.h>
#include <adirent.h>
#include <err.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
    DIR *d;
    DIRENT *e;

    if ((d = opendir("/")) == NULL)
        err(1, "opendir failed");

    for (e = READDIR(d); e != NULL; e = READDIR(d)) {
        if (e->d_type != DT_UNKNOWN)
            (void) printf("recognized filetype for '%s'\n",
                e->d_name);
    }

    (void) closedir(d);

    return (0);
}
with the result that compilation and execution now work:
$ gcc -O2 -I. -c adirent.c
$ gcc -I. a.c adirent.o
$ ./a.out
The shim function and its definitions should be sufficient to port most programs around this incompatibility, but a few additional comments are worth making.
Performance. Because many programs expect d_type to be one of DT_REG or DT_DIR to save on a stat(2) call, this shim will force those programs into an alleged “slow” path. The actual impact of returning DT_UNKNOWN on every call will be program- and situation-dependent; it didn’t seem to affect my mail indexing.
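The "slow" path in question usually looks something like the following sketch: trust d_type when it is informative, and fall back to a stat-family call when it is DT_UNKNOWN. The helper name entry_is_dir and its return convention are hypothetical, and lstat(2) is used here so that symbolic links are not followed; a given program may do this differently.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>

/* Guarded so the sketch also compiles on systems that define DT_*. */
#ifndef DT_UNKNOWN
#define DT_UNKNOWN 0
#define DT_DIR     4
#define DT_REG     8
#endif

/*
 * Hypothetical helper: decide whether a directory entry is a directory.
 * Returns 1 if it is, 0 if it is not, and -1 if lstat() fails.
 */
int
entry_is_dir(const char *path, unsigned char d_type)
{
    struct stat st;

    /* Fast path: the directory entry already told us the type. */
    if (d_type == DT_DIR)
        return (1);
    if (d_type != DT_UNKNOWN)
        return (0);

    /* Slow path: with the shim, every entry ends up here. */
    if (lstat(path, &st) != 0)
        return (-1);

    return (S_ISDIR(st.st_mode) ? 1 : 0);
}
```

With the shim in place, every entry costs one extra system call in a program written this way, which is why the impact depends on how many entries it walks.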
Multithreaded programs. The current implementation does not protect the static structure defined in adirent.c. Programs with multiple threads performing readdir(3C) calls through READDIR() will get unexpected results. It should be relatively straightforward to dynamically allocate one struct adirent for each thread coming through READDIR() for the first time.
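As a rough illustration of the per-thread idea, the sketch below swaps explicit allocation for compiler-provided thread-local storage: the __thread qualifier (supported by gcc, and assumed here to be available in the toolchain) gives each thread its own copy of the buffer without any bookkeeping. The function name adirent_readdir_tls, the ADIRENT_NAME_MAX limit, and the layout of struct adirent are assumptions, not the contents of adirent.c. It also assumes that each thread reads from its own DIR stream, since sharing one stream between threads is a separate problem.

```c
#include <sys/types.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Guarded so the sketch also compiles on systems that define DT_*. */
#ifndef DT_UNKNOWN
#define DT_UNKNOWN 0
#endif

#define ADIRENT_NAME_MAX 255    /* assumed upper bound on d_name length */

/* Assumed layout; the real struct adirent may differ. */
struct adirent {
    ino_t         d_ino;
    unsigned char d_type;
    char          d_name[ADIRENT_NAME_MAX + 1];
};

/*
 * Hypothetical thread-aware variant of the wrapper: the result buffer
 * is thread-local, so concurrent callers no longer overwrite each
 * other's entries.
 */
struct adirent *
adirent_readdir_tls(DIR *dirp)
{
    static __thread struct adirent entry;   /* one copy per thread */
    struct dirent *e = readdir(dirp);

    if (e == NULL)
        return (NULL);

    entry.d_ino = e->d_ino;
    entry.d_type = DT_UNKNOWN;
    (void) strncpy(entry.d_name, e->d_name, ADIRENT_NAME_MAX);
    entry.d_name[ADIRENT_NAME_MAX] = '\0';

    return (&entry);
}
```

The dynamically allocated version described above would replace the thread-local variable with thread-specific data (for example a pthread key with a destructor that frees the buffer), at the cost of a little more code.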
Downloads
I suppose these should be in a repository on Bitbucket or GitHub. For now, they’re just simple downloads:
Acknowledgments
I discussed this problem with Dan, who in particular noted that DT_UNKNOWN was always a legal return value for d_type. Bart looked over my shoulder and spied at least one error during the debugging phase.