Caching minidns
I’ve started using analog to build reports from the blueslugs.com server logs. One of the tools on the site is minidns, a small Perl script that runs through the logs, replacing each IPv4 address it matches with its DNS-resolved name (if one is defined). I’ve improved this slightly by adding two layers of caching: an in-memory cache inside the application (on the grounds that a site is likely visited by the same communities over time) and, via Storable, a state file on disk (on the grounds that you probably run the analyzer over your logs fairly regularly). [Get cachedns.pl.]
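The two layers of caching can be sketched roughly as follows. This is not the actual cachedns.pl, only a minimal illustration of the idea; the subroutine names (lookup, load_cache, filter) and the IPv4 regular expression are my own assumptions, and only the -c option is taken from the runs shown in this post.

```perl
#!/usr/bin/perl
# Sketch only: not the real cachedns.pl, just the caching idea.
use strict;
use warnings;
use Getopt::Std qw(getopts);
use Socket qw(inet_aton AF_INET);
use Storable qw(retrieve nstore);

# Resolve an IPv4 address, consulting the in-memory cache first.
# Failed lookups are cached as the bare address, so they are not retried.
sub lookup {
    my ($cache, $ip) = @_;
    return $cache->{$ip} if exists $cache->{$ip};
    my $packed = inet_aton($ip);
    my $name   = defined $packed ? gethostbyaddr($packed, AF_INET) : undef;
    return $cache->{$ip} = defined $name ? $name : $ip;
}

# Load a previously saved cache, or start empty if there is no file yet.
sub load_cache {
    my ($file) = @_;
    return (defined $file && -e $file) ? retrieve($file) : {};
}

# Copy $in to $out, replacing anything that looks like an IPv4 address.
sub filter {
    my ($cache, $in, $out) = @_;
    while (my $line = <$in>) {
        $line =~ s/\b(\d{1,3}(?:\.\d{1,3}){3})\b/lookup($cache, $1)/ge;
        print {$out} $line;
    }
}

sub main {
    my %opts;
    getopts('c:', \%opts) or die "usage: $0 [-c cachefile] < log\n";
    my $cache = load_cache($opts{c});
    filter($cache, \*STDIN, \*STDOUT);
    # Persist the cache so the next run starts warm.
    nstore($cache, $opts{c}) if defined $opts{c};
}

main() unless caller;
```

Saved under a hypothetical name such as sketch.pl, it would be run the same way as the timed runs in this post: perl sketch.pl -c dns.cache < ./al.1000. One design choice worth noting is that this sketch never expires entries, so a host whose name changes keeps its old name until the cache file is deleted.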
Is it worth it? Read on.
Here are some simple timed runs with the original (minidns.pl) and with cachedns.pl.
    $ time perl minidns.pl < ./al.1000 > /dev/null

    real    1m56.431s
    user    0m0.030s
    sys     0m0.030s
al.1000 is the first 1000 lines of an Apache server log.
Our name service cache, nscd(1), and the DNS server we’re calling (and the perl(1) text) are now reasonably warmed up for subsequent callers.
    $ time perl minidns.pl < ./al.1000 > /dev/null

    real    0m11.210s
    user    0m0.030s
    sys     0m0.020s

    $ time perl minidns.pl < ./al.1000 > /dev/null

    real    0m12.943s
    user    0m0.050s
    sys     0m0.010s
So 1000 calls take 11 to 13 seconds once everything is warm. Let’s run the caching version:
    $ time perl cachedns.pl -c dns.cache < ./al.1000 > /dev/null

    real    0m8.579s
    user    0m0.090s
    sys     0m0.020s
So the internal caching is maybe making a little difference (about 8.6 seconds against 11 to 13). But let’s rerun with the now-populated cache file.
    $ time perl cachedns.pl -c dns.cache < ./al.1000 > /dev/null

    real    0m0.096s
    user    0m0.070s
    sys     0m0.010s
Since the first part of the log file is processed again every night, our cache file means that we’re likely to perform DNS lookups only for new visitors to the site. (There are many sophisticated DNS resolvers for web logs around that use C++ or Python or threading or whatever. I just felt that a simple, understandable Perl version, with a boost, was enough for this little site.)