Sunday, September 28, 2008

Asynchronous DNS lookups on Linux with getaddrinfo_a()

A little while back I posted about a bug in the glibc getaddrinfo() implementation which resulted in many CNAME lookups having to be repeated. At that time I teased a future post on the topic of the little known getaddrinfo_a() function - here it is.

type "man getaddrinfo_a". I expect you will get nothing. That was the case for me. Linux is full of non-portable, under-documented, but very powerful interfaces and tools. The upside of these tools is great - I recently referred to netem and ifb which can be used for easy network shaping, and interfaces like tee(), splice() and epoll() are also hugely powerful, but woefully underutilized. I always get a thrill when I stumble across one of these.

Part of the reason for their low profile is portability. And there are times when that matters - though I think it is cited as a bedrock principle more than is really necessary. I think the larger reason is that some of these techniques lack the documentation, web pages, and references in programmer pop-culture necessary to be ubiquitously useful.

Maybe this post will help getaddrinfo_a find its mojo.

This little jewel is a standard part of libc, and has been for many years - you can be assured that it will be present in the runtime of any distribution of the last several years.

getaddrinfo_a() is an asynchronous interface to the DNS resolution routine - getaddrinfo(). Instead of sitting there blocked while getaddrinfo() does its thing, control is returned to your code immediately and your code is interrupted at a later time with the result when it is complete.

Most folks will realize that this is a common need when dealing with DNS resolution. It is a high latency operation and when processing log files, etc, you often have a need to do a lot of them at a time. The asynchronous interface lets you do them in parallel - other than the waiting-for-the-network time, there is very little CPU or even bandwidth overhead involved in a DNS lookup. As such, it is a perfect thing to do in parallel. You really do get linear scaling.

The best documentation seems to be in the design document from Ulrich Drepper. This closely reflects the reality of what was implemented. Adam Langley also has an excellent blog post with an illustration on how to use it. Actually, the header files are more or less enough info too, if you know that getaddrinfo_a() even exists in the first place.

The good news about the API is that you can submit addresses in big batches with one call.

The bad news about the API is that it offers callback either via POSIX signal handling, or by spawning a new thread and running a caller supplied function on it. My attitude is generally to avoid making signal handling a core part of any application, so that's right out. Having libraries spawn threads is also a little disconcerting, but the fact that that mechanism is used here for the callback is really minor compared to how many threads getaddrinfo_a() spawns internally.

I had assumed that the invocation thread would send a few dns packets out onto the wire and then spawn a single thread listening for and multiplexing the responses.. or maybe the listening thread would send out the requests as well and then multiplex the responses. But reading the code shows it actually creates a pretty sizable thread pool wherein each thread calls and blocks on getaddrinfo().

This is more or less the technique most folks roll together by hand, and it works ok - so it is certainly nice to have predone and ubiquitously available in libc rather than rolling it by hand. And it is ridiculous to code it yourself when you are already linking to a library that does it that way. But it seems to have some room for improvement internally in the future.. if that happens, its nice to know that at least the API for it is settled and upgrades should be seamless.

One last giant caveat - in libc 2.7 on 64 bit builds, getaddrinfo_a() appears to overflow the stack and crash immediately on just about any input. This is because the thread spawned internally is created with a 16KB stack which is not enough to initialize the name resolver when using 64 bit data types. Oy! The fix is easy, but be aware that some users may bump into this until fixed libcs are deployed.

Tuesday, September 23, 2008

Simulating Wireless Networks With Linux

I have been working on enhancing network performance for the upcoming Firefox on Mobile. I jokingly refer to this work as "finding useful things to do with latency" as the browser is targeted at cell phone networks. These networks have latencies of hundreds, sometimes over a thousand, milliseconds.

From time to time I hope to talk on this blog about interesting things I have found or done while looking into this.

One of the cool things I consed up in the effort is a python script to emulate one of these networks over localhost. Just run the script, along with an XML file that describes the network you're looking to simulate, and then you can run any networking application you want across localhost to measure the impact of any potential changes you want to make.

The script relies on netem and ifb. In that sense, it doesn't really add anything fundamental by itself. Those are outstanding, but poorly understood tools.

By rolling that together in a script, and providing XML profiles for 3G, edge, bluetooth, evdo, hspd, and gprs wireless networks I was able to provide a meaningful testbed for evaluating default preferences for concurrency and pipeline depth, as well as the impact of changes to DNS pre-fetching and the pipelining implementation. All good stuff. Some of them need their own posts.

If you're interested in the tool - this is the release announcement. It is bundled as part of my local copy of the firefox development tree, but the tool is easily separable from that for use on something else.

Thursday, September 11, 2008

The minutia of getaddrinfo() and 64 bits

I have been spending some time recently improving the network behavior of Firefox in mobile (i.e. really high latency, sort of low bandwidth) environments.

The manifestations du jour of that are some improvements to the mozilla DNS system.

In that pursuit, I was staring at some packet traces of the DNS traffic generated from my 64 bit linux build (using my LAN instead of the slow wireless net) and I saw this gem:

That is the same request (and response) duplicated. Doing it an extra time appears to cost all of .3ms on my LAN, but on a cell phone that could delay resolution (and therefore page load) time by a full second - very noticeable lag.

I started by combing through the firefox DNS code looking for the bug I assumed I had accidentially put in the caching layer. But I confirmed there was just one call to libc's getaddrinfo() being made for that name.

Then I figured it was some kind of large truncated DNS record from blogspot which necessitated a refetch. Looking further into it the record was really quite normal. The response was just 127 bytes and exactly the same each time - it contained 2 response records: one A record and one CNAME record. Both had reasonably sized names.

I found the same pattern with another CNAME too: 2 out of 6.

And so the debugging began in earnest. Cliff Stoll was looking for his 75 cents, and I was out to find my extra round trip time.

I did not find an international conspiracy, but after whipping together a debuggable libc build I determined that the DNS answer parser placed a "struct host" and some scratch space for parsing into a buffer passed into it from lower on the stack. If the answer parser couldn't parse the response in that space an "ERANGE" error was returned and the caller would increase the buffer size and try again. But the try again involved the whole network dance again instead of just the parsing.

So what I was seeing was that the original buffer of 512 bytes was too small, but a second try with 1024 worked fine. Fair enough, it just seems like an undersized default to fail at such a common case.

And then it made sense. For most of its life, it hasn't been undersized - while the DNS response hasn't change the "struct host" did when I went to 64 bit libraries. struct host is comprised of 48 pointers and a 16 byte buffer. On a 32 bit arch that's 208 bytes, but with 8 byte pointers it is 400. With a 512 byte ceiling, 400 is a lot to give up.

64 bit has a variety of advantages and disadvantages, but an extra RTT was a silent-penalty I hadn't seen before.

This patch fixes things up nicely.

This is good concrete opportunity to praise the pragmatism of developing on open source. It is not about the bug (everyone has them, if this even is one - it is borderline), it is about the transparency. If this was a closed OS, the few avenues available to me would have been incredibly onerous and possibly expensive. Instead I was able to resolve it in an afternoon by myself (and mail off the patch to the outstanding glibc development team).

At some future time, I'll have a similarly thrilling story about the little known but widely deployed getaddrinfo_a().