Monday, November 24, 2008

Dallas One-Wire Temperature Network - Followup

I managed to get the little 5-volt 1-wire network I mentioned earlier built up and running. Even fishing the wires wasn't too hard, "thanks" to the distinct lack of insulation in some of my walls. I did have to patch a section of the main run (I used a 100 ft run with short stubs to hold the sensors) when I put a staple through the cable while attaching it to the rafters. Doh!

The graphs make it look colder inside than it really is as I purposely put the sensors in all the cold spots. The kitchen has a zoned radiator that we can use if we are hanging out there, and the wild swings in my office are just the result of me closing the door when I'm not in there. The dining room is the warmest and I will eventually add a sensor there to put an upper bound on the data.



I didn't really like any of the pre-canned software options for it, so I rolled my own using xmgrace, digitemp, rsync, and cron. This is pretty crude, but it is a decent placeholder.

Monday, November 10, 2008

One-Wire Home Temperature Network with Linux - a prototype

So, I live in Maine. The locals say, quite correctly, that it's the way life oughta be. The way we keep massive hordes away from our paradise is to make it kinda chilly 6 months of the year. The fact that I live in a circa-1830 farmhouse is a constant reminder of that.



Last winter I froze a kitchen water pipe in the basement, thanks to a gap in the foundation. I patched up the hole and insulated the pipe. (Oddly, I inherited 90% of the water pipes already insulated, but not this stretch.) The night it froze was not the coldest of the year (it was probably 20 degrees above the coldest mark), but there was a strong wind and no snow on the ground, so the cold air just swept through the gap onto the pipe. The rest of the winter passed without further incident.

I put a thermometer in the vicinity, but frankly the basement is kinda dark and creepy and you don't go venturing into the back in the winter without a reason. So the thermometer wasn't all that useful. The sensible thing to do would be to put a $20 wireless thermometer (e.g. AcuRite Digital Wireless Weather Thermometer Indoor Outdoor) down there with the readout in the kitchen.

But why do that, when we could go for geek overkill?

The wireless thermometer has some downsides: the receiver is clutter for something rarely used, I still have to remember to look at it, it has no log (and the most interesting data comes while I am sleeping), it only measures one place per piece of clutter, it lacks an alarm facility, and it needs batteries that will inevitably die in situ.

Clearly this is a job for a $500 computer instead, right?

It appears this is generally done with a "one-wire" network. 1-Wire is a Dallas Semiconductor standard for simple devices that can be powered and read with a single wire. You daisy-chain them together, typically using one wire from a piece of cat-5. You really need two wires; the second is for ground.

This is really neat stuff. There are sensors for temperature, humidity, pressure, and even ground water for your garden. A bunch of places sell this stuff in kits, where the sensors come in a little box with an RJ-45 "in" and an "out" if you want to chain another sensor off of it. Or you can just buy the bare sensors for a lot less and solder the leads onto the ground and data wires wherever you want a reading. Each one reports a serial number as part of the 1-wire protocol so you can tell them apart, and it's pretty easy to auto-explore the chain and then power each sensor individually to get a reading (don't read them simultaneously).

The temperature gadgets, completely assembled with RJ-45 connectors and little cases, can run $25 each. But the sensors themselves are just $4 or less in single quantities. (Buy them by the thousand and you can get them for a buck.) I bought just the sensors.

In addition to the sensors, you also need a driver circuit to supply the power, do the polling, and so on. There are several schematics for building one, but I gave in to my software engineering side and bought a prebuilt 1-wire USB interface for $28.



My primary interest is in the basement, but as long as I'm stringing a cable along the rafters I might as well measure a few different points. So I'll grab my office, the basement, the dining room, and the kitchen. I might stick the final sensor outside and cover it in shrinkwrap so there is a "control" number to compare the others to. We are heating this season with a pellet stove instead of central heat, so this information ought to help determine the effectiveness of the various fan placements I'm considering.

Once everything is in place the data can be captured in a myriad of ways. The most common are the one-wire filesystem (OWFS) or digitemp. (Google bait: if digitemp returns CRC errors, clear the configuration file it saves - this cost me hours of resoldering connections that were just fine.) After that, normal Linux graphing software can go to town.
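For example, once the one-wire filesystem is mounted, a reading is just a file read. This is a minimal sketch of my own, assuming an OWFS mount at /mnt/1wire and a made-up sensor directory - substitute whatever IDs actually show up on your bus:

/* Read one temperature from an OWFS mount point.
 * The mount path and sensor ID are illustrative, not from my real setup. */
#include <stdio.h>

int main(void)
{
    const char *path = "/mnt/1wire/10.4A6B2F000800/temperature";
    FILE *f = fopen(path, "r");
    if (!f) {
        perror(path);
        return 1;
    }
    double celsius;
    if (fscanf(f, "%lf", &celsius) == 1)
        printf("%s: %.2f C\n", path, celsius);
    fclose(f);
    return 0;
}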

I need to order a couple more sensors to lay out the final network - I wanted to build a prototype first. It was easy enough.

To do it you'll need an RJ-11 crimping tool (not the RJ-45 one used for Ethernet), an RJ-11 end, some cat-5 or cat-3 long enough to run your network, and a soldering iron.

The way I wired it, only the middle two wires matter for the crimp. I have blue/white on the left and blue on the right, as viewed with the clip down and the contacts away from you. We're going to use the white for the signal and the blue for the ground. Attach one of the sensors to the end of the cable. The signal pin is the middle one, and the ground pin is on the left. (Left is defined with the flat side of the sensor facing up and the leads pointed down.) The right lead is not used; you can trim it off (I've just bent it out of the way in this picture).



With the end sensor in place, you can add as many more as you like along the line by just stripping the insulation in place and soldering the leads right onto the cable wherever they need to go. This 1-wire stuff is extremely forgiving of my attempts to pretend that I know what I'm doing with electronics hardware.



Obviously you need to tape up or shrink-wrap all the joints; I left them open on the prototype for photos. With this in place, digitemp -a -r 800 happily reports two different sensors within half a degree of each other. Huzzah!

Now it's off to grab the sensors I need for the real thing, install the cable in the basement along with the other probe points, and get a graph and alarm server going on the IP network. Such fun!

Saturday, November 8, 2008

DNS Prefetching for Firefox

Recently I wrote patches to implement DNS prefetching in Firefox. I am primarily interested in their impact on Fennec (aka Mobile Firefox), but it looks like they will land first in Firefox 3.1 beta2. The (hopefully final) glitches are being shaken out of the patch now.

Google Chrome has a feature like this too.

DNS resolutions are always dominated by latency rather than bandwidth. Particularly on mobile networks, the latencies are very high. That makes them perfect candidates for speculative pre-fetching. The advantage is in the latency improvement - instead of waiting for a hostname lookup when you click a link, do that lookup while you're reading the page the link is embedded in. Because the lookups are so small (generally one runt packet in and out), the cost of any wasted over-optimistic lookups really doesn't impact the performance of browsing. Good payoff at low cost - the best of both worlds.

The basic benefit is simple: if you click on a link using a new hostname, you save a round trip time. On some networks this can be a substantial improvement (800ms or more) in responsiveness. Some describe this simply as "figuring out the IP address of every link before you click on it".

The Firefox implementation takes this approach one step further than just pre-resolving anchor href hostnames. It uses the prefetch logic on URLs that are being included in the current document. By this I mean that it applies the prefetch logic to things like images, CSS, and JavaScript that are being loaded right away, in addition to anchor links which might be clicked at a slightly later time.

At first that seems nonsensical. How can you pre-fetch the DNS for something you are fetching right now? Where does the "pre" come in? The answer is not so much in the definition of "pre" as it is in the definition of "right now". Most HTTP user agents, Firefox being no exception, limit their number of simultaneous connections, both per host and overall. Typical pages embed quite a few objects and it is easy to run into these limits. When this happens the browser queues some of the requests. The Firefox DNS prefetch implementation allows those queued requests to overlap the high-latency host resolution with whatever transfers are going on, without creating an excessive level of parallelism.

While this is just a secondary benefit, it can be meaningful. For example, on the day I grabbed a snapshot of http://planet.mozilla.org/ it required 23 unique DNS resolutions in order to render the base page. Most of these were in img URLs. When loading the page with the prefetch patches, even with a cold cache, 16 of them were either fully completed when needed for the first connection or at least already in progress. The result, measured on an EDGE network, was a 4% overall improvement in page load time. Not bad for something that does not reduce bandwidth consumption in any way.

Configuration

Basically, it just works. You don't need to do anything. But there are a few configurables out there for both browsers and content providers.

First, as a browser user you might want no part of this. Fair enough - it's your browser. If you set the preference network.dns.disablePrefetch to true, the prefetch code will never take effect, no matter what any other configuration is set to.

Furthermore, as a security measure, prefetching of embedded link hostnames is not done from documents loaded over HTTPS. If you want to allow it in that context too, just set the preference network.dns.disablePrefetchFromHTTPS to false.

Content providers have a couple neat tricks available too. These are meant to be compatible with Chrome.

For content to opt out of DNS pre-resolution it can be served with the x-dns-prefetch-control HTTP header set to off. The equivalent meta http-equiv element can be used instead of a response header too:
<meta http-equiv="x-dns-prefetch-control" content="off">

Setting content to on will reverse the effect. You can never turn pre-fetching on in a browser that has it disabled by preference, but you can undo the impact of a previous x-dns-prefetch-control command. In this way, different content provider policies can apply to different portions of the document.

The last configuration possibility allows the content provider to force the lookup of a particular hostname without providing an anchor using that name. This is done with the link tag:
<link rel="dns-prefetch" href="http://www.spreadfirefox.com/">

The href attribute can contain a full URL or just a hostname. Hostname-only attributes should precede the hostname with two slashes:
<link rel="dns-prefetch" href="//www.spreadfirefox.com">

Content providers might use the link notation in a site-wide home page in order to preload hostnames that are widely used throughout the site but perhaps not on the home page.

Sunday, September 28, 2008

Asynchronous DNS lookups on Linux with getaddrinfo_a()

A little while back I posted about a bug in the glibc getaddrinfo() implementation which resulted in many CNAME lookups having to be repeated. At that time I teased a future post on the topic of the little known getaddrinfo_a() function - here it is.

type "man getaddrinfo_a". I expect you will get nothing. That was the case for me. Linux is full of non-portable, under-documented, but very powerful interfaces and tools. The upside of these tools is great - I recently referred to netem and ifb which can be used for easy network shaping, and interfaces like tee(), splice() and epoll() are also hugely powerful, but woefully underutilized. I always get a thrill when I stumble across one of these.

Part of the reason for their low profile is portability. And there are times when that matters - though I think it is cited as a bedrock principle more than is really necessary. I think the larger reason is that some of these techniques lack the documentation, web pages, and references in programmer pop-culture necessary to be ubiquitously useful.

Maybe this post will help getaddrinfo_a find its mojo.

This little jewel is a standard part of libc, and has been for many years - you can be assured that it will be present in the runtime of any distribution of the last several years.

getaddrinfo_a() is an asynchronous interface to the DNS resolution routine - getaddrinfo(). Instead of sitting there blocked while getaddrinfo() does its thing, control is returned to your code immediately and your code is interrupted at a later time with the result when it is complete.

Most folks will realize that this is a common need when dealing with DNS resolution. It is a high latency operation and when processing log files, etc, you often have a need to do a lot of them at a time. The asynchronous interface lets you do them in parallel - other than the waiting-for-the-network time, there is very little CPU or even bandwidth overhead involved in a DNS lookup. As such, it is a perfect thing to do in parallel. You really do get linear scaling.

The best documentation seems to be in the design document from Ulrich Drepper. This closely reflects the reality of what was implemented. Adam Langley also has an excellent blog post with an illustration on how to use it. Actually, the header files are more or less enough info too, if you know that getaddrinfo_a() even exists in the first place.

The good news about the API is that you can submit addresses in big batches with one call.
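The man page may be missing, but the shape of a batch submission is easy to sketch. The following is my own minimal example, not code from the design document; the hostnames are placeholders. It queues a few names with GAI_NOWAIT, suspends on whatever is still in progress, and then reads the results. On glibc you link it with -lanl.

#define _GNU_SOURCE
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Placeholder hostnames - resolve whatever you like. */
    const char *names[] = { "www.mozilla.org", "www.kernel.org", "www.example.com" };
    enum { COUNT = sizeof(names) / sizeof(names[0]) };

    struct gaicb *reqs[COUNT];
    for (int i = 0; i < COUNT; i++) {
        reqs[i] = calloc(1, sizeof(*reqs[i]));
        reqs[i]->ar_name = names[i];
    }

    /* Hand the whole batch over at once; GAI_NOWAIT returns immediately
     * while the lookups proceed in the background. */
    int rc = getaddrinfo_a(GAI_NOWAIT, reqs, COUNT, NULL);
    if (rc) {
        fprintf(stderr, "getaddrinfo_a: %s\n", gai_strerror(rc));
        return 1;
    }

    /* ...useful work could happen here... then wait for the stragglers,
     * suspending only on requests that are still in progress. */
    for (;;) {
        const struct gaicb *pending[COUNT];
        int npending = 0;
        for (int i = 0; i < COUNT; i++)
            if (gai_error(reqs[i]) == EAI_INPROGRESS)
                pending[npending++] = reqs[i];
        if (npending == 0)
            break;
        gai_suspend(pending, npending, NULL);
    }

    for (int i = 0; i < COUNT; i++) {
        int err = gai_error(reqs[i]);
        if (err) {
            printf("%s: %s\n", names[i], gai_strerror(err));
        } else {
            char addr[NI_MAXHOST];
            getnameinfo(reqs[i]->ar_result->ai_addr, reqs[i]->ar_result->ai_addrlen,
                        addr, sizeof(addr), NULL, 0, NI_NUMERICHOST);
            printf("%s -> %s\n", names[i], addr);
            freeaddrinfo(reqs[i]->ar_result);
        }
        free(reqs[i]);
    }
    return 0;
}

This sketch sidesteps the notification question entirely by blocking in gai_suspend(); the callback options are what the next paragraph is about.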

The bad news about the API is that it offers callback either via POSIX signal handling, or by spawning a new thread and running a caller supplied function on it. My attitude is generally to avoid making signal handling a core part of any application, so that's right out. Having libraries spawn threads is also a little disconcerting, but the fact that that mechanism is used here for the callback is really minor compared to how many threads getaddrinfo_a() spawns internally.

I had assumed that the invocation thread would send a few DNS packets out onto the wire and then spawn a single thread listening for and multiplexing the responses... or maybe the listening thread would send out the requests as well and then multiplex the responses. But reading the code shows it actually creates a pretty sizable thread pool wherein each thread calls and blocks on getaddrinfo().

This is more or less the technique most folks roll together by hand, and it works OK - so it is certainly nice to have it pre-built and ubiquitously available in libc rather than rolling it yourself. And it is ridiculous to code it yourself when you are already linking to a library that does it that way. But it seems to have some room for improvement internally in the future... if that happens, it's nice to know that at least the API for it is settled and upgrades should be seamless.

One last giant caveat - in libc 2.7 on 64-bit builds, getaddrinfo_a() appears to overflow the stack and crash immediately on just about any input. This is because the thread spawned internally is created with a 16KB stack, which is not enough to initialize the name resolver when using 64-bit data types. Oy! The fix is easy, but be aware that some users may bump into this until fixed libcs are deployed.

Tuesday, September 23, 2008

Simulating Wireless Networks With Linux

I have been working on enhancing network performance for the upcoming Firefox on Mobile. I jokingly refer to this work as "finding useful things to do with latency" as the browser is targeted at cell phone networks. These networks have latencies of hundreds, sometimes over a thousand, milliseconds.

From time to time I hope to talk on this blog about interesting things I have found or done while looking into this.

One of the cool things I consed up in the effort is a python script to emulate one of these networks over localhost. Just run the script, along with an XML file that describes the network you're looking to simulate, and then you can run any networking application you want across localhost to measure the impact of any potential changes you want to make.

The script relies on netem and ifb. In that sense, it doesn't really add anything fundamental by itself. Those are outstanding but poorly understood tools.

By rolling that together in a script, and providing XML profiles for 3G, edge, bluetooth, evdo, hspd, and gprs wireless networks I was able to provide a meaningful testbed for evaluating default preferences for concurrency and pipeline depth, as well as the impact of changes to DNS pre-fetching and the pipelining implementation. All good stuff. Some of them need their own posts.

If you're interested in the tool - this is the release announcement. It is bundled as part of my local copy of the firefox development tree, but the tool is easily separable from that for use on something else.

Thursday, September 11, 2008

The minutia of getaddrinfo() and 64 bits

I have been spending some time recently improving the network behavior of Firefox in mobile (i.e. really high latency, sort of low bandwidth) environments.

The manifestations du jour of that are some improvements to the mozilla DNS system.

In that pursuit, I was staring at some packet traces of the DNS traffic generated from my 64 bit linux build (using my LAN instead of the slow wireless net) and I saw this gem:




That is the same request (and response) duplicated. Doing it an extra time appears to cost all of 0.3 ms on my LAN, but on a cell phone that could delay resolution (and therefore page load) by a full second - very noticeable lag.

I started by combing through the Firefox DNS code looking for the bug I assumed I had accidentally put in the caching layer. But I confirmed there was just one call to libc's getaddrinfo() being made for that name.

Then I figured it was some kind of large truncated DNS record from Blogspot which necessitated a refetch. Looking further into it, the record was really quite normal. The response was just 127 bytes and exactly the same each time - it contained 2 response records: one A record and one CNAME record. Both had reasonably sized names.

I found the same pattern with another CNAME too: 2 out of 6.

And so the debugging began in earnest. Cliff Stoll was looking for his 75 cents, and I was out to find my extra round trip time.

I did not find an international conspiracy, but after whipping together a debuggable libc build I determined that the DNS answer parser placed a "struct host" and some scratch space for parsing into a buffer passed to it from lower on the stack. If the answer parser couldn't parse the response in that space, an ERANGE error was returned and the caller would increase the buffer size and try again. But the retry involved the whole network dance again, not just the parsing.

So what I was seeing was that the original buffer of 512 bytes was too small, but a second try with 1024 worked fine. Fair enough - it just seems like an undersized default to fail on such a common case.

And then it made sense. For most of its life, it hasn't been undersized - while the DNS response hasn't changed, the "struct host" did when I went to 64-bit libraries. struct host is composed of 48 pointers and a 16-byte buffer. On a 32-bit arch that's 208 bytes, but with 8-byte pointers it is 400. With a 512-byte ceiling, 400 is a lot to give up.
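The same grow-on-ERANGE dance is visible from user space with glibc's gethostbyname_r() - here is a rough sketch of the usual caller pattern, my own illustration with a placeholder hostname, not the resolver-internal path described above (which, unlike this loop, also repeats the network query on each retry):

#define _GNU_SOURCE
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *name = "www.example.com";   /* placeholder hostname */
    struct hostent he, *result = NULL;
    int rc, h_err = 0;
    size_t len = 512;                        /* start small, like the resolver */
    char *buf = malloc(len);
    if (!buf)
        return 1;

    /* Grow the scratch buffer until the parsed answer (struct, strings,
     * and pointer arrays) actually fits. */
    while ((rc = gethostbyname_r(name, &he, buf, len, &result, &h_err)) == ERANGE) {
        len *= 2;
        char *bigger = realloc(buf, len);
        if (!bigger)
            break;
        buf = bigger;
        printf("retrying with a %zu byte buffer\n", len);
    }

    if (rc == 0 && result)
        printf("resolved %s with a %zu byte buffer\n", result->h_name, len);
    free(buf);
    return 0;
}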

64-bit has a variety of advantages and disadvantages, but an extra RTT was a silent penalty I hadn't seen before.

This patch fixes things up nicely.

This is a good, concrete opportunity to praise the pragmatism of developing on open source. It is not about the bug (everyone has them, if this even is one - it is borderline), it is about the transparency. If this were a closed OS, the few avenues available to me would have been incredibly onerous and possibly expensive. Instead I was able to resolve it in an afternoon by myself (and mail off the patch to the outstanding glibc development team).

At some future time, I'll have a similarly thrilling story about the little known but widely deployed getaddrinfo_a().

Friday, May 30, 2008

Firefox Add-Ons - webfolder webDAV add-on updated for Firefox 3

I have had reason to do a little server side WebDAV work these days.

WebDAV clients for Linux aren't all that common. There is a filesystem gateway (davfs), and there is cadaver, a decent command-line app in the spirit of ncftp.

And more recently, I was introduced to webfolders. This is an add-on for Firefox that does a perfectly good job of creating a file-manager interface for DAV sites.

Point one of this post: webfolders is cool - download it yourself.

Point two of this post: webfolders as listed on the website does not support Firefox 3. And of course you are using Firefox 3.

Point three of this post: you really ought to be using Firefox 3 - it is much faster than Firefox 2.

Point four of this post: I want it all - so I have done the trivial work of updating webfolders for Firefox 3. Just changed a few constants and paths, and life is good. It is here for download (any OS!)

Friday, April 18, 2008

Measuring performance of Linux Kernel likely() and unlikely()

A little while back I wrote about how prominent likely() and unlikely() are in the Linux kernel, and yet I could not find any performance measurements linked to them.

Today I made some measurements myself.

But first a quick review - likely and unlikely are just macros for gcc's __builtin_expect(), which in turn allows the compiler to generate code compatible with the target architecture's branch prediction scheme. The GCC documentation really warns against using this manually too often:

You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.
The kernel certainly makes liberal use of it. According to LXR, 2.6.24 had 1608 uses of likely and 2075 uses of unlikely in the code. LXR didn't have an index of the just-released 2.6.25 yet - but I'd bet it is likely to be more now.

My methodology was simple: I chose several benchmarks commonly used in kernel land and ran them against vanilla 2.6.25 and also against a copy I called "notlikely", which simply had the macros nullified using this piece of programming genius:
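The stock kernel defines the hints in include/linux/compiler.h on top of __builtin_expect(); nullifying them just keeps the truth value and drops the hint. Roughly like this (the USE_BRANCH_HINTS switch exists only for this sketch):

/* Stock kernel definitions (include/linux/compiler.h): tell GCC which way
 * the branch is expected to go. */
#ifdef USE_BRANCH_HINTS
# define likely(x)      __builtin_expect(!!(x), 1)
# define unlikely(x)    __builtin_expect(!!(x), 0)
#else
/* The "notlikely" kernel: keep the truth value, drop the hint. */
# define likely(x)      (x)
# define unlikely(x)    (x)
#endif

/* A typical call site is unchanged either way: */
static int example(int err)
{
        if (unlikely(err))
                return err;
        return 0;
}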




The tests I ran were lmbench, netperf, bonnie++, and the famous "how fast can I compile the kernel?" test.

The test hardware was an all-64-bit setup on a 2.6 GHz Core 2 Duo with 2 GB of RAM and a SATA disk. Pretty standard desktop hardware.

The Core 2 architecture has a pretty fine internal branch prediction engine without the help of these external hints. But with such extensive use of the macros (3500+ times!), I expected to see some difference in the numbers.

But I didn't see any measurable difference. Not at all.

Not a single one of those tests showed anything that I wouldn't consider overlapping noise. I had 3 data points for each test on each kernel (6 points per test) and each test had several different facets. Out of dozens of facets, there wasn't a single criterion where the measurement was always better or worse on one kernel.

And this disappoints me. Because I like micro-optimizations, damn it! And in general this one seems to be a waste of time, other than the nice self-documenting code it produces. Perhaps the GCC advice is correct. Perhaps the Core 2 is so good that this doesn't matter. Perhaps there is a really compelling benchmark that I'm just not running.

I say it is a waste in general because I am sure there are specific circumstances and code paths where this makes a measurable difference. There certainly must be a benchmark that can show it - but none of these broad-based benchmarks were able to show anything useful. That doesn't mean the macro is overused - it seems harmless enough - but it probably isn't worth thinking too hard about either.

hmm.

Monday, April 14, 2008

Monitoring IP changes with NETLINK ROUTE sockets

Yesterday I read a mailing list query asking how to get event driven Linux IP address changes (i.e. without having to poll for them).

I agreed with the attitude of the post. The most important thing about scaling is to make sure the work your code does is proportional to the real event stream. Seems obvious enough, but lots of algorithms screw up that basic premise.

Any time-based polling algorithm betrays this scaling philosophy because work is done every tick independent of the events to be processed. You're always either doing unnecessary work or adding latency to real work by waiting for the next tick. The select() and poll() APIs also betray it, as these are proportional to the amount of potential work (number of file descriptors) instead of the amount of real work (number of active descriptors) - epoll() is a better answer there.

Event driven is the way to go.

Anyhow, back to the original poster. I knew netlink route sockets could do this - and I had used them in the past for similar purposes. I had to get "man 7 netlink" and Google going to cobble together an example, and only then did I realize how difficult it is to get started with netlink - it just has not been very widely used and documented.

So the point of this post is to provide a little google juice documentation for event driven monitoring of new IPv4 addresses using netlink. At least this post has a full example - the code is below.

If you need to use this functionality - I recommend man sections 3 and 7 of both netlink and rtnetlink.. and then go read the included header files, and use my sample as a guide. In this basic way you can get address adds, removals, link state changes, route changes, interface changes, etc.. lots of good stuff. It is at the heart of the iproute tools (ip, ss, etc..) as well as most of the userspace routing software (zebra, xorp, vyatta, etc..).
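Boiled down to just new IPv4 addresses, the shape of it is roughly this (a compact sketch: error handling and the other RTM_* message types are trimmed):

#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>

int main(void)
{
    /* Open a NETLINK_ROUTE socket and subscribe to IPv4 address events. */
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_nl sa;
    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;
    sa.nl_groups = RTMGRP_IPV4_IFADDR;      /* address add/remove multicast group */
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        perror("bind");
        return 1;
    }

    char buf[8192];
    for (;;) {
        ssize_t len = recv(fd, buf, sizeof(buf), 0);
        if (len <= 0)
            break;

        for (struct nlmsghdr *nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
             nh = NLMSG_NEXT(nh, len)) {
            if (nh->nlmsg_type != RTM_NEWADDR)
                continue;                    /* only report new addresses here */

            struct ifaddrmsg *ifa = NLMSG_DATA(nh);
            int rtl = IFA_PAYLOAD(nh);
            for (struct rtattr *rta = IFA_RTA(ifa); RTA_OK(rta, rtl);
                 rta = RTA_NEXT(rta, rtl)) {
                if (rta->rta_type == IFA_LOCAL) {
                    char ip[INET_ADDRSTRLEN];
                    inet_ntop(AF_INET, RTA_DATA(rta), ip, sizeof(ip));
                    printf("new address %s/%d on ifindex %u\n",
                           ip, ifa->ifa_prefixlen, ifa->ifa_index);
                }
            }
        }
    }
    return 0;
}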

Tuesday, April 8, 2008

IP Georeferencing

IP georeferencing is a pretty cool toolbox item on today's web. Simply put, it's the process of taking an IP address and converting it to some geographical information (city, lat, long, whatever).

It gets commonly used for security filters, log analysis, ad targeting, community building, etc..

Maxmind deserves a shout-out for making databases available to do this. There is nothing inherent about an IP number that gives you this information, so they just need to build out-of-band databases to get the job done. They have two versions of the databases: one free, and one a bit more accurate that is priced very reasonably.

The databases come with a number of different libraries for using them: C, Java, PHP, etc.. The libraries are released under the LGPL.

Recently I was doing a project that needed to lookup scads and scads of addresses, so I put a little muscle into improving the lookup routines in the C code.

I'm happy to say I was able to improve things anywhere from 2x to 8x in terms of overall lookups, depending on exactly what was being looked up. There were a bunch of changes, but the primary one was the addition of a variable-length radix lookup mechanism that changed the average number of comparisons from 27 to 4 - not rocket science, but the right tool for the job.
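I won't reproduce the patch here, but the flavor of the change is easy to sketch: instead of stepping through the search tree essentially one bit of the address at a time, consume several bits per step so a full IPv4 lookup only takes a handful of node visits. A toy fixed-stride version (structures invented purely for illustration, one byte per level, so at most four steps) looks roughly like this:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy node: each level consumes one byte of the IPv4 address. */
struct trie_node {
    struct trie_node *child[256];
    const char       *record;      /* non-NULL when a prefix ends here */
};

/* Longest-prefix match: remember the last record seen on the way down. */
static const char *lookup(const struct trie_node *root, uint32_t ip)
{
    const char *best = NULL;
    const struct trie_node *n = root;
    for (int shift = 24; n != NULL && shift >= 0; shift -= 8) {
        if (n->record)
            best = n->record;
        n = n->child[(ip >> shift) & 0xff];
    }
    if (n && n->record)
        best = n->record;
    return best;
}

/* Insert a record for prefix/len; in this toy, len must be a multiple of 8. */
static void insert(struct trie_node *root, uint32_t prefix, int len,
                   const char *record)
{
    struct trie_node *n = root;
    for (int shift = 24; len > 0; shift -= 8, len -= 8) {
        int idx = (prefix >> shift) & 0xff;
        if (!n->child[idx])
            n->child[idx] = calloc(1, sizeof(*n->child[idx]));
        n = n->child[idx];
    }
    n->record = record;
}

int main(void)
{
    struct trie_node root = { { 0 } };
    insert(&root, 0xC0A80000u, 16, "somewhere-local");   /* 192.168.0.0/16 */
    insert(&root, 0xC0A80100u, 24, "more specific");     /* 192.168.1.0/24 */
    printf("%s\n", lookup(&root, 0xC0A80105u));          /* -> more specific */
    printf("%s\n", lookup(&root, 0xC0A8FF01u));          /* -> somewhere-local */
    return 0;
}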

I'm even more happy to say I sent those back as LGPL contributions of my own. The code is on my open source contribution web page.

Saturday, April 5, 2008

Linux Selective Acknowledgment (SACK) CPU Overhead

Last year I tossed an e-mail from the Linux kernel networking list in my "projtodo" folder.

The mail talked about how the Linux TCP stack in particular, and likely all TCP stacks in general, had an excessive-CPU attack exposure when confronted with malicious SACK options. I found the mail intriguing but unsatisfying. It was well-informed speculation, but it didn't have any hard data, nor was there any simple way to gather some. Readers of the other posts on this blog will know I really dig measurements. The issue at hand pretty obviously is a problem - but how much of one?

A few weeks ago I had the chance to develop some testing code and find out for myself - and IBM DeveloperWorks has published the summary of my little project. The executive summary is "it's kinda bad, but not a disaster, and hope is on the way". There is hard data and some pretty pictures in the article itself.

The coolest part of the whole endeavor, other than scratching the "I wonder" itch, was getting to conjure up a userspace TCP stack from raw sockets. It was, of course, woefully incomplete as it was just meant to trigger a certain behavior in its peer instead of being generally useful or reliable - but nonetheless entertaining.

Thursday, April 3, 2008

Linux Kernel - likely() not measured?

The other day on kernelnewbies, the able Robert Day wondered whether or not anyone had quantified the effects of the likely() and unlikely() macros scattered all over the Linux kernel.

He received no fewer than 5 replies telling him how the macros worked (if you are curious, this is the best explanation) - but not a single piece of measurement or other characterization.

LXR shows over 3500 uses of those macros in the kernel, and nobody has any data for any scenario? Wowzers.

Doing before/after benchmarks with those macros changed to no-ops would be an interesting project. You could use the usual suspects of Linux kernel performance benchmarking (lmbench, a kernel compile, some kind of network load generator, etc..)

Comments with pointers to data would be cool.