Really neat paper: "Hash, Don't Cache: fast packet forwarding for enterprise edge routers". The paper and abstract are both available online. The authors are Minlan Yu and Jennifer Rexford, both of Princeton.
The authors share a lament of mine - caching forwarding decisions is attractive but no longer realistic as an implementation strategy. The diversity of addresses (including those generated randomly by attackers) pollutes the cache and pushes way too much traffic onto the slow fallback lookup path. So we end up with routers that have a single class of memory and no caches, generally built entirely out of the really expensive fast stuff.
But the suggestion in this paper of using Bloom filters instead of caches is a really cool one. Essentially you maintain one filter per outgoing interface in fast memory (probably SRAM) and evaluate them all in parallel at forwarding time. Usually you get exactly one hit and can forward the packet on its way. You only need enough fast RAM to hold the filters (i.e. not much), while all the traditional trie data and routing update code can live in slow, cheap DRAM.
Of course, because Bloom filters give false positives, it is possible to get more than one match, which requires some kind of (probably slower) fallback plan for that packet. But the problem is nowhere near as dire as it is with caches. With caches the misses force valid entries out, and then all traffic slows down as the hit rate drops; with bloomies only the packets that hit a false positive are affected, and almost all of the traffic goes through the fast path unimpeded. The paper puts the false positive rate somewhere on the order of 1 in tens of thousands (depending on a bunch of factors, but that gives a feel for it). Furthermore, a cache can be intentionally attacked with a diversity of addresses in order to flood it and degrade service for everyone, whereas with Bloom filters such an attack is no more or less likely to impact service than any other kind of packet.
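Just to make the shape of the lookup concrete, here is a toy sketch of my own. The paper hashes address prefixes and has extra machinery for longest-prefix matching and route updates; this sketch cheats and uses exact destination keys, and all the names in it are mine, not the paper's:

    import hashlib

    class BloomFilter:
        def __init__(self, size_bits=1 << 16, num_hashes=4):
            self.size = size_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(size_bits // 8)

        def _positions(self, key):
            # derive num_hashes bit positions for this key from a digest family
            for i in range(self.num_hashes):
                h = hashlib.sha1(("%d:%s" % (i, key)).encode()).digest()
                yield int.from_bytes(h[:4], "big") % self.size

        def add(self, key):
            for p in self._positions(key):
                self.bits[p // 8] |= 1 << (p % 8)

        def __contains__(self, key):
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    def forward(dest, filters, slow_path_lookup):
        # filters: dict of interface name -> BloomFilter of destinations reachable
        # through that interface (these are what live in fast SRAM on real hardware)
        hits = [iface for iface, bf in filters.items() if dest in bf]
        if len(hits) == 1:
            return hits[0]             # the common case: single hit, fast-path forward
        return slow_path_lookup(dest)  # zero or multiple hits: fall back to the full
                                       # trie sitting in slow DRAM to resolve it

Populating the filters from the FIB and wiring slow_path_lookup to a real longest-prefix-match structure is where the actual work in the paper lives; the point of the sketch is just that the fast path touches nothing but a handful of small filters.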
What can't you apply a Bloom filter to? It's all pretty cool. The paper has a number of other details on how to minimize false positives and efficiently process route updates.
There is related work from the same authors on a "buffalo" architecture which, after just a quick glance, appears to apply similar principles to LAN switching.
J Rexford is also an author of Web Protocols and Practice: HTTP/1.1, Networking Protocols, Caching, and Traffic Measurement, which is probably the most practical description of the major web protocol I've ever seen.
Tuesday, February 16, 2010
Googling Harder
A while back I mentioned that Google thinks TCP ought to be more aggressive.
I must admit, this matches my own bias. I can barely count the number of applications I have watched wait on network I/O while there was plenty of CPU and idle bandwidth available. It's maddening. Sometimes it's slow start or another aspect of congestion control; sometimes it's outdated mechanisms like the Nagle algorithm.
Well, Google is back at it with this slide set. (PDF)
They make the argument for increasing the initial cwnd. More provocatively, they argue that the Web has already done so in a de facto way by going to aggressive numbers of independent parallel HTTP connections (where you essentially get new cwnd credits just for opening a new TCP stream). Clever argument. Maybe you want to pace the data after 3 or 4 packets based on the RTT of the handshake, so you don't overrun any buffers unnecessarily.
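To put rough numbers on that de facto argument (my own illustrative figures, not anything from the slides):

    MSS = 1460        # typical segment size on an Ethernet path, in bytes
    INIT_CWND = 3     # a common initial congestion window of the era, in segments

    def first_rtt_bytes(parallel_connections, init_cwnd=INIT_CWND, mss=MSS):
        # bytes a site can have in flight during the first round trip,
        # summed across all of its parallel connections
        return parallel_connections * init_cwnd * mss

    print(first_rtt_bytes(1))   # 4380 bytes on a single connection
    print(first_rtt_bytes(6))   # 26280 bytes across 6 parallel connections -
                                # roughly what one connection with cwnd=18 would get

So a browser opening six connections is already getting the first-round-trip budget of a much larger initial window, just split across streams.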
Frankly, this kind of thing can be implemented on the server side without ever telling the peer. It would make some sense for Google to just do this for a few different values of cwnd on a tiny fraction of their traffic and see if the packet loss rates change and then publish that.
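The bucketing side of such an experiment is simple enough to sketch. This is entirely hypothetical code of my own - the names, thresholds, and bucket values are invented for illustration, and actually changing the cwnd would have to happen in the kernel or via per-route configuration rather than in application code:

    import zlib
    from collections import defaultdict

    CWND_BUCKETS = [3, 6, 10, 16]   # candidate initial windows to compare, in segments
    EXPERIMENT_FRACTION = 0.01      # enroll roughly 1% of clients in the experiment

    samples = defaultdict(int)
    retransmits = defaultdict(int)

    def assign_cwnd(client_ip):
        # deterministic hash of the client address picks the bucket, so a given
        # client sees consistent behavior across connections
        h = zlib.crc32(client_ip.encode()) & 0xffffffff
        if (h % 10000) >= EXPERIMENT_FRACTION * 10000:
            return CWND_BUCKETS[0]                    # control group: default cwnd
        return CWND_BUCKETS[h % len(CWND_BUCKETS)]    # experiment group

    def record(client_ip, retransmitted_segments):
        # tally retransmissions per bucket; compare loss rates once enough
        # samples accumulate
        cwnd = assign_cwnd(client_ip)
        samples[cwnd] += 1
        retransmits[cwnd] += retransmitted_segments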
Saturday, February 13, 2010
Channel 2000
Posts about finding embedded Linux in strange places are pretty common these days - but I couldn't resist this one.
Flipping channels tonight put me on channel 2000 of my Time-Warner Digital Cable, and this was the signal:
Forgive the snapshot.
But is that Red Hat 9 (not FC 9), running 2.4.24?
Is that my cable box? It's a standard cable-company Motorola DVR, which is a lot newer than 2.4.24, but it wouldn't be surprising to see MOT basing an embedded project off an old kernel. Of course, the TV is inherently networked - if not necessarily IP networked - and this kind of abandoned legacy code could be entertaining.
Of course, it might not be the cable box at all - it might just be some piece of generating equipment gone bad, with its video signal ending up on channel 2000. Not sure what is supposed to be there - I wasn't aware there were any 4 digit channels on my system.