Monday, January 14, 2013

On Induced Latency and Loss

The estimable Mark Allman has a new paper out in the latest ACM CCR on measuring buffering in the Internet. If you appreciate a good characterization paper as much as I do, you should read it, along with a couple of related mailing list threads here and here. The paper is only 7 pages; go read it now, this blog post will wait for you.

The paper's title, Comments on Bufferbloat, in my mind undersells its contributions. Colloquially, I take bufferbloat to refer specifically to deeply filled queues at or below the IP layer that create latency problems for real-time applications running on different transport-layer flows (e.g. FTP, BitTorrent, or web browser traffic creating problems for a perhaps unrelated VoIP call). But the paper provides valuable data on real-world levels of induced latency, which has implications for TCP Initial Window (IW) sizes and for the web-browser-centric concern of managing connection parallelism in a world with diverse buffer sizes and IWs. That's why I'm writing about it here.
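The mechanism is worth making concrete: the latency a standing queue induces is just the queued bytes divided by the link's drain rate. Here's a trivial back-of-envelope sketch in Python (the 64KB buffer and 6 Mbps downstream are my own illustrative numbers, not the paper's):

```python
# Induced latency = queued bytes / drain rate of the bottleneck link.
# The buffer size and link rate below are illustrative assumptions.

def induced_delay_ms(queued_bytes: int, link_mbps: float) -> float:
    """Milliseconds a FIFO queue of `queued_bytes` adds at `link_mbps`."""
    return queued_bytes * 8 / (link_mbps * 1e6) * 1000

# e.g. 64 KB sitting in a modem buffer ahead of a 6 Mbps downstream:
print(induced_delay_ms(64 * 1024, 6.0))  # ~87 ms of added latency
```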

Servers inducing large latency and loss through parallelized sending is right now a bit of a corner-case problem in the web space, but it seems to be growing. My Chrome counterpart Will Chan talks a bit about it too, over here.
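To see why, consider the first flight of data when a page pulls from several sharded hosts at once. A rough sketch (the shard count, IW, and buffer size are made-up illustration values): when the combined initial burst exceeds the bottleneck buffer, you get synchronized loss right at the start of the transfer.

```python
# Why parallelized sending can overwhelm a shallow buffer. Shard
# count, IW, MSS, and buffer size are hypothetical illustration values.

MSS = 1460  # typical TCP payload bytes per packet

def initial_burst_bytes(shards: int, iw_packets: int) -> int:
    """Worst-case near-simultaneous first flight from `shards` servers."""
    return shards * iw_packets * MSS

burst = initial_burst_bytes(shards=6, iw_packets=10)  # 6 hosts at IW=10
buffer_bytes = 32 * 1024                              # a shallow tail-drop buffer
print(burst, burst > buffer_bytes)  # 87600 bytes -> overflows, packets drop
```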

For what it's worth, I'm still looking for some common amounts of network buffering to use as configurations in simulations of the downstream path. The famous Netalyzr paper has some numbers on the upstream. Let me know if you can help.
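To be concrete about what I mean by a configuration, here is the shape of the thing: a toy tail-drop FIFO bottleneck where buffer_bytes is exactly the downstream number I'm fishing for. The 6 Mbps rate and 256KB buffer below are placeholder guesses, not measured values.

```python
# A toy tail-drop FIFO bottleneck; buffer_bytes is the downstream
# configuration value I'm looking for real-world numbers on.

from collections import deque

class Bottleneck:
    def __init__(self, rate_mbps: float, buffer_bytes: int):
        self.bytes_per_ms = rate_mbps * 1e6 / 8 / 1000  # drain rate
        self.buffer_bytes = buffer_bytes
        self.queue = deque()  # packet sizes, FIFO order
        self.queued = 0

    def enqueue(self, pkt_bytes: int) -> bool:
        """Tail-drop: reject the packet if it won't fit in the buffer."""
        if self.queued + pkt_bytes > self.buffer_bytes:
            return False  # drop
        self.queue.append(pkt_bytes)
        self.queued += pkt_bytes
        return True

    def queueing_delay_ms(self) -> float:
        """Latency the current standing queue induces on a new arrival."""
        return self.queued / self.bytes_per_ms

link = Bottleneck(rate_mbps=6.0, buffer_bytes=256 * 1024)  # placeholder config
for _ in range(100):
    link.enqueue(1500)
print(link.queueing_delay_ms())  # ~200 ms once 150 KB is queued
```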

The traffic Mark looked at was not all web traffic (it undoubtedly included lots of P2P), but I think there are some interesting takeaways for the web space. These are my interpretations; don't let me put words in the author's mouth when the paper is linked above:

  • There is commonly some induced delay due to buffering. The data in the paper shows residential RTTs of 78ms typically bloated to ~120ms. That's not going to mess with real-time applications too badly, but it's an interesting data point.
  • The data set shows bufferbloat existing at the far end of the distribution: at the 99th percentile of round trips on residential connections you see over 900ms of induced latency. (There's a back-of-envelope on what that implies after this list.)
  • The data set shows that bufferbloat is not a constant condition. The amount of induced latency changes frequently (for better and for worse) and most of the time doesn't fluctuate into dangerous ranges.
  • There are real questions about whether other data sets would show the same thing, and even if they did, it isn't clear to me what frequency of these latency blips would be acceptable to a real-time app like VoIP or RTC-Web.
  • IW=10 seems reasonably safe based on the data in Mark's paper, but his characterizations don't necessarily match what we see in the web space. In particular, our flows are not as often under 3 packets in size as the flows in the paper's data (which, again, is not all web traffic). There are clearly deployed corner cases on the web where servers send far too much data at once (typically through sharding) and induce train wrecks on the web transfer. How do we deal with that gracefully in a browser without giving up sharding or IW=10 for hosts that use them in a more coordinated way? That's a big question for me right now.
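For scale, it's worth running that 99th-percentile number backwards: how much standing queue does 900ms of induced latency imply? The link rates below are my own guesses at plausible residential downstreams, not figures from the paper.

```python
# How big a standing queue does 900 ms of induced latency imply?
# The link rates are illustrative guesses at residential downstreams.

def queued_kbytes(induced_ms: float, link_mbps: float) -> float:
    """Standing queue (in KB) needed to induce `induced_ms` at `link_mbps`."""
    return induced_ms / 1000 * link_mbps * 1e6 / 8 / 1024

for mbps in (2.0, 6.0, 20.0):
    print(mbps, round(queued_kbytes(900, mbps)))  # ~220, ~659, ~2197 KB
```

That's hundreds of kilobytes to a couple of megabytes of queued data sitting in front of every packet a real-time flow sends.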

[ Comments on this are best done at https://plus.google.com/100166083286297802191/posts/6XB59oaQzDL]