Friday, November 21, 2014

Proxy Connections over TLS - Firefox 33

There have been a bunch of interesting developments over the past few months in Mozilla Platform Networking that will be news to some folks. I've been remiss in not noting them here. I'll start with the proxying over TLS feature. It landed as part of Firefox 33, which is the current release.

This feature is from bug 378637 and is sometimes known as HTTPS proxying. I find that naming a bit ambiguous - the feature is about connecting to your proxy server over HTTPS, but it supports proxying for both http:// and https:// resources (as well as ftp://, ws://, and wss:// for that matter). https:// transactions are tunneled end to end through the proxy via the CONNECT method, in addition to the connection to the proxy itself being made over a separate TLS session. For https:// and wss:// that means you actually have end to end TLS wrapped inside a second TLS connection between the client and the proxy.
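
To make the layering concrete, here is a rough sketch of what the client does for an https:// resource when an HTTPS proxy is configured (proxy.mydomain.net and example.com are just placeholder names):

  [outer TLS handshake with the proxy at proxy.mydomain.net:443]
  CONNECT example.com:443 HTTP/1.1
  Host: example.com:443

  [proxy answers 200, then the client runs the inner, end to end TLS
   handshake with example.com and sends the real request inside it]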

There are some obvious and non-obvious advantages here - but proxying over TLS is strictly better than traditional plaintext proxying. One obvious reason is that it provides authentication of your proxy choice - if you have defined a proxy then you're placing an extreme amount of trust in that intermediary. It's nice to know via TLS authentication that you're really talking to the right device.

The communication between you and the proxy is also kept confidential, which helps your privacy with respect to observers of the link between client and proxy, though this is not end to end protection if you're not accessing an https:// resource. Proxying over TLS also keeps any proxy-specific credentials strictly confidential. There is an advantage even when accessing https:// resources through a proxy tunnel - encrypting the client to proxy hop conceals some information (at least for that hop) that https:// normally leaks, such as the hostname through SNI and the server IP address.

Somewhat less obviously, HTTPS proxying is a prerequisite to proxying via SPDY or HTTP/2. These multiplexed protocols are extremely well suited to connecting to a proxy because a large fraction (often 100%) of a client's transactions are funneled through the same proxy, and therefore only one TCP session is required when using a prioritized multiplexing protocol. When using HTTP/1 a large number of connections are required to avoid head of line blocking, and it is difficult to meaningfully manage them to reflect prioritization. When connecting to remote proxies (i.e. those with a high latency, such as those in the cloud) this becomes an even more important advantage, as the handshakes that are avoided are especially slow in that environment.

This multiplexing can really warp the old noodle to think about after a while - especially if you have multiple spdy/h2 sessions tunneled inside a spdy/h2 connection to the proxy. That can result in the top level multiplexing several streams with http:// transactions served by the proxy as well as connect streams to multiple origins that each contain their own end to end spdy sessions carrying multiple https:// transactions.

To utilize HTTPS proxying just return the HTTPS proxy type from your FindProxyForURL() PAC function (instead of the traditional HTTP type). This is compatible with Google's Chrome, which has a similar feature.

function FindProxyForURL(url, host) {
  if (url.substring(0,7) == "http://") {
   return "HTTPS proxy.mydomain.net:443;"
  }
  return "DIRECT;"
}


Squid supports HTTP/1 HTTPS proxying. SPDY proxying can be done via Ilya's node.js based spdy-proxy. nghttp can be used for building HTTP/2 proxying solutions (H2 is not yet enabled by default on Firefox release channels - see about:config network.http.spdy.enabled.http2 and network.http.spdy.enabled.http2draft to enable some version of it early). There are no doubt other proxies with appropriate support too.

If you need to add a TOFU exception for use of your proxy it cannot be done in proxy mode. Disable proxying, connect to the proxy host and port directly from the location bar and add the exception. Then enable proxying and the certificate exception will be honored. Obviously, your authentication guarantee will be better if you use a normal WebPKI validated certificate.

Tuesday, March 4, 2014

On the Application of STRINT to HTTP/2

I participated for two days last week in the joint W3C/IETF (IAB) workshop on Strengthening the Internet against Pervasive Monitoring (aka STRINT). Now that the IETF has declared pervasive monitoring of the Internet to be a technical attack, the goal of the workshop was to establish the next steps to take in reaction to the problem. There were ~100 members of the Internet engineering and policy communities participating - HTTP/2 standardization is an important test case to see if we're serious about following through.

I'm pleased that we were able to come to some rough conclusions and actions. First, a word of caution: there is no official report yet, I'm certainly not the workshop secretary, and this post only reflects transport security, which was a subset of the areas discussed - but I still promise I'm being faithful in reporting the events as I experienced them.

Internet protocols need to make better use of communications security and more encryption - even imperfect, unauthenticated crypto is better than trivially snoopable cleartext. It isn't perfect, but it raises the bar for the attacker. New protocol designs should use strongly authenticated mechanisms, falling back to weaker measures only as absolutely necessary, and updates to older protocols should be expected to add encryption, potentially with disabling switches if compatibility strictly requires it. A logical outcome of that discussion is the addition of these properties (probably by reference, not directly through replacement) to BCP 72 - which provides guidance for writing RFC security considerations.

At a bare minimum, I am acutely concerned with making sure HTTP/2 brings more encryption to the Web. There are certainly many exposures beyond the transport (data storage, data aggregation, federated services, etc.), but in 2014 transport level encryption is a well understood and easily achievable technique that should be as ubiquitously available as clean water and public infrastructure. In the face of known attacks it is a best engineering practice, and we shouldn't accept less while still demanding stronger privacy protections too. When you step back from the details and ask yourself whether it is really reasonable that a human's interaction with the Web is observable to many silent and undetectable observers, the current situation seems absurd.

The immediate offered solution space is complicated and incomplete. Potential mitigations are fraught with tradeoffs and unintended consequences. The focus here is on what happens to http:// schemed traffic; https:// is comparatively well taken care of. The common solution offered in this space carries http:// over an unauthenticated TLS channel for HTTP/2. The result is a very simple plug and play TLS capable HTTP server that is not dependent on the PKI. This provides protection against passive eavesdroppers, but not against active attacks. The cost of attacking is raised in terms of CPU, monetary cost, political implications, and risk of being discovered. In my opinion, that's a win. Encryption simply becomes the new equivalent of cleartext - it doesn't promote http:// to https://, it does not produce a lock icon, and it does not grant you any new guarantees that cleartext http:// would not have. I support that approach.

The IETF HTTPbis working group will test this commitment to encryption on Wednesday at the London #IETF89 meeting when http:// schemed URIs over TLS are on the agenda (again). In the past it has not been able to garner consensus. If the group is unable to form consensus around a stronger privacy approach than was taken with HTTP/1.1's use of cleartext, I would hope the IESG would block the proposed RFC during last call for having insufficiently addressed the security implications of HTTP/2 on the Internet as we now know it.

#ietf89 #strint

Saturday, August 31, 2013

SSL Everywhere for HTTP/2 - A New Hope

Recently the IETF working group on HTTP met in Berlin, Germany and discussed the concept of mandatory to offer TLS for HTTP/2, offered by Mark Nottingham. The current approach to transport security means only 1/5 of web transactions are given the protections of TLS. Currently all of the choices are made by the content owner via the scheme of the URL in the markup.

It is time that the Internet infrastructure simply gave users a secure by default transport environment - that doesn't seem like a radical statement to me. From FireSheep to the Google Sniff-Wifi-While-You-Map Car to PRISM there is ample evidence to suggest that secure transports are just necessary table stakes in today's Internet.

This movement in the IETF working group is welcome news and I'm going to do everything I can to help iron out the corner cases and build a robust solution.

My point of view for Firefox has always been that we would only implement HTTP/2 over TLS. That means https:// but it has been my hope to find a way to use it for http:// schemes on servers that supported it too. This is just transport level security - for web level security the different schemes still represent different origins. If cleartext needed to be used it would be done with HTTP/1 and someday in the distant future HTTP/1 would be put out to pasture. This roughly matches Google Chrome's public stance on the issue.

Mandatory to offer does not ban the practice of cleartext - if the client did not want TLS, a compliant cleartext channel could be established. This might be appropriate inside a data center, for instance - but Firefox would be unlikely to do so.

This approach also does not ban intermediaries completely. https:// URIs of course remain end to end and can only be tunneled (as is the case in HTTP/1), but http:// URIs could be proxied via appropriately configured clients by having the HTTP/2 stream terminated on the proxy. It would prevent "transparent proxies", which are fundamentally man in the middle attacks anyhow.

Comments over here: https://plus.google.com/100166083286297802191/posts/XVwhcvTyh1R

Tuesday, August 13, 2013

Go Read "Reducing Web Latency - the Virtue of Gentle Aggression"

One of the more interesting networking papers I've come across lately is Reducing Web Latency: the Virtue of Gentle Aggression. It details how poorly TCP's packet loss recovery techniques map onto HTTP and proposes a few techniques (some more backwards compatible than others) for improving things. It's a large collaborative effort from Google and USC. Go read it.

The most important piece of information is that TCP flows that experienced a loss took 5 times longer to complete than flows that did not experience a loss. 10% of flows fell into the "some loss" bucket. For good analysis go read the paper, but the shorthand headline is that this is because the short "mice" flows of HTTP/1 tend to be composed mostly of tail packets, and traditionally tail loss is only recovered using horrible timeout based strategies.

This is the best summary of the problem that I've seen - but it's been understood for quite a while. It is one of the motivators of HTTP/2 and is a theme underlying several of the blog posts Google has made about QUIC.

The "aggression" concept used in the paper is essentially the inclusion of redundant data in the TCP flow under some circumstances. This can, at the cost of extra bandwidth, bulk up mice flows so that their tail losses are more likely to be recovered with single RTT mechanisms such as SACK based fast recovery instead of timeouts. Conceivably this extra data could also be treated as an error correction coding which would allow some losses to be recovered independent of the RTT.

Saturday, August 3, 2013

Internet Society Briefing Panel @ IETF 87

This past week saw IETF 87, in Berlin, Germany, come and go. As usual I met a number of interesting new folks, caught up with old acquaintances I respect (a number of whom now work for Google - it seems to happen when you're not looking at them :)), and got some new ideas into my head (or refined old ones).

On Tuesday I was invited to be part of a panel for the ISOC lunch with Stuart Cheshire and Jason Livingood. We were led and moderated by the inimitable Leslie Daigle.

The panel's topic was "Improving the Internet Experience, All Together Now." I made the case that the standards community needs to think a little less about layers and a little more about the end goal - and by doing so we can provide more efficient building blocks. Otherwise developers find themselves tempted to give up essential properties like security and congestion control in order to build more responsive applications. But we don't need to make that tradeoff.

The audio is available here: http://www.internetsociety.org/internet-society-briefing-panel-ietf-87 . Unfortunately the audio is full of intolerable squelch until just after the 22:00 mark (I'm the one speaking at that point). The good news is most of that is "hold music", introductions, and a recap of the panel topic that is also described on the web page.

Thursday, June 6, 2013

RTT - a lot more than time to first byte

Lots of interesting stuff in this paper: Understanding the latency benefits of multi-cloud webservice deployments.

One of the side angles I find interesting is that they find that even if you built an idealized fully replicated custom CDN using multiple datacenters on multiple clouds there are still huge swaths of Internet population that have browser to server RTTs over 100ms. (China, India, Argentina, Israel!)

RTT is the hidden driver in so much of Internet performance and, to whatever extent possible, it needs to be quashed. It's so much more than time to first byte.

Bandwidth, CPU, even our ability to milk more data out of radio spectrum, all improve much more rapidly than RTT can.

Making RTT better is really a physical problem and is, in the general case at least, really really hard. Making things less reliant on RTT is a software problem and is a more normal kind of hard. Or at least it seems if you're a software developer.

RTT is not just time to the first byte of a web request. It's DNS. It's the TCP handshake. It's the SSL handshake. It's the time it takes to do a CORS check before an XHR. It's the time it takes to grow the congestion window (i.e. how fast you speed up from slow start) or confirm revocation status of the SSL certificate. It's the time it takes to do an HTTP redirect. And it is the time you're stalled while you recover from a packet loss. They're all some factor of RTT.
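
As a rough back of the envelope - assuming a cold connection, a classic 2-RTT TLS handshake, and no other stalls - a first https:// fetch looks something like this:

  DNS lookup       ~1 RTT
  TCP handshake     1 RTT
  TLS handshake     2 RTT
  HTTP request      1 RTT
  ------------------------
  ~5 RTT, so roughly 500ms to the first response byte at a 100ms RTT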

HTTP/2 and SPDY take the first step of dealing with this by creating a prioritized and muxxed protocol which tends to make it more often bandwidth bound than HTTP/1 is. But even that protocol still has lots of hidden RTT scaling algorithms in it.


Tuesday, May 28, 2013

If a tree falls in a forest and nobody is there to hear it..

Robert Koch's Masters Thesis about websockets security and adoption is linked.

One interesting tidbit - he surveyed 15K free apps from the Google Play store and saw only 14 that use websockets. Previous surveys of traditional websites have seen similarly poor uptake.

WS certainly isn't optimal, but it has a few pretty interesting properties and it's sad to see it underused. I suspect its reliance on a different server and infrastructure stack is part of the reason.


Thursday, March 7, 2013

Firefox Nightly on WebPageTest

Thanks to Google's amazing Pat Meenan, Firefox nightly has been added to webpagetest.org - just make sure to select the Dulles VA location to be able to use it.

He shared the news on twitter. Thanks Pat!

WPT is a gold standard tool, and to be able to see it reflect (and validate/repudiate) changes in nightly is a really terrific addition.




Monday, January 14, 2013

On Induced Latency and Loss

The estimable Mark Allman has a new paper out in the latest ACM CCR dealing with data collection around buffering in the Internet. If you respect a good characterization paper as much as I do - you should read it and a couple related mailing list threads here and here. The paper is only 7 pages; go read it -- this blog post will wait for you.

The paper's title, Comments on Bufferbloat, in my mind undersells the contributions of the paper. Colloquially I consider bufferbloat to refer specifically to deeply filled queues at or below the IP layer that create latency problems for real-time applications using different transport layer flows (e.g. FTP/Torrent/WebBrowser creating problems for perhaps unrelated VOIP). But the paper provides valuable data on real world levels of induced latency, where it has implications for TCP Initial Window (IW) sending sizes and the web-browser centric concern of managing connection parallelism in a world with diverse buffer sizes and IWs. That's why I'm writing about it here.

Servers inducing large latency and loss through parallelized sending is right now a bit of a corner case problem in the web space, but it seems to be growing. My Chrome counterpart Will Chan talks a bit about it too over here.

For what it's worth, I'm still looking for some common amounts of network buffering to use as configurations in simulations on the downstream path. The famous Netalyzer paper has some numbers on upstream. Let me know if you can help.

The traffic Mark looked at was not all web traffic (undoubtedly lots of P2P), but I think there are some interesting takeaways for the web space. These are my interpretations - don't let me put words in the author's mouth for you when the paper is linked above:

  • There is commonly some induced delay due to buffering: the data in the paper shows residential RTTs of 78ms typically bloated to ~120ms. That's not going to mess with real time too badly, but it's an interesting data point.
  • The data set shows bufferbloat existence at the far end of the distribution. At the 99th percentile of round trips on residential connections you see over 900ms of induced latency.
  • The data set shows that bufferbloat is not a constant condition - the amount of induced latency is commonly changing (for better and for worse) and most of the time doesn't fluctuate into dangerous ranges.
  • There are real questions of whether other data sets would show the same thing, and even if they did it isn't clear to me what the acceptable frequency of these blips would be to a realtime app like VOIP or RTC-Web.
  • IW 10 seems reasonably safe from the data in Mark's paper, but all of his characterizations don't necessarily match what we see in the web space. In particular, our flows are not as often less than 3 packets in size as they are in the paper's data (which is not all web traffic). There are clearly deployed corner cases on the web where servers send way too much data (typically through sharding) and induce train wrecks on the web transfer. How do we deal with that gracefully in a browser without sharding or IW=10 for hosts that use them in a more coordinated way? That's a big question for me right now.

[ Comments on this are best done at https://plus.google.com/100166083286297802191/posts/6XB59oaQzDL]

Saturday, December 8, 2012

Managing Bandwidth Priorities In Userspace via TCP RWIN Manipulation

This post is a mixture of research and speculation. The speculative thinking is about how to manage priorities among a large number of parallel connections. The research is a survey of how different OS versions handle attempts by userspace to actively manage the SO_RCVBUF resource.

Let's consider the case where a receiver needs to manage multiple incoming TCP streams and the aggregate sending windows exceed the link capacity. How do they share the link?

There are lots of corner cases but it basically falls into 2 categories. Longer term flows try and balance themselves more or less evenly with each other, and new flows (or flows that have been quiescent for a while) blindly inject data into the link based on their Initial Window configuration (3, 10, or even 40 [yes, 40] packets).

This scenario is a pretty common one for a web browser on a site that does lots of sharding. It is common these days to see 75, 100, or even 200 parallel connections used to download a multi megabyte page. Those different connections have an internal priority to the web browser, but that information is more or less lost when HTTP and TCP take over - the streams compete with each other blind to the facts of what they carry or how large that content is. If you have 150 new streams sending IW=10, that means 1500 packets can be sent more or less simultaneously even if the data is of low priority. That kind of data is going to dominate most links and cause losses on others. Of course, unknown to the browser, those streams might only have 1 packet worth of data to send (small icons, or 304's) - it would be a shame to dismiss parallelism in those cases out of fear of sending too fast. Capping parallelism at a lower number is a crude approach that will hurt many use cases and is very hard to tune in any event.

The receiver does have one knob to influence the relative weights of the incoming streams: the TCP Receive Window (rwin). The sender generally determines the rate a stream can transfer at (via the CWND calculation), but that rate is limited to no more than rwin allows.
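
Put another way, the ceiling a receiver can impose is roughly:

  max_rate ≈ rwin / RTT
  e.g. a 16KB rwin at a 100ms RTT caps that stream at about 160KB/sec (~1.3 megabit/sec),
  no matter how large the sender's congestion window would otherwise grow.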

I'm sure you see where this is going - if the browser wants to squelch a particular stream (because the others are more important) it can reduce the rwin of the stream to below the Bandwidth Delay Product of the link - effectively slowing just that one stream. Voila - crude priorities for the HTTP/TCP connections! (RTT is going to matter a lot at this point for how fast they run - but if this is just a problem about sharding then RTTs should generally be similar amongst the flows, so you can at least get their relative priorities straight.)

setsockopt(SO_RCVBUF) is normally how userspace manipulates the buffers associated with the receive window. I set out to survey common OS platforms to see how they handled dynamic tuning of that parameter in order to manipulate the receiver window.
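
To make the mechanics concrete, here is a minimal sketch of the "squelched at birth" case in plain C (BSD sockets; make_squelched_socket is just a hypothetical helper, and the exact rwin you end up with is platform dependent - Linux, for example, roughly doubles whatever value you pass):

  /* minimal sketch - not Firefox code */
  #include <stdio.h>
  #include <sys/socket.h>

  static int make_squelched_socket(void)
  {
      int fd = socket(AF_INET, SOCK_STREAM, 0);
      if (fd < 0)
          return -1;

      /* a small buffer set before connect() => a small advertised rwin
         (and, on most stacks, the window scale is chosen from this value) */
      int small = 4 * 1024;
      if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &small, sizeof(small)) < 0)
          perror("setsockopt(SO_RCVBUF)");

      /* connect() as usual; whether a later, larger setsockopt() is honored
         varies by OS - that's exactly what the survey below is about */
      return fd;
  }

The interesting question - and the reason for the survey below - is what happens when you call setsockopt() again to grow or shrink that value on a live connection.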

WINDOWS 7
Win7 does the best job; it allows SO_RCVBUF to both dynamically increase and decrease rwin (decreases require the WsaReceiveBuffering socket ioctl). Increasing a window is instantaneous and a window update is generated for the peer right away. However, when decreasing a window (and this is true of all platforms that allow decreasing) the window is not slammed shut - it naturally degrades with data flow and is simply not recharged as the application consumes the data, which results in the smaller window.

For instance, a stream has an rwin of 64KB and decreases it to 2KB. A window update is not sent on the wire even if the connection is quiescent. Only after 62KB of data has been sent does the window shrink naturally to 2KB, even if the application has consumed the data from the kernel - future reads will recharge rwin back to a max of 2KB. But this process is not accelerated in any way by the decrease of SO_RCVBUF, no matter how long it takes. Of course there would always be 1 RTT of time between the decrease of SO_RCVBUF and the time it takes effect (where the sender could send larger amounts of data), but by not "slamming" the value down with a window update (which I'm not even certain would be TCP compliant) that period of time is extended from 1 RTT to indefinite. Indeed, the application doesn't need SO_RCVBUF at all to achieve this limited definition of decreasing the window - it can simply stop consuming data from the kernel (or it can pretend to do so by using MSG_PEEK) and that would be no more or less effective.

If we're trying to manage a lot of parallel busy streams this strategy for decreasing the window will work ok - but if we're trying to protect against a new/quiescent stream suddenly injecting a large amount of data it isn't very helpful. And that's really the use case I have in mind.

The other thing to note about Windows 7 is that the window scale option (which effectively controls how large the window can get and is not adjustable after the handshake) is set to the smallest possible value for the SO_RCVBUF set before the connect. If we agree that starting with a large window on a squelched stream is problematic because decreases don't take effect quickly enough, that implies the squelched stream needs to start with a small window. Small windows will not need window scaling. This isn't a problem for the initial squelched state - but if we want to free the stream up to run at a higher speed (perhaps because it now has the highest relative priority of the active streams after some of the old high priority ones completed) the maximum rwin is going to be 64KB - going higher than that requires window scaling. A 64KB window can support significant data transfer (e.g. 5 megabit/sec at 100ms RTT) but certainly doesn't cover all the use cases of today's Internet.

WINDOWS VISTA
When experimenting with Vista I found behavior similar to Windows 7. The only difference I noted was that it always used a window scale in the handshake, even if initially an rwin < 64KB was going to be used. This allows connections with small initial windows to be raised to greater sizes than is possible on Windows 7 - which for the purposes of this scheme is a point in Vista's favor.

I speculate that the change was made in Windows 7 to increase interoperability - window scale is sometimes an interop problem with old NATs and firewalls, and the OS clearly doesn't anticipate the rwin range being actively manipulated while the connection is established. Therefore, if window scale isn't needed for the initial window size you might as well omit it out of an abundance of caution.

WINDOWS XP
By default XP does not do window scaling at all - limiting us to 64KB windows, and therefore multiple connections are required to use all the bandwidth found in many homes these days. It doesn't allow shrinking rwin at all (but as we saw above, the Vista/Win7 approach to shrinking the window isn't more useful than one that can be emulated through non-SO_RCVBUF approaches).

XP does allow raising the rwin. So a squelched connection could be set up with a low initial window and then raised up to 64KB when its relative ranking improved. The sticky wicket here is that it appears that attempts to set SO_RCVBUF below 16KB don't work. 16KB maps to a new connection with IW=10 - having a large set of new squelched connections all with the capacity to send 10 packets probably doesn't meet the threshold of squelched.

OS X (10.5)
The latest version of OS X I have handy is dated. Nothing in a Google search leads me to believe this has changed since 10.5, but I'm happy to take updates.

OS X, like XP, does not allow decreasing a window through SO_RCVBUF.

It does allow increasing one if the window size was set before the connection - otherwise "auto tuning" is used and cannot be manipulated while the connection is established.

Like Vista, the initial window determines the scaling factor, and assuming a small window on a squelched stream, that means window scaling is disabled and the window can only be increased to 64KB for the life of the connection.

LINUX/ANDROID
Linux can decrease rwin, but as with other platforms requires data transfer to do it instead of a 1-RTT slamming shut of the window. Linux does not allow increasing the window past the size it has when the connection is established. So you can start a window large, slowly decrease it, and then increase it back to where it started.. but you can't start a window small and then increase it as you might want to do with a squelched stream.

It pains me to say this, and it is so rarely true, but this makes Linux the least attractive development platform for this approach.

CONCLUSIONS
From this data it seems viable to attempt a strategy for {Windows >= Vista and OS X} that mixes 3 types of connections:
  1. Full autotuned connections. These are unsquelched, can never be slowed down, generally use window scaling, and are capable of running at high speeds.
  2. Connections that begin with small windows and are currently squelched to limit the impact of new low priority streams in highly parallel environments.
  3. Connections that were previously squelched but have now been upgraded to 64KB windows - "almost full" (1) if you will.

Wednesday, December 5, 2012

Smarter Network Page Load for Firefox

I just landed some interesting code for bug 792438. It should be in the December 6th nightly, and if it sticks it will be part of Firefox 20.

The new feature keeps the network clear of media transfers while a page's basic html, css, and js are loaded. This results in a faster first paint, as the elements that block initial rendering are loaded without competition. Total page load is potentially slightly regressed, but the responsiveness more than makes up for it. Similar algorithms report first paint improvements of 30% with pageload regressions around 5%. There are some cases where the effect is a lot more obvious than 30% - try pinterest.com, 163.com, www.skyrock.com, or www.tmz.com - all of these sites have large amounts of highly parallel media on the page that, in previous versions, competed with the required css.

Any claim of benefit is going to depend on bandwidth, latency, and the contents of your cache (e.g. if you've already got the js and css cached, this is moot; or if you have a localhost connection like talos tp5 uses, it is also moot because bandwidth and latency are essentially ideal there).

Before I get to the nitty gritty, I think it's worth a paragraph of Inside-Mozilla-Baseball to mention what a fun project this was. I say that in particular because it involved a lot of cross team participation on many different aspects (questions, advice, data, code in multiple modules, and reviews). I think we can all agree that when many different people are involved on the same effort, efficiency is typically the first casualty. Perhaps this project is the exception needed to prove the rule - it went from a poorly understood use case to committed code very quickly. Ehsan, Boris, Dave Mandelin, Josh Aas, Taras, Henri, Honza Bambas, Joe Drew, and Chromium's Will Chan helped one and all - it's not too often you get the rush of everyone rowing in sync; but it happened here and it was awesome to behold and good for the web.

In some ways this feature is not intuitive. Web performance is generally improved by adding parallelization due to large amounts of unused bandwidth left over by the single session HTTP/1 model. In this case we are actually reducing parallelization which you wouldn't think would be a good thing. Indeed that is why it can regress total page load time for some use cases. However, parallelization is a double edged sword.

When parallelization works well it is because it helps cover up moments of idle bandwidth in a single HTTP transaction (e.g. latency in the handshake, latency during the header phase, or even latency pauses involved in growing the TCP congestion window) and this is critically important to overall performance. Servers engage in hostname sharding just to opt into dozens of parallel connections for performance reasons.

On the other hand, when it works poorly, parallelization kills you in a couple of different ways. The more spectacular failure mode I'll be talking about in a different post (bug 813715), but briefly, the oversubscription of the link creates induced queues and TCP losses that interact badly with TCP congestion control and recovery. The issue at hand here is more pedestrian - if you share the link 50 ways and all 50 sessions are busy then each one gets just 2% of the bandwidth. They are inherently fair with each other and without priority, even though that doesn't reflect their importance to your page.

If the required js and css is only getting 10% of the bandwidth while the images are getting 90%, then the first meaningful paint is woefully delayed. The reason you do the parallelism at all is because many of those connections will be going through one of the aforementioned idle moments and aren't all simultaneously busy - so it's a reasonable strategy as long as maximizing total bandwidth utilization is your goal. But in the case of an HTML page load, some resources are more important than others, and it isn't worth sacrificing that ordering to perfectly optimize total page load. So this patch essentially breaks page load into two phases to sort out that problem.
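
None of what follows is the actual Necko patch - just a hedged sketch of the two-phase policy, with hypothetical names, to make the idea concrete: requests that block first paint (html/css/js) dispatch immediately, and everything else waits until those have drained.

  /* Not the actual Necko patch - a hypothetical sketch of the two-phase policy. */
  #include <stdbool.h>
  #include <stddef.h>

  enum req_class {
      REQ_LEADER,     /* html, css, js - the stuff that blocks first paint */
      REQ_FOLLOWER    /* images and other media - safe to hold back briefly */
  };

  struct request {
      enum req_class cls;
      bool active;    /* currently on the wire */
  };

  /* A follower may only dispatch once no leaders remain in flight;
     leaders always dispatch immediately. */
  static bool can_dispatch(const struct request *pending,
                           const struct request *inflight, size_t n_inflight)
  {
      if (pending->cls == REQ_LEADER)
          return true;
      for (size_t i = 0; i < n_inflight; i++) {
          if (inflight[i].active && inflight[i].cls == REQ_LEADER)
              return false;   /* still loading css/js - keep media queued */
      }
      return true;
  }

A real scheduler also has to cap overall parallelism and re-check the held-back queue whenever one of the blocking loads finishes, but the core of the policy is roughly that simple.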

The basic approach here is the same as used by Webkit's ResourceLoadScheduler. So Webkit browsers already do the basic idea (and have validated it). I decided we wanted to do this at the network layer instead of in content or layout to enable a couple extra bonuses that Webkit can't give because it operates on a higher layer:
  1. If we know a priori that a piece of work is highly sensitive to latency but is very low in bandwidth, then we can avoid holding it back even if that work is just part of an HTTP transaction. As part of this patchset I have included the ability to preconnect a small quota of 6 media TCP sessions at a time while css/js is loading. More than 6 can complete; it's just limited to 6 outstanding at one instant to bound the bandwidth consumption. This results in a nice little hot pool of connections all ready for use by the image transfers when they are cleared for takeoff. You could imagine this being slightly expanded to a small number of parallel HTTP media transfers that were bandwidth throttled in the future.
  2. The decision on holding back can be based on whether or not SPDY is in use if you wait until you have a connection to make the decision - spdy is a prioritized and muxed-on-one-tcp-session protocol that doesn't need to use this kind of workaround to do the right thing. In its case we should just send the requests as soon as possible with appropriate priorities attached to them and let the server do the right thing. The best of both worlds!
This issue illustrates again how dependent HTTP/1 is on TCP behavior and how that is a shortcoming of the protocol. Reasonable performance demands parallelism, but how much depends on the semantics of the content, the conditions of the network, and the size of the content. These things are essentially unknowable, and even characterizations of the network and typical content change very quickly. It's essential for the health of the Internet that we migrate away from this model onto HTTP/2.

Comments over here please: https://plus.google.com/100166083286297802191

Wednesday, November 14, 2012

A brief note on pipelines for firefox

A brief note on HTTP pipeline status - you may have read on mozilla planet yesterday that Chrome has this technology enabled. That's not true; I can't say why the author wrote it - but Chrome has the same status (implemented but configured off by default) as Firefox, and largely for the same reasons.

This is a painful note for me to write, because I'm a fan of HTTP pipelines.

But, despite some obvious wins with pipelining, it remains disabled as a high risk / high maintenance item. I use it and test with it every day with success, but much of the risk is tied up in a user's specific topology - intermediaries (including virus checkers - an oft ignored but very common intermediary) are generally the problem with 100% interop. I encourage readers of planet moz to test it too - but that's a world apart from turning it on by default - the test matrix of topologies is just very different.

See bugzilla 716840 and 791545 for some current problems caused by unknown forces.

Also see telemetry for TRANSACTION_WAIT_TIME_HTTP and TRANSACTON_WAIT_TIME_HTTP_PIPELINES - you'll see that pipelines do marginally reduce queuing time, but not by a heck of a lot in practice. (~65% of transactions are sent within 50ms using straight HTTP, ~75% with pipelining enabled). If we don't see more potential than that, it isn't worth taking the pain of interop troubles. It is certainly possible that those numbers can be improved.

The real road forward though is HTTP/2 - a fully multiplexed and pipelined protocol without all the caveats and interop problems. It's better to invest in that. Check out TRANSACTON_WAIT_TIME_SPDY and you'll see that 93% of all transactions wait less than 1ms in the queue!