Wednesday, December 5, 2012

Smarter Network Page Load for Firefox

I just landed some interesting code for bug 792438. It should be in the December 6th nightly, and If it sticks it will be part of Firefox 20.

The new feature keeps the network clear of media transfers while a page's basic html ,css, and js are loaded. This results in a faster first paint as the elements that block initial rendering are loaded without competition. Total page load is potentially slightly regressed, but the responsiveness more than makes up for it. Similar algorithms report first paint improvements of 30% with pageload regressions around 5%. There are some cases where the effect is a lot more obvious than 30% - try,,, or - all of these sites have large amounts of highly parallel media on the page that, in previous versions, competed with the required css.

Any claim of benefit is going to depend on bandwidth, latency, and the contents of your cache (e.g. if you've already got the js and css cached, this is moot.. or if you have a localhost connection like talos tp5 uses it is also moot because bandwidth and latency are essentially ideal there).

Before I get to the nitty gritty I think its worth a paragraph of Inside-Mozilla-Baseball to mention what a fun project this was. I say that in particular because it involved a lot of cross team participation on many different aspects (questions, advice, data, code in multiple modules, and reviews). I think we can all agree that when many different people are involved on the same effort efficiency is typically the first casualty. Perhaps this project is the exception needed to prove the rule - it went from a poorly understood use case to commited code very quickly. Ehsan, Boris, Dave Mandelin, Josh Aas, Taras, Henri, Honza Bambas,  Joe Drew, and Chromium's Will Chan helped one and all - its not too often you get the rush of everyone rowing in sync; but it happened here and was awesome to behold and good for the web.

In some ways this feature is not intuitive. Web performance is generally improved by adding parallelization due to large amounts of unused bandwidth left over by the single session HTTP/1 model. In this case we are actually reducing parallelization which you wouldn't think would be a good thing. Indeed that is why it can regress total page load time for some use cases. However, parallelization is a double edged sword.

When parallelization works well it is because it helps cover up moments of idle bandwidth in a single HTTP transaction (e.g. latency in the handshake, latency during the header phase, or even latency pauses involved in growing the TCP congestion window) and this is critically important to overall performance. Servers engage in hostname sharding just to opt into dozens of parallel connections for performance reasons.

On the other hand, when it works poorly parallelization kills you in a couple of different ways. The more spectacular failure mode I'll be talking about in a different post (bug 813715) but briefly the over subscription of the link creates induced queues and TCP losses that interact badly with TCP congestion control and recovery. But the issue at hand here is more pedestrian - if you share a connection 50 ways and all 50 sessions are busy then each one gets just 2% of the bandwidth. They are inherently fair with each other and without priority even though that doesn't reflect their importance to your page.

If the required js and css is only getting 10% of the bandwidth while the images are getting 90% then the first meaningful paint is woefully delayed. The reason you do the parallelism at all is because many of those connections will be going through one of the aforementioned idle moments and aren't all simultaneously busy - so its a reasonable strategy as long as maximizing total bandwidth utilization is your goal.. but in the case of an HTML page load some resources are more important than others and it isn't worth sacrificing that ordering to perfectly optimize total page load. So this patch essentially breaks page load into two phases to sort out that problem.

The basic approach here is the same as used by Webkit's ResourceLoadScheduler. So Webkit browsers already do the basic idea (and have validated it). I decided we wanted to do this at the network layer instead of in content or layout to enable a couple extra bonuses that Webkit can't give because it operates on a higher layer:
  1. If we know apriori that a piece of work is highly sensitive to latency but is very low in bandwidth then we can avoid holding it back even if that work is just part of an HTTP transaction. As part of this patchset I have included the ability to preconnect a small quota of 6 media TCP sessions at a time while css/js is loading. More than 6 can complete, its just limited to 6 outstanding at one instant to bound the bandwidth consumption. This results in a nice little hot pool of connections all ready for use by the image transfers when they are cleared for takeoff. You could imagine this being slightly expanded to a small number of parallel HTTP media transfers that were bandwidth throttled in the future.
  2. The decision on holding back can be based on whether or not SPDY is in use if you wait until you have a connection to make the decision - spdy is a prioritized and muxed-on-one-tcp-session protocol that doesn't need to use this kind of workaround to do the right thing. In its case we should just send the requests as soon as possible with appropriate priorities attached to them and let the server do the right thing. The best of both worlds!
 This issue illustrates again how dependent HTTP/1 is on TCP behavior and how that is a shortcoming of the protocol. Reasonable performance demands parallelism, but how much is dependent on the semantics of the content, the conditions of the network, and the size of the content. These things are essentially unknowable and even characterizations of the network and typical content change very quickly. Its essential for the health of the Internet that we migrate away from this model onto HTTP/2.

Comments over here please: