Showing posts with label spdy. Show all posts
Showing posts with label spdy. Show all posts

Wednesday, December 5, 2012

Smarter Network Page Load for Firefox

I just landed some interesting code for bug 792438. It should be in the December 6th nightly, and If it sticks it will be part of Firefox 20.

The new feature keeps the network clear of media transfers while a page's basic html ,css, and js are loaded. This results in a faster first paint as the elements that block initial rendering are loaded without competition. Total page load is potentially slightly regressed, but the responsiveness more than makes up for it. Similar algorithms report first paint improvements of 30% with pageload regressions around 5%. There are some cases where the effect is a lot more obvious than 30% - try,,, or - all of these sites have large amounts of highly parallel media on the page that, in previous versions, competed with the required css.

Any claim of benefit is going to depend on bandwidth, latency, and the contents of your cache (e.g. if you've already got the js and css cached, this is moot.. or if you have a localhost connection like talos tp5 uses it is also moot because bandwidth and latency are essentially ideal there).

Before I get to the nitty gritty I think its worth a paragraph of Inside-Mozilla-Baseball to mention what a fun project this was. I say that in particular because it involved a lot of cross team participation on many different aspects (questions, advice, data, code in multiple modules, and reviews). I think we can all agree that when many different people are involved on the same effort efficiency is typically the first casualty. Perhaps this project is the exception needed to prove the rule - it went from a poorly understood use case to commited code very quickly. Ehsan, Boris, Dave Mandelin, Josh Aas, Taras, Henri, Honza Bambas,  Joe Drew, and Chromium's Will Chan helped one and all - its not too often you get the rush of everyone rowing in sync; but it happened here and was awesome to behold and good for the web.

In some ways this feature is not intuitive. Web performance is generally improved by adding parallelization due to large amounts of unused bandwidth left over by the single session HTTP/1 model. In this case we are actually reducing parallelization which you wouldn't think would be a good thing. Indeed that is why it can regress total page load time for some use cases. However, parallelization is a double edged sword.

When parallelization works well it is because it helps cover up moments of idle bandwidth in a single HTTP transaction (e.g. latency in the handshake, latency during the header phase, or even latency pauses involved in growing the TCP congestion window) and this is critically important to overall performance. Servers engage in hostname sharding just to opt into dozens of parallel connections for performance reasons.

On the other hand, when it works poorly parallelization kills you in a couple of different ways. The more spectacular failure mode I'll be talking about in a different post (bug 813715) but briefly the over subscription of the link creates induced queues and TCP losses that interact badly with TCP congestion control and recovery. But the issue at hand here is more pedestrian - if you share a connection 50 ways and all 50 sessions are busy then each one gets just 2% of the bandwidth. They are inherently fair with each other and without priority even though that doesn't reflect their importance to your page.

If the required js and css is only getting 10% of the bandwidth while the images are getting 90% then the first meaningful paint is woefully delayed. The reason you do the parallelism at all is because many of those connections will be going through one of the aforementioned idle moments and aren't all simultaneously busy - so its a reasonable strategy as long as maximizing total bandwidth utilization is your goal.. but in the case of an HTML page load some resources are more important than others and it isn't worth sacrificing that ordering to perfectly optimize total page load. So this patch essentially breaks page load into two phases to sort out that problem.

The basic approach here is the same as used by Webkit's ResourceLoadScheduler. So Webkit browsers already do the basic idea (and have validated it). I decided we wanted to do this at the network layer instead of in content or layout to enable a couple extra bonuses that Webkit can't give because it operates on a higher layer:
  1. If we know apriori that a piece of work is highly sensitive to latency but is very low in bandwidth then we can avoid holding it back even if that work is just part of an HTTP transaction. As part of this patchset I have included the ability to preconnect a small quota of 6 media TCP sessions at a time while css/js is loading. More than 6 can complete, its just limited to 6 outstanding at one instant to bound the bandwidth consumption. This results in a nice little hot pool of connections all ready for use by the image transfers when they are cleared for takeoff. You could imagine this being slightly expanded to a small number of parallel HTTP media transfers that were bandwidth throttled in the future.
  2. The decision on holding back can be based on whether or not SPDY is in use if you wait until you have a connection to make the decision - spdy is a prioritized and muxed-on-one-tcp-session protocol that doesn't need to use this kind of workaround to do the right thing. In its case we should just send the requests as soon as possible with appropriate priorities attached to them and let the server do the right thing. The best of both worlds!
 This issue illustrates again how dependent HTTP/1 is on TCP behavior and how that is a shortcoming of the protocol. Reasonable performance demands parallelism, but how much is dependent on the semantics of the content, the conditions of the network, and the size of the content. These things are essentially unknowable and even characterizations of the network and typical content change very quickly. Its essential for the health of the Internet that we migrate away from this model onto HTTP/2.

Comments over here please:

Monday, August 6, 2012

The Road to HTTP/2

I have been working on HTTP/2.0 standardization efforts in the IETF. Lots has been going on lately, lots of things are still in flux, and many things haven't even been started yet. I thought an update for Planet Mozilla might be a useful complement to what you will read about it in other parts of the Internet. The bar for changing a deeply entrenched architecture like HTTP/1 is very tall - but the time has come to make that change.

Motivation For Change - More than page load time

HTTP has been a longtime abuser of TCP and the Internet. It uses mountains of different TCP sessions that are often only a few packets long. This triggers lots of overhead and results in common stalling due to bad interaction with the way TCP was envisioned to be deployed. The classic TCP model pits very large FTP flows against keystrokes of a telnet session - HTTP doesn't map to either of those very well. The situation is so bad that over 2/3rds of all TCP packet losses are repaired via the slowest possible mechanism (timer expiration), and more than 1 in 20 transactions experience a loss event. That's a sign that TCP interactions are the bottleneck in web scalability.

Indeed, after a certain modest amount of bandwidth is used additional bandwidth barely moves the needle on HTTP performance at all - the only thing that matters is connection latency. Latency isn't getting better - if anything it is getting worse with the transition to more and more mobile networks. This is the quagmire we are in - we can keep adding more bandwidth, clearer radios, and faster processors in everyone's pocket as we've been doing but that won't help much any more.

These problems have all been understood for almost 20 years and many parties have tried to address them over time. However, "good enough" has generally carried the day due to legitimate concerns over backward compatibility and the presence of other lower hanging fruit. We've been in the era of transport protocol stagnation for a while now. Only recently has the problem been severe enough to see real movement on a large scale in this space with the deployment of SPDY by, Chrome, Firefox, Twitter, and F5 among others. Facebook has indicated they will be joining the party soon as well along with nginx. Many smaller sites also participate in the effort, and Opera has another browser implementation available for preview.

SPDY uses a set of time proven techniques. Those are primarily multiplexed transactions on the same socket, some compression, prioritization, and good integration with TLS. The results are strong: page load times are improved, TCP interaction improves (including less connections to manage less dependency on rtt as a scaling factor, and improved "buffer bloat" properties),  and incentives are created to give users the more secure browsing experience they deserve.

 Next Steps

Based on that experience, the IETF HTTP-bis working group has reached consensus to start working on HTTP/2.0 based on the SPDY work of Mike Belshe and Roberto Peon. Mike and Roberto initially did this work at Google, Mike has since left Google and is now the founder of Twist. While the standards effort will be SPDY based, that does not mean it will be backwards compatible (it almost certainly won't be) nor does it mean it will have the exact same feature set but to be a success the basic properties will persevere and we'll end up with a faster, more scalable, and more secure Web.

Effectively taking this work into an open standardization forum means ceding control of the protocol revision process to the IETF and agreeing to implement the final result as long as it doesn't get screwed up too badly. That is a best effort process and we'll just have to participate and then wait to see what becomes of it. I am optimistic - all the right players are in the room and want to do the right things. Just as we have implemented SPDY as an experiment in the past in order to get data to inform its evolution, you can expect Mozilla to vigorously fight for the people of the web to have the best protocol possible and it seems likely we will experiment with implementations of some Internet-draft level revisions of HTTP/2.0 along the way to validate those ideas before a full standard is reached. (We'll also be deprecating all of these interim versions along the way as they are replaced with newer ideas - we don't want to create defacto protocols by mistake.) The period of stagnant protocols is ending.

Servers should come along too - use (and update) mod_spdy and nginx with spdy support; or get hosting from an integrated spdy services provider like CloudFlare or Akamai.

Exciting times.

Thursday, March 8, 2012

Twitter, SPDY, and Firefox

Well - look at what this morning brings: some enabled SPDY goodness in my firefox nightly! (below)

spdy is currently enabled in firefox nightly and you can turn it on in FF 11 and FF 12 through network.http.spdy.enabled in about:config

There is not yet any spdy support for the images on the Akamai CDN that twitter uses, and that's obviously a big part of performance. But still real deployed users of this are Twitter, Google Web, Firefox, Chrome, Silk, node etc.. this really has momentum because it solves the right problems.

Big pieces still left are a popular CDN, open standardization, a http<>spdy gateway like nginx, a stable big standalone server like apache, and support in a load balancing appliance like F5 or citrix. And the wind is blowing the right way on all of those things. This is happening very fast.

GET /account/available_features HTTP/1.1
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120308 Firefox/13.0a1
X-Requested-With: XMLHttpRequest

HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
Content-Length: 3929
Content-Type: text/javascript; charset=utf-8
Date: Thu, 08 Mar 2012 15:20:56 GMT
Etag: "8f2ef94f3149553a2c68e98a8df04425"
Expires: Tue, 31 Mar 1981 05:00:00 GMT
Last-Modified: Thu, 08 Mar 2012 15:20:56 GMT
Pragma: no-cache
Server: tfe
x-revision: DEV
x-runtime: 0.01763
x-xss-protection: 1; mode=block
X-Firefox-Spdy: 1

Monday, January 23, 2012

HTTP-WG Proposal to tackle HTTP/2.0

Huzzah to Mark Nottingham, chair of the IETF HTTP Working Group. He proposes rechartering the group to "specify (sic) HTTP/2.0 an improved binding of HTTP's semantics to the underlying transport."

That's welcome news - the scalability, efficiency, and robustness issues with HTTP/1 are severe problems that deserve attention in an open standards body forum. The HTTP-WG is the right place.

SPDY will certainly be offered as an input to that process and in my opinion it touches on the right answers. But whatever the outcome it is great to see momentum around open standardization of solutions to the transport problems HTTP/1 suffers from.

Saturday, January 7, 2012

Using SPDY for more responsive interfaces

RST_STREAM turns out to be a feature of spdy that I under appreciated for a long time. The concept is simple enough - either end of the connection can cancel an individual stream without impacting the other streams that are multiplexed on the same connection.

This fills a conceptual gap left by HTTP/1.x. - In HTTP when you want to cancel a transaction about all you can do is close the connection.

Consider the case of quickly clicking through a webmail mailbox - just scanning the contents and rapidly clicking 'next'. Typically the pages will be only partly loaded before you move on to the next one. Assuming you have used up all your parallelism in HTTP/1, the new click will either have to wait for the old transactions to complete (wasting time and bandwidth) or cancel the old ones by closing those connections and then open fresh connections for the new requests. New connections add 2 or 3 round trip times to reopen the SSL connection (you are reading email over SSL, right?) before they can be used to send the new requests. Either way - that is not a good experience.

An interactive map application has similar attributes - as you scan along the map, zooming in and out, lots of tiles are loaded and are often irrelevant before they are drawn. I'm sure you can think of other scenarios that have cancellations.

Spdy solves this simply - with its inherently much greater parallelism the new requests can be sent immediately and at the same time cancel notifications can go out for the old ones. That saves the bandwidth and gets the new requests going as fast as possible without interfering with either the established connection or any other transactions also in progress.

A page load time metric won't really show this to you but the increased responsiveness is very obvious when working with these kinds of use cases - especially under high latency conditions that make connection establishment slower.

Sunday, January 1, 2012

A use case for SPDY header compression

A use case for SPDY header compression:

380 bytes of gzipped javascript (550 uncompressed), sent with 8.8KB of request cookies and 5.5KB of response cookies. That overhead is bad enough to mess with TCP CWND defaults - which means you are taking multiple round trips on the network just to move half a KB of js. For HTTP, that's a performance killer! Those cookies are repeated almost identically on every transaction with that host.

SPDY's dedicated header contexts and the repetitive nature of cookies means those cookies can be reduced ~98% for all but the first transaction of the session. Essentially the cookies remain stateless for app developers, along with the nice properties of that, but the transport leverages the connection state to move them much more efficiently.

Thursday, December 8, 2011

SPDY, Bufferbloat, HTTP, and Real-Time Networking

Long router queue sizes on the web continue to be a hot networking topic - Jim Gettys has a long interview in ACM queue. Large unmanaged queues destroy low latency applications - just ask Randell Jessup

A paper like this does a good job of showing just how bad the situation can be - experimentally driving router buffering delay from 10ms to ~1000ms on many common broadband cable and DSL modems. I wish the paper had been able to show me the range and frequency of that queue delay under normal conditions.

I'm concerned  that decreasing the router buffer size, thereby increasing the drop rate, is detrimental to the current HTTP/1.x web. A classic HTTP/1.x flow is pretty short - giving it a signal to backoff doesn't save you much - it has sent much of what it needs to send already anyhow. Unless you drop almost all of that flow from your buffers you haven't achieved much. Further, a loss event has a high chance of damaging the flow more seriously than you intended - dropping a SYN or the last packet of the data train is a packet that will have very slow retry timers, and short flows are comprised of high percentages of these kinds of packets. non-drop based loss notification like connex/ecn do less damage but are ineffective again because the short flow is more or less complete when the notification arrives so it cannot adapt the sending rate.

The problem is all of those other parallel HTTP sessions going on that didn't get the message. Its the aggregate that causes the buffer build up. Many sites commonly use 60-90 separate uncoordinated TCP flows just to load one page.

Making web transport more adaptable on this front is a big goal of my spdy work. When spdy consolidates resources onto the same tcp flow it means the remaining larger flows will be much more tcp friendly. Loss indicators will have a fighting chance of hitting the flow that can still backoff, and we won't have windows growing independently of each other. (Do you like the sound of IW=10 times 90? That's what 90 uncorrelated flows mean. IW=10 of a small number of flows otoh is excellent.). That ought to keep router queue sizes down and give things like rtcweb a fighting chance.

It also opens up the possibility of the browser identifying queue growth through delay based analysis and possibly helping the situation out by managing inside the browser our bulk tcp download rate (and definitely the upload rate) by munging the rwin or something like that. If it goes right it really shouldn't hurt throughput while giving better latency to other applications. It's all very pie in the sky and down the road, but its kind of hard to imagine in the current HTTP/1.x world.

Friday, November 11, 2011

Video of SPDY Talk at

Yesterday, I was fortunate enough to be able to address the conference and share my thoughts on why SPDY is an important change for the web. They have made the video of my talk available on-line. (I guess that saves me the air-mozilla brownbag - just skip the 3 minute community-involvement video near the beginning assuming you've seen it already)

codebits is full of vitality. Portugal is lucky to have it.

Friday, September 23, 2011

SPDY: What I Like About You.

I've been working on implementing SPDY as an experiment in Firefox lately. We'll have to see how it plays out, but so far I really like it.

Development and benchmarking is still a work in progress, though interop seems to be complete. There are several significant to-do items left that have the potential to improve things even further. The couple of anecdotal benchmarks I have collected are broadly similar to the page load time based reports Google has shared at the IETF and velocity conf over the last few months.

tl;dr; Faster is all well and good (and I mean that!) but I'm going to make a long argument that SPDY is good for the Internet beyond faster page load times. Compared to HTTP, it is more scalable, plays nicer with other Internet traffic and brings web security forward.

SPDY: What I Like About You

#1: Infinite Parallelism with Shared Congestion Control.

You probably know that SPDY allows multiplexing of multiple HTTP resources inside one TCP stream. Unlike the related HTTP mechanisms of pipelining concurrent requests on one TCP stream, the SPDY resources can be returned in any order and even mixed together in small chunks so that head of line blocking is never an issue and you never need more than one connection to each real server. This is great for high latency environments because a resource never needs to be queued on either the client or the server for any reason other than network congestion limits.

Normal HTTP achieves transaction parallelism through parallel TCP connections. Browsers limit you to 6 parallel connections per host. Servers achieve greater parallelism by sharding their resources across a variety of host names. Often these host names are aliases for the same host, implemented explicitly to bypass the 6 connection limitation. For example, and are actually DNS CNAMEs for the same server. It is not uncommon to see performance oriented sites, like the Google properties, shard things over as many as 6 host names in order to allow 36 parallel HTTP sessions.

Parallelism is a must-have for performance. I'm looking at a trace right now that uses the afore mentioned 36 parallel HTTP sessions and its page load completes in 16.5 seconds. If I restrict it to just 1 connection per host (i.e. 6 overall), the same page takes 27.7 seconds to load. If I restrict that even further to just 1 connection in total in takes a mind numbing 94 seconds to load. And this is on 40ms RTT broadband - high latency environments such as mobile would suffer much much worse! Keep this in mind when I start saying bad things about parallel connections below, they really do great things and the web we have with them enables much more impressive applications than a single connection HTTP web ever could.

Of course using multiple parallel HTTP connections is not perfect - if they were perfect we wouldn't try to limit them to 6 at a time. There are two main problems. The first is that each connection requires a TCP handshake which incurs an extra RTT (or maybe 3 if you are using SSL) before the connection can be used. The TCP handshake is also relatively computationally hard compared to moving data (servers easily move millions of packets per second, while connection termination is generally measured in the tens of thousands), the SSL handshake even harder. Reducing the number of connections reduces this burden. But in all honesty this is becoming less of a problem over time - the cost of maintaining persistent connections is going down (which amortizes the handshake cost) and servers are getting pretty good at executing the handshakes (both SSL and vanilla) sometimes by employing the help of multi-tiered architectures for busy deployments.

The architectural problem lies in HTTP's interaction with TCP congestion control. HTTP flows are generally pretty short (a few packets per transaction), tend to stop and start a lot, and more or less play poorly with the congestion control model. The model works really well for long flows like a FTP download - that TCP stream will automatically adapt to the available bandwidth of the network and transfer at a fairly steady rate for its duration after a little bit of acclimation time. HTTP flows are generally too short to ever acclimate properly.

A SPDY flow, being the aggregation of all the parallel HTTP connections, looks to be a lot longer, busier, and more consistent than any of the individual parallel HTTP flows would be. Simply put - that makes it work better because all of that TCP congestion logic is applied to one flow instead of being repeated independently across all the parallel HTTP mini flows.

Less simply, when an idle HTTP session begins to send a response it has to guess at how much data should be put onto the wire. It does this without awareness of all the other flows. Let's say it guesses "4 packets" but there are no other active flows. In this case 4 packets is way too few and the network is under utilized and the page loads poorly. But what if 35 other flows are activated at the same time - this means 140 packets get injected into the network at the same time which is way too many. Under that scenario one of two things happen - both of them are bad:
  1. Packet Loss. TCP reacts poorly to packet loss, especially on short flows. While 140 packets in aggregate is a pretty decent flow, remember that total transmission is made up of 35 different congestion control blocks - each one covering a packet flow of only 4 packets. A loss is devastating to performance because most of the TCP recovery strategies don't work well in that environment.
  2. Over Buffering. This is what Jim Gettys calls bufferbloat. The giant fast moving 140 packet burst arrives at your cable modem head where the bandwidth is stepped down and most of those packets get in a long buffer to wait for their turn on your LAN. That works OK, certainly better than packet loss recovery does in practice, but the deep queue creates a giant problem for any interactive traffic that is sharing that link. Packets for those other applications (such as VOIP, gaming, video chat, etc..) now have to sit in this long queue resulting in interactive lag. Lag sucks for the Internet. The HTTP streams themselves also become non-responsive to cancel events because the only way to clear those queues is to wait them out - so clicking on a link to a new page is significantly delayed while the old page that you have already abandoned continues to consume your bandwidth.
This describes a real dilemma - if you guess more aggressive send windows then you will have a better chance of filling the network but you will also have a better chance of packet loss or over buffering. If you guess more conservative windows then loss and buffering happens less often but nothing ever runs very quickly. In the face of all those flows with independent congestion control blocks, there just isn't enough information available. (This of course is related to the famous Initial Window 10 proposal, which I support, but that's another non SPDY story.)

I'm sure you can see where this is going now. SPDY's parallelism, by virtue of being on a single TCP stream, leverages one busy shared congestion control block instead of dealing with 36 independent tiny ones. Because the stream is much busier it rarely has to guess at how much to send (you only need to guess when you're idle, SPDY is more likely to be getting active feedback), if it should drop a packet it reacts to that loss much better via the various fast recovery mechanisms of TCP, and when it is competing for bandwidth at a choke point it is much more responsive to the signals of other streams - reducing the over buffering problem.

It is for these reasons that SPDY is really exciting to me. Parallel connections work great - they work so great that it is hard to have SPDY significantly improve on the page load time of highly sharded site unless there is a very high amount of latency present.  But the structural advantages of SPDY enable important efforts like RTCWeb as well as provide better network utilization and help servers scale when compared to HTTP. Even if page load times only stay at par, those other good for the Internet attributes make it worth deploying.

#2: SPDY is over SSL every time.

I greatly lament that I am late to the school of SSL-all-the-time. I spent many years trying to eek the greatest amount of server responses per watt that was possible. I looked at SSL and saw impediments. That stayed with me.

I was right about the impediments, and I've learned a lot about dealing with them, but what I didn't get was that it is simply worth the cost. As we have all seen lately, SSL isn't perfect - but having a layer of protection against an entire class of eavesdropping attacks is a property that should be able to be relied upon in every protocol as generic as HTTP. HTTP does not provide that guarantee - but SPDY does.


As a incentive to make the transition to SSL all the time, this makes it worth deploying by itself.

#3:Header compression. 

SPDY compresses all the HTTP-equivalent headers using a specialized dictionary and a compression context that is reserved only for the headers so it does not get diluted with non-header references. This specialized scheme performs very well.

At first I really didn't think that would matter very much - but it is really a significant savings. HTTP's statelessness had its upsides, but the resulting on the wire redundancy was really over the top.

As a sample, I am looking right now at a trace of 1900 resources (that's about 40 pages). 760KB of total downstream plain text header bytes were received as 88KB compressed bytes, and upstream 949KB of plain text headers were compressed as just 65KB. I'll take 1.56MB (82%) in total overhead savings!  I even have a todo item that will make this slightly better.