Thursday, December 16, 2010

Accelerated Connection Retry for HTTP and Firefox

Not all packet loss is created equal. In particular, losing a SYN can really ruin your day - or at least the next 3 seconds, which can feel like all day. Most operating systems wait 3 seconds before retrying the SYN. Most other timeouts are dynamically scaled to the network conditions, but not the SYN timeout. It is generally hardcoded. And on most of today's networks 3 seconds is an eternity.

So, in Firefox we took a page from Chrome's book: if Firefox has been waiting for 250ms (configurable via network.http.connection-retry-timeout), it starts a second connection in parallel with the first one. Assuming you've got 0.25% packet loss and general independence between packets spaced that far apart, the approach turns the 3 second pause from a 1 in 400 event into a 1 in 160,000 event. That is pretty much the difference between "kinda annoying" and "didn't notice". It's a good idea - if you hate it for whatever reason just set the pref to 0 to disable it.
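The arithmetic behind that estimate is easy to check. This is a minimal sketch, assuming each SYN is lost independently with probability 0.25%; the function name `one_in` is made up for illustration.

```python
from fractions import Fraction


def one_in(loss_rate: Fraction, parallel_syns: int) -> Fraction:
    """Return N such that all `parallel_syns` SYNs are lost about 1 time in N.

    Assumes the losses are independent, as the text does.
    """
    return 1 / (loss_rate ** parallel_syns)


loss = Fraction(25, 10000)           # 0.25% packet loss
single_attempt = one_in(loss, 1)     # only the original SYN
with_backup = one_in(loss, 2)        # original SYN plus the 250ms backup
print(f"1 in {single_attempt}, 1 in {with_backup}")
```

With one SYN the stall is a 1 in 400 event; with two independent SYNs both have to be lost, which squares the probability.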

Taking the idea one step further: if we create two connections because of this timer and they both end up completing, obviously only one can be used immediately. But we cache the other one as if it were a persistent connection - and then when you need it (which you probably will) you don't have to wait for the handshake at all. It is essentially a prefetched TCP connection. On my desktop, I run with an especially low timer so that any site with a > 100ms RTT benefits from this, and it's great!
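The timer-plus-spare idea can be sketched outside of Firefox. The following is an illustration in Python's asyncio, not the Necko implementation: `connect_with_backup` and its `connect` argument are hypothetical names, and error handling is left out.

```python
import asyncio


async def connect_with_backup(connect, retry_timeout=0.25):
    """Race a backup connection against a slow primary one.

    `connect` is any coroutine function that returns a connection.
    Returns (winner, spare): `spare` is None when no backup was started,
    otherwise a task for the other connection, which the caller can keep
    around as if it were an idle persistent connection.
    """
    if retry_timeout <= 0:
        # Mirrors setting the pref to 0: the feature is disabled.
        return await connect(), None

    primary = asyncio.ensure_future(connect())
    done, _ = await asyncio.wait({primary}, timeout=retry_timeout)
    if done:
        return primary.result(), None

    # The primary is taking too long; start a second attempt in parallel.
    backup = asyncio.ensure_future(connect())
    done, _ = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    winner = primary if primary in done else backup
    spare = backup if winner is primary else primary
    return winner.result(), spare
```

A caller would use the winner right away and await the spare later, when the next request needs a connection.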

You can see this effect below, using mnot's cool htracr, on the second connection. Note how there is no request placed on it as soon as it is live (the request is the red dot at the top of the grey rectangle - the rectangle represents the connection), but one follows shortly thereafter without having to do a handshake. That's an RTT saved!


You will be able to enjoy this feature in FF 4.0 Beta 9. A buggy version of it is actually included in Beta 8 but disabled behind the pref mentioned above. Feel free to enable and play with it before Beta 9 if you don't mind a connection freeze once in a while.

Wednesday, December 1, 2010

Performance of Pipelining in HTTP Firefox

This post provides some performance measurements of my HTTP pipeline patches for Firefox.

The key benefits of pipelining are reduced transaction latency and potentially the use of fewer connections, so those are generally the metrics I will focus on here. For each of my 5 test cases we will look at the following statistics:
  • The percentage of requests that are delayed in the request queue inside Firefox waiting for a connection to be available.
  • The average amount of queue latency for each transaction. This is measured as the time the first byte of the request is given to the kernel minus the time at which the request was presented to Necko/Gecko. It is generally very low but greater than 0 even if the transaction can be placed directly on a pipeline, because it takes a moment to construct the request and perhaps schedule the socket thread if other things are on the CPU. But a high value is an opportunity lost - that is time the request could be in flight and the server could be processing it. It includes the time needed to create a new connection when one is required - the three way TCP handshake, but not a DNS lookup (which is cached in the test).
  • The type of connection used for each transaction (new connection, an idle persistent connection, or a pipelined connection)
  • The average amount of transaction latency for each transaction. This is measured as the time of the first byte of the response being received by Firefox minus the time at which the request was presented to Necko/Gecko. It is possible for the average improvement to be greater than 1 RTT because of cumulative queueing delays in the non-pipelined case.
  • The cumulative fraction of transactions completed before 3 different elapsed times in order to show improved execution time for the test case. The benchmark times are sized appropriately for each test case.
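The statistics in the list above can be written down compactly. This is a sketch of how such numbers could be computed from per-transaction timestamps, not the instrumentation actually used for these tests; the `Transaction` record and its field names are hypothetical, and the last function reflects my reading of the "responses received in < X ms" figures as a cutoff on transaction latency.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Transaction:
    submitted_ms: float        # request presented to the network stack
    first_byte_sent_ms: float  # first request byte handed to the kernel
    first_byte_recv_ms: float  # first response byte received
    waited_for_connection: bool  # True if it sat in the request queue

    @property
    def queue_latency(self) -> float:
        return self.first_byte_sent_ms - self.submitted_ms

    @property
    def transaction_latency(self) -> float:
        return self.first_byte_recv_ms - self.submitted_ms


def pct_queued(txns: Sequence[Transaction]) -> float:
    return 100.0 * sum(t.waited_for_connection for t in txns) / len(txns)


def avg_queue_latency(txns: Sequence[Transaction]) -> float:
    return sum(t.queue_latency for t in txns) / len(txns)


def avg_transaction_latency(txns: Sequence[Transaction]) -> float:
    return sum(t.transaction_latency for t in txns) / len(txns)


def pct_received_before(txns: Sequence[Transaction], cutoff_ms: float) -> float:
    hits = sum(1 for t in txns if t.transaction_latency < cutoff_ms)
    return 100.0 * hits / len(txns)
```

Note that queue latency is a component of transaction latency, so a large queue latency shows up in both averages.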

There are 4 data points for each criterion in each test case. Pipelining is aimed at environments with significant latency, and my broadband test connection has lower latency than much of the world and every mobile environment, so the first two data points have 200ms of latency added through a traffic shaper. Those two data points compare pipelining on vs pipelining off. The other two data points measure the same things but without the induced latency.

All tests are run with both a disk and memory cache enabled but empty at the beginning of the run. In order to measure the effectiveness of the pipelining, each of these sites has been put in the "green - pipelining ok" state which is normally auto discovered.

Facebook


The first test is on Facebook. It starts by logging in and then selecting a particular user profile and navigating from that page via lists of friends, occasionally pulling up individual profiles, generating lists of recent updates, and pressing the More link on busy Facebook walls. There are approximately 1400 HTTP transactions in each test run.

The first thing to consider is the percent of requests that are delayed (i.e. queued) within Firefox. I think queuing is particularly bad because if the request hasn't been passed to the network there is no way for any advances in server technology to ever operate on it. For instance, servers are prevented from returning responses out of order, but nothing would prevent them from processing requests out of order to overlap latencies in DB queries and disk I/O if the requests were not queued on the browser side.

Percent of Requests Queued
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 0 | 79.1 |
| Low Latency | 0 | 78.6 |

That is a stark contrast. It is possible by the way to see a request being queued with pipelining enabled - a default configuration limit of 32 governs the maximum depth of the pipeline and not all request types are pipeline eligible.

Average Queue Latency (ms)
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 29 | 1630 |
| Low Latency | 6 | 285 |

You might expect to see 0ms for the pipelining case as we just illustrated above that no requests were delayed. But the queue latency covers the time from request submission to the time of putting the first byte of the request on the wire, so that includes any connection setup time when establishing a new connection. That is the primary source of the latency seen here for the pipeline enabled case.

That raises the question: when pipelining is enabled, how many of the requests are pipelined?
Connection Type Pct
| | Moderate Latency w/Pipeline | Moderate Latency wo/Pipeline | Low Latency w/Pipeline | Low Latency wo/Pipeline |
|---|---|---|---|---|
| New | 2 | 4 | 2 | 3 |
| Reused Idle | 13 | 96 | 13 | 97 |
| Pipeline | 85 | 0 | 84 | 0 |

We see here a moderate reduction in the number of connections used when pipelining, but most of the effect is a transfer from idle persistent connections over to pipelines. While the percentage of new connections as a portion of the overall request stream has gone down just a tick with pipelining, the impact on the actual number of raw connections is significant - going from roughly 60 without pipelining to 30 with it enabled. That boils down to a 50% reduction in the number of connections created, which gives a very busy site like Facebook a significant scalability boost.
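The connection counts follow from the share of new connections (about 4% without pipelining, 2% with it, in the moderate latency runs) applied to the roughly 1400 transactions per run. A quick check, with the caveat that the percentages are rounded and so the counts are approximate; `approx_new_connections` is a made-up helper:

```python
def approx_new_connections(total_transactions: int, pct_new: float) -> float:
    """Estimate how many transactions opened a new connection."""
    return total_transactions * pct_new / 100


without_pipelining = approx_new_connections(1400, 4)  # about 56
with_pipelining = approx_new_connections(1400, 2)     # about 28
reduction = 1 - with_pipelining / without_pipelining
print(f"{without_pipelining:.0f} -> {with_pipelining:.0f} "
      f"({reduction:.0%} fewer connections)")
```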

The final criteria deal with transaction latency.

Average Transaction Latency (ms)
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 702 | 1906 |
| Low Latency | 341 | 346 |

Yowza! Now there is a result. Under conditions with moderate latency the average transaction waits 1200ms less from the time the request is submitted to Necko to the time the first byte of the response header is received. The net effect is so much more than the roughly 250ms RTT because of aggregating queueing delays - without pipelining enabled you are placed in a deep queue, and every request ahead of you has to be cleared, at a cost of 1 RTT each, before yours is executed. The impact under low latency conditions is probably close to being noise.
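A toy model shows how the queueing delays aggregate. This is a deliberately crude sketch and not a model of Firefox's scheduler: it assumes a burst of requests arrives at once, every request-response exchange costs exactly one RTT, and server and transfer time are zero. The burst size of 30 and the 6 parallel connections are illustrative assumptions.

```python
def avg_latency_serial(n_requests: int, connections: int, rtt_ms: float) -> float:
    """Average latency when each connection carries one request per RTT."""
    total = 0.0
    for i in range(n_requests):
        waves_waited = i // connections  # full round trips spent in the queue
        total += (waves_waited + 1) * rtt_ms
    return total / n_requests


def avg_latency_pipelined(n_requests: int, rtt_ms: float) -> float:
    """Every request is sent immediately, so each sees about one RTT here."""
    return float(rtt_ms)


print(avg_latency_serial(30, 6, 250))   # 750.0
print(avg_latency_pipelined(30, 250))   # 250.0
```

Even in this small burst the average non-pipelined request waits three RTTs rather than one, which is why the measured improvement can be several times the RTT.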

Pct of Responses Rcvd in < Xms
| | x=1500 | x=1200 | x=900 |
|---|---|---|---|
| Moderate Latency w/Pipeline | 97 | 91 | 75 |
| Moderate Latency wo/Pipeline | 45 | 38 | 32 |

Pct of Responses Rcvd in < Xms
| | x=1000 | x=700 | x=400 |
|---|---|---|---|
| Low Latency w/Pipeline | 99 | 92 | 61 |
| Low Latency wo/Pipeline | 99 | 91 | 61 |

Facebook is a big success - probably the biggest success of any of the tests. Situations with 200+ms of latency see significantly better performance, and low latency scenarios perform similarly while using a few fewer TCP connections.

Amazon.com


The Amazon.com test walks through a basic window shopping experience at Amazon.com. The home page is loaded, the kindle link is clicked, a few more categories are clicked and the lists of products are generally browsed and sorted by "hot and new" and other similar things. This boils down to about 800 HTTP transactions.

Percent of Requests Queued
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 0 | 54.4 |
| Low Latency | 0 | 39.8 |

Right away you can see that Amazon queues fewer requests than Facebook, so the potential improvement is less.

Average Queue Latency (ms)
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 116 | 791 |
| Low Latency | 12 | 136 |

Connection Type Pct
| | Moderate Latency w/Pipeline | Moderate Latency wo/Pipeline | Low Latency w/Pipeline | Low Latency wo/Pipeline |
|---|---|---|---|---|
| New | 12 | 13 | 6 | 13 |
| Reused Idle | 20 | 96 | 28 | 87 |
| Pipeline | 68 | 0 | 66 | 0 |

The first thing to note is that less pipelining is going on than with Facebook, so again there is less potential for improvement. How the pages are constructed has a lot to do with this (perhaps fewer images, etc.). But almost as interesting is the fact that the number of TCP connections (i.e. new connections) is halved in the low latency case. If the page can be transferred in the same amount of time using fewer connections, that is still a win for the web overall.

Average Transaction Latency (ms)
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 635 | 1083 |
| Low Latency | 266 | 204 |

An interesting result - 400ms off the average transaction in the ~250ms RTT environment, but a notable loss in the low latency scenario. All of the numbers here are averages across two test runs, but inspecting the Amazon test case in particular on some other ad-hoc runs showed quite a bit of variability. My suspicion is that server load occasionally results in a single resource taking a long time to return. In the past I have seen this disable pipelining for HTML pages but leave it enabled for images.

Pct of Responses Rcvd in < Xms
| | x=1200 | x=900 | x=600 |
|---|---|---|---|
| Moderate Latency w/Pipeline | 91 | 79 | 62 |
| Moderate Latency wo/Pipeline | 77 | 69 | 61 |

Pct of Responses Rcvd in < Xms
| | x=1000 | x=700 | x=400 |
|---|---|---|---|
| Low Latency w/Pipeline | 93 | 90 | 84 |
| Low Latency wo/Pipeline | 99 | 94 | 85 |

Flickr


The Flickr test is probably the simplest of the cases. It simply loads several galleries based on set names and tags. There are roughly 350 HTTP transactions in the test. Under normal conditions Flickr has a high variability in server response time.

Percent of Requests Queued
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 0 | 57 |
| Low Latency | 0 | 57 |

Average Queue Latency (ms)
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 43 | 814 |
| Low Latency | 7 | 211 |

Connection Type Pct
| | Moderate Latency w/Pipeline | Moderate Latency wo/Pipeline | Low Latency w/Pipeline | Low Latency wo/Pipeline |
|---|---|---|---|---|
| New | 10 | 27 | 12 | 27 |
| Reused Idle | 19 | 73 | 19 | 73 |
| Pipeline | 71 | 0 | 69 | 0 |

As with the other tests, more than half of the new connections have been replaced when pipelining is enabled.

Average Transaction Latency (ms)
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 859 | 1091 |
| Low Latency | 291 | 366 |

This result is more modest, but still positive, when compared to our other tests.

Pct of Responses Rcvd in < Xms
| | x=2000 | x=1500 | x=1000 |
|---|---|---|---|
| Moderate Latency w/Pipeline | 95 | 87 | 67 |
| Moderate Latency wo/Pipeline | 88 | 74 | 60 |

Pct of Responses Rcvd in < Xms
| | x=1000 | x=700 | x=400 |
|---|---|---|---|
| Low Latency w/Pipeline | 99 | 97 | 72 |
| Low Latency wo/Pipeline | 98 | 93 | 71 |

www.AsiaNewsPhoto.com


This test uses a photojournalism clearinghouse site located overseas, and therefore the broadband low latency case has a starting RTT of closer to 100ms, while the moderate delay case adds 200ms to that. This is the smallest test case - just 175 transactions in each run.

Percent of Requests Queued
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 0 | 44 |
| Low Latency | 0 | 38 |

Average Queue Latency (ms)
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 21 | 726 |
| Low Latency | 31 | 400 |

I am not yet certain how to explain the very modest rise in queue time for the pipeline case when the added 200ms delay is removed. It must involve an aberrant TCP connection, as that is really the only component of queue time when the requests themselves are not delayed due to connection limits.

Connection Type Pct
| | Moderate Latency w/Pipeline | Moderate Latency wo/Pipeline | Low Latency w/Pipeline | Low Latency wo/Pipeline |
|---|---|---|---|---|
| New | 16 | 7 | 11 | 8 |
| Reused Idle | 33 | 93 | 33 | 92 |
| Pipeline | 51 | 0 | 56 | 0 |

This is the first time we actually see the new connection numbers moving in the wrong direction. In this case I believe the type scheduling restrictions placed on the connection manager are generating new connections that may have been unnecessary in the non-pipelining scenario. I'm curious if the effect would fade in a test with more transactions.

Average Transaction Latency (ms)
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 1310 | 1248 |
| Low Latency | 752 | 689 |

This seems to track the changes in connection types, and the test may bear further examination to see if an adjustment can be made. The scheduling algorithm seems to be getting in its own way and has made performance just a tick worse than before, though not by very much, and certainly not by enough to discount the gains made in some other scenarios.

Pct of Responses Rcvd in < Xms
| | x=2000 | x=1500 | x=1000 |
|---|---|---|---|
| Moderate Latency w/Pipeline | 84 | 82 | 52 |
| Moderate Latency wo/Pipeline | 82 | 76 | 54 |

Pct of Responses Rcvd in < Xms
| | x=1500 | x=1000 | x=500 |
|---|---|---|---|
| Low Latency w/Pipeline | 88 | 82 | 46 |
| Low Latency wo/Pipeline | 84 | 82 | 57 |

MapQuest


This test is different in that it is driven almost exclusively through JS and XMLHttpRequest. Those elements are present in the Facebook and Amazon tests as well, but they dominate the MapQuest test. In this scenario a map is brought up on the screen and it is manipulated in the usual ways - panning in 4 directions, zooming in and out, and toggling between satellite and map mode. By the time it is done, 711 HTTP transactions have been made.

Percent of Requests Queued
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 0 | 42 |
| Low Latency | 0 | 44 |

Average Queue Latency (ms)
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 52 | 381 |
| Low Latency | 10 | 198 |

For all cases, the queue latency is pretty low for this test. That means the number of documents requested in one burst is relatively modest.

Connection Type Pct
| | Moderate Latency w/Pipeline | Moderate Latency wo/Pipeline | Low Latency w/Pipeline | Low Latency wo/Pipeline |
|---|---|---|---|---|
| New | 12 | 16 | 11 | 14 |
| Reused Idle | 30 | 84 | 32 | 86 |
| Pipeline | 58 | 0 | 57 | 0 |

Marginally fewer new connections are used with pipelining. Hurrah.

Average Transaction Latency (ms)
| | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 677 | 732 |
| Low Latency | 260 | 500 |

Pct of Responses Rcvd in < Xms
| | x=1500 | x=1200 | x=900 |
|---|---|---|---|
| Moderate Latency w/Pipeline | 96 | 87 | 76 |
| Moderate Latency wo/Pipeline | 96 | 83 | 70 |

Pct of Responses Rcvd in < Xms
| | x=900 | x=600 | x=300 |
|---|---|---|---|
| Low Latency w/Pipeline | 99 | 95 | 65 |
| Low Latency wo/Pipeline | 87 | 75 | 40 |