Not all packet loss is created equal. In particular, losing a SYN can really ruin your day - or at least the next 3 seconds, which can feel like all day. Most operating systems wait 3 seconds before retransmitting a lost SYN. Most other timeouts are dynamically scaled to network conditions, but not the SYN timeout - it is generally hardcoded. And on most of today's networks, 3 seconds is an eternity.
So, in Firefox we took a page from Chrome's book: if a connection attempt has been waiting for 250ms (configurable via network.http.connection-retry-timeout), we start a second connection in parallel with the first one. Assuming 0.25% packet loss and general independence between packets spaced that far apart, this turns the 3 second pause from a 1 in 400 event into roughly a 1 in 160,000 event. That is pretty much the difference between "kinda annoying" and "didn't notice". It's a good idea - if you hate it for whatever reason, just set the pref to 0 to disable it.
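For the curious, here is a quick back-of-envelope check of that arithmetic - a sketch only, assuming the original SYN and the backup SYN are lost independently at the stated 0.25% rate:

```python
# Back-of-envelope: chance a user still hits the 3 second SYN retransmission
# pause once a backup connection is started after 250ms.
loss = 1 / 400                 # 0.25% packet loss, per the assumption above

without_backup = loss          # a single lost SYN stalls the connection
with_backup = loss * loss      # both SYNs must be lost (assumed independent)

print(f"without backup: 1 in {1 / without_backup:,.0f}")   # 1 in 400
print(f"with backup:    1 in {1 / with_backup:,.0f}")      # 1 in 160,000
```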
Taking the idea one step further: if we create two connections because of this timer and both of them end up completing, obviously only one can be used immediately. But we cache the other one as if it were a persistent connection, and when you need it (which you probably will) you don't have to wait for the handshake at all. It is essentially a prefetched TCP connection. On my desktop I run with an especially low timer so that any site with a > 100ms RTT benefits from this, and it's great!
You can see this effect below, using mnot's cool htracr, on the second connection. Note how there is no request placed on it as soon as it is live (the request is the red dot at the top of the grey rectangle - the rectangle represents the connection), but one follows shortly thereafter without having to do a handshake. That's an RTT saved!
You will be able to enjoy this feature in FF 4.0 Beta 9. A buggy version of it is actually included in Beta 8 but disabled behind the pref mentioned above. Feel free to enable and play with it before Beta 9 if you don't mind a connection freeze once in a while.
Wednesday, December 1, 2010
Performance of HTTP Pipelining in Firefox
This post provides some performance measurements of my HTTP pipeline patches for Firefox.
The key benefits of pipelining are reduced transaction latency and potentially the use of fewer connections, so those are generally the metrics I will focus on here. For each of my 5 test cases we will look at the following statistics:
- The percentage of requests that are delayed in the request queue inside Firefox waiting for a connection to be available.
- The average amount of queue latency for each transaction. This is measured as the time the first byte of the request is given to the kernel minus the time at which the request was presented to Necko/Gecko. It is generally very low, but greater than 0 even when the transaction can be placed directly on a pipeline, because it takes a moment to construct the request and perhaps schedule the socket thread if other things are on the CPU. A high value is an opportunity lost - that is time the request could have been in flight and the server could have been processing it. It also includes the time needed to create a new connection when one is required, which covers the three-way TCP handshake but not a DNS lookup (which is cached in the test).
- The type of connection used for each transaction (new connection, an idle persistent connection, or a pipelined connection).
- The average amount of transaction latency for each transaction. This is measured as the time the first byte of the response is received by Firefox minus the time at which the request was presented to Necko/Gecko. It is possible for the average improvement to be greater than 1 RTT because of cumulative queueing delays in the non-pipelined case.
- The cumulative fraction of transactions completed before 3 different elapsed times, in order to show improved execution time for the test case. The benchmark times are sized appropriately for each test case. (A rough sketch of how these statistics could be derived from per-transaction timestamps follows this list.)
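To make those definitions concrete, here is a minimal sketch of how the statistics could be computed from per-transaction timestamps. The field names (submit, first_byte_out, first_byte_in, conn_type, queued) are hypothetical, chosen for illustration; they are not the actual instrumentation in the patches.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Txn:
    submit: float          # time request handed to Necko/Gecko (ms)
    first_byte_out: float  # time first byte of request given to the kernel (ms)
    first_byte_in: float   # time first byte of response header received (ms)
    conn_type: str         # "new", "idle", or "pipeline"
    queued: bool           # waited for a connection to become available

def summarize(txns):
    """Produce the per-test statistics described in the list above."""
    return {
        "pct_queued": 100 * sum(t.queued for t in txns) / len(txns),
        "avg_queue_latency_ms": mean(t.first_byte_out - t.submit for t in txns),
        "avg_txn_latency_ms": mean(t.first_byte_in - t.submit for t in txns),
        "conn_type_pct": {
            kind: 100 * sum(t.conn_type == kind for t in txns) / len(txns)
            for kind in ("new", "idle", "pipeline")
        },
    }

def pct_under(txns, threshold_ms):
    """Percent of transactions whose response header arrived within threshold_ms."""
    return 100 * sum(t.first_byte_in - t.submit < threshold_ms for t in txns) / len(txns)
```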
There are 4 data points for each criterion in each test case. Because pipelining is aimed at environments with significant latency, and my broadband test connection has lower latency than much of the world and essentially every mobile environment, the first two data points have 200ms of latency added through a traffic shaper; they compare pipelining on vs pipelining off. The other two data points measure the same things without the induced latency.
All tests are run with both a disk and memory cache enabled but empty at the beginning of the run. In order to measure the effectiveness of the pipelining, each of these sites has been put in the "green - pipelining ok" state which is normally auto discovered.
Facebook
The first test is on Facebook. It starts by logging in and then selecting a particular user profile and navigating from that page via lists of friends, occasionally pulling up individual profiles, generating lists of recent updates, and pressing the More link on busy Facebook walls. There are approximately 1400 HTTP transactions in each test run.
The first thing to consider is the percent of requests that are delayed (i.e. queued) within Firefox. I think queuing is particularly bad because if the request hasn't been passed to the network, there is no way for any advances in server technology to ever operate on it. For instance, servers are prevented from returning responses out of order, but nothing would prevent them from processing requests out of order - overlapping latencies in DB queries and disk I/O - if the requests were not queued on the browser side.
| Percent of Requests Queued | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 0 | 79.1 |
| Low Latency | 0 | 78.6 |
That is a stark contrast. It is possible, by the way, to see a request queued even with pipelining enabled - a default configuration limit of 32 governs the maximum depth of a pipeline, and not all request types are pipeline eligible.
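For illustration only, here is a toy version of those two gates. This is not the Necko scheduler - the eligibility set, the function names, and the fallback order are assumptions for the sketch; only the depth limit of 32 comes from the text above.

```python
from dataclasses import dataclass, field

MAX_PIPELINE_DEPTH = 32                      # default configuration limit mentioned above

# Assumed for illustration: only idempotent request types get pipelined.
PIPELINE_ELIGIBLE_METHODS = {"GET", "HEAD"}

@dataclass
class Pipeline:
    outstanding: list = field(default_factory=list)   # requests in flight on this connection

def dispatch(method, url, pipelines, can_open_connection):
    """Toy scheduler: prefer an open pipeline, then a new connection, else queue."""
    if method in PIPELINE_ELIGIBLE_METHODS:
        for p in pipelines:
            if len(p.outstanding) < MAX_PIPELINE_DEPTH:
                p.outstanding.append((method, url))    # rides an existing pipeline
                return "pipeline"
    if can_open_connection():
        return "new"        # pays a handshake, but is not queued behind other requests
    return "queued"         # waits for a connection to become available
```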
| Average Queue Latency (ms) | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 29 | 1630 |
| Low Latency | 6 | 285 |
You might expect to see 0ms for the pipelining case, since the table above shows that no requests were delayed. But queue latency covers the time from request submission to the moment the first byte of the request goes on the wire, so it includes any connection setup time when a new connection is being established. That setup time is the primary source of the latency seen here for the pipeline-enabled case.
That raises the question: when pipelining is enabled, how many of the requests are actually pipelined?
| Connection Type Pct | Moderate Latency w/Pipeline | Moderate Latency wo/Pipeline | Low Latency w/Pipeline | Low Latency wo/Pipeline |
|---|---|---|---|---|
| New | 2 | 4 | 2 | 3 |
| Reused Idle | 13 | 96 | 13 | 97 |
| Pipeline | 85 | 0 | 84 | 0 |
We see here a moderate reduction in the number of connections used when pipelining, but most of the effect is a transfer from idle persistent connections over to pipelines. While the percentage of new connections as a portion of the overall request stream has gone down just a tick with pipelining, the impact on the actual number of raw connections is significant - going from roughly 60 without pipelining to 30 with it enabled. That boils down to a 50% reduction in the number of connections created, which gives a very busy site like Facebook a significant scalability boost.
The final criteria deal with transaction latency.
| Average Transaction Latency (ms) | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 702 | 1906 |
| Low Latency | 341 | 346 |
Yowza! Now there is a result. Under conditions with moderate latency the average transaction waits 1200ms less from the time the request is submitted to Necko to the time the first byte of the response header is received. The net effect is so much more than the ~250ms RTT because of aggregated queueing delays - without pipelining you are placed in a deep queue that has to be completely cleared, with a 1 RTT overhead on each entry, before your request is executed. The impact under low latency conditions is probably close to noise.
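A crude back-of-envelope model shows how that queueing multiplies the RTT penalty. This is my own sketch, not a measurement: the connection limit of 6 and the burst size of 60 are assumptions for illustration, and server think time and bandwidth are ignored.

```python
RTT = 0.250        # seconds - the moderate-latency test condition
CONNECTIONS = 6    # assumed per-host connection limit
BURST = 60         # requests issued at roughly the same time

# Without pipelining each connection drains its share of the burst serially,
# so the i-th request on a connection waits about i RTTs; average of 1..N is (N+1)/2.
per_conn = BURST / CONNECTIONS
avg_wait_no_pipeline = RTT * (per_conn + 1) / 2

# With pipelining the whole burst can be written immediately, so every request
# waits roughly one RTT.
avg_wait_pipeline = RTT

print(f"average wait without pipelining: {avg_wait_no_pipeline * 1000:.0f} ms")  # ~1375 ms
print(f"average wait with pipelining:    {avg_wait_pipeline * 1000:.0f} ms")     # 250 ms
```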
| Pct of Responses Rcvd in < Xms | x=1500 | x=1200 | x=900 |
|---|---|---|---|
| Moderate Latency w/Pipeline | 97 | 91 | 75 |
| Moderate Latency wo/Pipeline | 45 | 38 | 32 |

| Pct of Responses Rcvd in < Xms | x=1000 | x=700 | x=400 |
|---|---|---|---|
| Low Latency w/Pipeline | 99 | 92 | 61 |
| Low Latency wo/Pipeline | 99 | 91 | 61 |
Facebook is a big success - probably the biggest success of any of the tests. Performance in 200+ms latency situations improves significantly, and low latency scenarios perform similarly while using somewhat fewer TCP connections.
Amazon.com
The Amazon.com test walks through a basic window-shopping experience at Amazon.com. The home page is loaded, the Kindle link is clicked, a few more categories are clicked, and the lists of products are generally browsed and sorted by "hot and new" and similar criteria. This boils down to about 800 HTTP transactions.
| Percent of Requests Queued | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 0 | 54.4 |
| Low Latency | 0 | 39.8 |
Right away you can see that Amazon queues fewer requests than Facebook, so the potential improvement is smaller.
| Average Queue Latency (ms) | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 116 | 791 |
| Low Latency | 12 | 136 |

| Connection Type Pct | Moderate Latency w/Pipeline | Moderate Latency wo/Pipeline | Low Latency w/Pipeline | Low Latency wo/Pipeline |
|---|---|---|---|---|
| New | 12 | 13 | 6 | 13 |
| Reused Idle | 20 | 96 | 28 | 87 |
| Pipeline | 68 | 0 | 66 | 0 |
The first thing to note is that less pipelining is going on than with Facebook, so again there is less potential for improvement. How the pages are constructed has a lot to do with this (perhaps fewer images, etc.). But almost as interesting is the fact that the number of new TCP connections is halved in the low latency case. If the page can be transferred in the same amount of time using fewer connections, that is still a win for the web overall.
| Average Transaction Latency (ms) | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 635 | 1083 |
| Low Latency | 266 | 204 |
An interesting result - roughly 450ms off the average transaction in the ~250ms RTT environment, but a notable loss in the low latency scenario. All of the numbers here are averages across two test runs, but inspecting the Amazon test case on some other ad-hoc runs showed quite a bit of variability. My suspicion is that server load occasionally results in a single resource taking a long time to return. In the past I have seen this disable pipelining for HTML pages while leaving it enabled for images.
| Pct of Responses Rcvd in < Xms | x=1200 | x=900 | x=600 |
|---|---|---|---|
| Moderate Latency w/Pipeline | 91 | 79 | 62 |
| Moderate Latency wo/Pipeline | 77 | 69 | 61 |

| Pct of Responses Rcvd in < Xms | x=1000 | x=700 | x=400 |
|---|---|---|---|
| Low Latency w/Pipeline | 93 | 90 | 84 |
| Low Latency wo/Pipeline | 99 | 94 | 85 |
Flickr
The Flickr test is probably the simplest of the cases. It simply loads several galleries based on set names and tags. There are roughly 350 HTTP transactions in the test. Under normal conditions Flickr has high variability in server response time.
| Percent of Requests Queued | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 0 | 57 |
| Low Latency | 0 | 57 |

| Average Queue Latency (ms) | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 43 | 814 |
| Low Latency | 7 | 211 |

| Connection Type Pct | Moderate Latency w/Pipeline | Moderate Latency wo/Pipeline | Low Latency w/Pipeline | Low Latency wo/Pipeline |
|---|---|---|---|---|
| New | 10 | 27 | 12 | 27 |
| Reused Idle | 19 | 73 | 19 | 73 |
| Pipeline | 71 | 0 | 69 | 0 |
As with the other tests, more than half of the new connections have been replaced when pipelining is enabled.
| Average Transaction Latency (ms) | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 859 | 1091 |
| Low Latency | 291 | 366 |
This result is more modest, but still positive, when compared to our other tests.
| Pct of Responses Rcvd in < Xms | x=2000 | x=1500 | x=1000 |
|---|---|---|---|
| Moderate Latency w/Pipeline | 95 | 87 | 67 |
| Moderate Latency wo/Pipeline | 88 | 74 | 60 |

| Pct of Responses Rcvd in < Xms | x=1000 | x=700 | x=400 |
|---|---|---|---|
| Low Latency w/Pipeline | 99 | 97 | 72 |
| Low Latency wo/Pipeline | 98 | 93 | 71 |
www.AsiaNewsPhoto.com
This test uses a photojournalism clearinghouse site located overseas, so the broadband low latency case starts with an RTT closer to 100ms, while the moderate delay case adds 200ms to that. This is the smallest test case - just 175 transactions in each run.
| Percent of Requests Queued | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 0 | 44 |
| Low Latency | 0 | 38 |

| Average Queue Latency (ms) | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 21 | 726 |
| Low Latency | 31 | 400 |
I am not yet certain how to explain the very modest rise in queue time for the pipeline case when the added 200ms delay is removed. It must involve an aberrant TCP connection, as that is really the only component of queue time when the requests themselves are not delayed due to connection limits.
| Connection Type Pct | Moderate Latency w/Pipeline | Moderate Latency wo/Pipeline | Low Latency w/Pipeline | Low Latency wo/Pipeline |
|---|---|---|---|---|
| New | 16 | 7 | 11 | 8 |
| Reused Idle | 33 | 93 | 33 | 92 |
| Pipeline | 51 | 0 | 56 | 0 |
This is the first time we actually see the new connection numbers moving in the wrong direction. In this case I believe the request-type scheduling restrictions placed on the connection manager are generating new connections that may have been unnecessary in the non-pipelining scenario. I'm curious whether the effect would fade in a test with more transactions.
| Average Transaction Latency (ms) | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 1310 | 1248 |
| Low Latency | 752 | 689 |
This seems to track the changes in connection types, and maybe the test bears further examination to see if an adjustment can be made. The scheduling algorithm seems to be getting in its own way and has made performance just a tick worse than before - though not by much, and certainly not by enough to discount the gains made in the other scenarios.
| Pct of Responses Rcvd in < Xms | x=2000 | x=1500 | x=1000 |
|---|---|---|---|
| Moderate Latency w/Pipeline | 84 | 82 | 52 |
| Moderate Latency wo/Pipeline | 82 | 76 | 54 |

| Pct of Responses Rcvd in < Xms | x=1500 | x=1000 | x=500 |
|---|---|---|---|
| Low Latency w/Pipeline | 88 | 82 | 46 |
| Low Latency wo/Pipeline | 84 | 82 | 57 |
MapQuest
This test is different in that it is driven almost exclusively through JS and XMLHttpRequest. Those elements are present in the Facebook and Amazon tests as well, but they dominate the MapQuest test. In this scenario a map is brought up on the screen and manipulated in the usual ways - panning in 4 directions, zooming in and out, and toggling between satellite and map mode. By the time it is done, 711 HTTP transactions have been made.
| Percent of Requests Queued | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 0 | 42 |
| Low Latency | 0 | 44 |

| Average Queue Latency (ms) | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 52 | 381 |
| Low Latency | 10 | 198 |
For all cases, the queue latency is pretty low for this test. That means the number of documents requested in one burst is relatively modest.
| Connection Type Pct | Moderate Latency w/Pipeline | Moderate Latency wo/Pipeline | Low Latency w/Pipeline | Low Latency wo/Pipeline |
|---|---|---|---|---|
| New | 12 | 16 | 11 | 14 |
| Reused Idle | 30 | 84 | 32 | 86 |
| Pipeline | 58 | 0 | 57 | 0 |
Marginally fewer new connections are used with pipelining. Hurrah.
| Average Transaction Latency (ms) | With Pipeline | Without Pipeline |
|---|---|---|
| Moderate Latency | 677 | 732 |
| Low Latency | 260 | 500 |
| Pct of Responses Rcvd in < Xms | x=1500 | x=1200 | x=900 |
|---|---|---|---|
| Moderate Latency w/Pipeline | 96 | 87 | 76 |
| Moderate Latency wo/Pipeline | 96 | 83 | 70 |

| Pct of Responses Rcvd in < Xms | x=900 | x=600 | x=300 |
|---|---|---|---|
| Low Latency w/Pipeline | 99 | 95 | 65 |
| Low Latency wo/Pipeline | 87 | 75 | 40 |