Bits Up!: Performance of Pipelining in HTTP Firefox

This post provides some performance measurements of my HTTP pipeline patches for Firefox.

The key benefits of pipelining are reduced transaction latency and potentially the use of fewer connections, so those are generally the metrics I will focus on here. For each of my 5 test cases we will look at the following statistics:

The percentage of requests that are delayed in the request queue inside Firefox waiting for a connection to be available.
The average amount of queue latency for each transaction. This is measured as the time the first byte of the request is given to the kernel minus the time at which the request was presented to Necko/Gecko. It is generally very low but greater than 0 even if the transaction can be placed directly on a pipeline, because it takes a moment to construct the request and perhaps schedule the socket thread if other things are on the CPU. But a high value is an opportunity lost - that is time the request could have been in flight with the server processing it. It includes the time needed to create a new connection when one is required - that covers the three-way TCP handshake but not a DNS lookup (which is cached in the test).
The type of connection used for each transaction (new connection, an idle persistent connection, or a pipelined connection)
The average amount of transaction latency for each transaction. This is measured as the time of the first byte of the response being received by Firefox minus the time at which the request was presented to Necko/Gecko. It is possible for the average improvement to be greater than 1 RTT because of cumulative queueing delays in the non-pipelined case.
The cumulative fraction of transactions completed before 3 different elapsed times in order to show improved execution time for the test case. The benchmark times are sized appropriately for each test case.
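
As a rough illustration of how these statistics relate to one another, here is a minimal sketch that computes them from per-transaction timestamps. The `Transaction` record and its field names are hypothetical and exist only for this example; they are not the instrumentation used in the patches.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Transaction:
    # All timestamps are in milliseconds. Field names are hypothetical.
    submitted_ms: float           # request presented to the HTTP stack
    first_byte_sent_ms: float     # first request byte handed to the kernel
    first_byte_recv_ms: float     # first response byte received
    waited_for_connection: bool   # True if it sat in the queue for a connection


def percent_queued(txns: Sequence[Transaction]) -> float:
    """Percentage of requests delayed waiting for a connection."""
    return 100.0 * sum(t.waited_for_connection for t in txns) / len(txns)


def average_queue_latency(txns: Sequence[Transaction]) -> float:
    """Mean time from submission to first request byte sent."""
    return sum(t.first_byte_sent_ms - t.submitted_ms for t in txns) / len(txns)


def average_transaction_latency(txns: Sequence[Transaction]) -> float:
    """Mean time from submission to first response byte received."""
    return sum(t.first_byte_recv_ms - t.submitted_ms for t in txns) / len(txns)


def percent_within(txns: Sequence[Transaction],
                   limits_ms: Sequence[float]) -> List[float]:
    """Percentage of transactions whose latency is below each limit."""
    latencies = [t.first_byte_recv_ms - t.submitted_ms for t in txns]
    return [100.0 * sum(lat < limit for lat in latencies) / len(latencies)
            for limit in limits_ms]
```

Note that queue latency is always a component of transaction latency here, which is why the two sets of averages tend to move together in the tables that follow.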

There are 4 data points for each criterion in each test case. Pipelining is aimed at environments with significant latency, and my broadband test connection has lower latency than much of the world and every mobile environment, so the first two data points have 200ms of latency added through a traffic shaper. Those two data points compare pipelining on vs pipelining off. The other two data points measure the same things but without the induced latency.

All tests are run with both a disk and memory cache enabled but empty at the beginning of the run. In order to measure the effectiveness of the pipelining, each of these sites has been put in the "green - pipelining ok" state which is normally auto discovered.

Facebook

The first test is on Facebook. It starts by logging in and then selecting a particular user profile and navigating from that page via lists of friends, occasionally pulling up individual profiles, generating lists of recent updates, and pressing the More link on busy Facebook walls. There are approximately 1400 HTTP transactions in each test run.

The first thing to consider is the percent of requests that are delayed (i.e. queued) within Firefox. I think queuing is particularly bad because if the request hasn't been passed to the network there is no way for any advances in server technology to ever operate on it. For instance, servers are prevented from returning responses out of order, but nothing would prevent them from processing requests out of order to overlap latencies in DB queries and disk I/O if the requests were not queued on the browser side.
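
To make that last point concrete, here is a minimal sketch of a server that works on pipelined requests concurrently yet still returns the responses in request order. It is a toy: `handle` and `serve_pipeline` are hypothetical names, and the `time.sleep` call merely stands in for a DB query or disk read.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle(request):
    """Pretend to serve one request; the delay stands in for backend I/O."""
    name, backend_delay_s = request
    time.sleep(backend_delay_s)
    return f"response for {name}"


def serve_pipeline(requests, workers=4):
    """Process all pipelined requests at once, respond in request order.

    Executor.map runs the calls concurrently but yields results in the
    order the requests were submitted, which is what HTTP/1.1 pipelining
    requires of the response stream.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle, requests))


if __name__ == "__main__":
    # The slowest request is first, yet the faster ones overlap with it.
    print(serve_pipeline([("a", 0.05), ("b", 0.01), ("c", 0.0)]))
```

None of this overlap is available to the server for a request that is still sitting in the browser's queue.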

Percent of Requests Queued
	With Pipeline	Without Pipeline
Moderate Latency	0	79.1
Low Latency	0	78.6

That is a stark contrast. It is possible by the way to see a request being queued with pipelining enabled - a default configuration limit of 32 governs the maximum depth of the pipeline and not all request types are pipeline eligible.
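
The two conditions just mentioned can be sketched as a small placement check. This is purely illustrative and is not the connection manager's actual logic: `pick_pipeline` is a hypothetical helper, and choosing the shallowest pipeline is an assumption made only for this example.

```python
from typing import Optional, Sequence

MAX_PIPELINE_DEPTH = 32  # the default depth limit mentioned above


def pick_pipeline(depths: Sequence[int], eligible: bool,
                  max_depth: int = MAX_PIPELINE_DEPTH) -> Optional[int]:
    """Return the index of a pipeline with room, or None if none qualifies.

    depths[i] is the number of requests outstanding on pipeline i.
    Picking the shallowest pipeline is an assumption of this sketch.
    """
    if not eligible:
        return None  # this request type cannot be pipelined at all
    open_slots = [i for i, depth in enumerate(depths) if depth < max_depth]
    if not open_slots:
        return None  # every pipeline is already at the depth limit
    return min(open_slots, key=lambda i: depths[i])
```

A `None` result corresponds to the cases where a request can still end up waiting even with pipelining turned on.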

Average Queue Latency (ms)
	With Pipeline	Without Pipeline
Moderate Latency	29	1630
Low Latency	6	285

You might expect to see 0ms for the pipelining case as we just illustrated above that no requests were delayed. But the queue latency covers the time from request submission to the time of putting the first byte of the request on the wire, so that includes any connection setup time when establishing a new connection. That is the primary source of the latency seen here for the pipeline enabled case.

That raises the question: when pipelining is enabled, how many of the requests are pipelined?

Connection Type Pct
	Moderate Latency w/Pipeline	Moderate Latency wo/Pipeline	Low Latency w/Pipeline	Low Latency wo/Pipeline
New	2	4	2	3
Reused Idle	13	96	13	97
Pipeline	85	0	84	0

We see here a moderate reduction in the number of connections used when pipelining, but most of the effect is a transfer from idle persistent connections over to pipelines. While the percentage of new connections as a portion of the overall request stream has gone down just a tick with pipelining, the impact on the actual number of raw connections is significant - going from roughly 60 without pipelining to 30 with it enabled. That boils down to a 50% reduction in the number of connections created, which provides a very busy site like Facebook a significant scalability boost.
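
The raw connection counts follow from the table percentages and the approximate number of transactions per run. A quick back-of-the-envelope check, assuming 1400 transactions and the rounded moderate latency percentages from the table (so the results are estimates, not measured counts):

```python
def estimated_new_connections(transactions, new_pct):
    """Estimate raw connection count from the 'New' percentage."""
    return transactions * new_pct / 100


transactions = 1400  # approximate transactions per Facebook run

with_pipeline = estimated_new_connections(transactions, 2)     # 28.0
without_pipeline = estimated_new_connections(transactions, 4)  # 56.0

reduction_pct = 100 * (without_pipeline - with_pipeline) / without_pipeline
print(with_pipeline, without_pipeline, reduction_pct)  # 28.0 56.0 50.0
```

Because the percentages are rounded to whole numbers, 28 and 56 line up only roughly with the figures of 30 and 60 quoted above.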

The final criteria deal with transaction latency.

Average Transaction Latency (ms)
	With Pipeline	Without Pipeline
Moderate Latency	702	1906
Low Latency	341	346

Yowza! Now there is a result. Under conditions with moderate latency the average transaction waits 1200ms less from the time the request is submitted to Necko to the time the first byte of the response header is received. The net effect is so much more than the ~250ms RTT because of aggregated queueing delays - without pipelining enabled a request is placed in a deep queue which has to be totally cleared, with a 1 RTT overhead on each request in it, before that request is executed. The impact under low latency conditions is probably close to being noise.
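
A toy model makes the aggregation easier to see. It is only a sketch under simplifying assumptions that are mine and not measurements: a burst of requests arrives all at once, the non-pipelined case has a fixed pool of connections (6 here) that each carry one request per round trip, and server processing and transfer time are ignored.

```python
def burst_latencies(num_requests, rtt_ms, connections, pipelined):
    """First-byte latency of each request in a burst submitted at time 0."""
    latencies = []
    for i in range(num_requests):
        if pipelined:
            rounds_waited = 0  # sent immediately behind earlier requests
        else:
            # Each connection handles one request per round trip, so
            # request i waits for i // connections full rounds first.
            rounds_waited = i // connections
        latencies.append((rounds_waited + 1) * rtt_ms)
    return latencies


def mean(values):
    return sum(values) / len(values)


without = mean(burst_latencies(30, 250, connections=6, pipelined=False))
with_pipe = mean(burst_latencies(30, 250, connections=6, pipelined=True))
print(without, with_pipe)  # 750.0 250.0
```

In this model the last request in the burst waits four full round trips before it is even sent, which is how the average gap can exceed a single RTT by a wide margin.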

Pct of Responses Rcvd in < Xms
	x=1500	x=1200	x=900
Moderate Latency w/Pipeline	97	91	75
Moderate Latency wo/Pipeline	45	38	32

Pct of Responses Rcvd in < Xms
	x=1000	x=700	x=400
Low Latency w/Pipeline	99	92	61
Low Latency wo/Pipeline	99	91	61

Facebook is a big success - probably the biggest success of any of the tests. 200+ms latency situations have performance significantly increased, and low latency scenarios perform similarly while using slightly fewer TCP connections.

Amazon.com

The Amazon.com test walks through a basic window shopping experience at Amazon.com. The home page is loaded, the Kindle link is clicked, a few more categories are clicked, and the lists of products are generally browsed and sorted by "hot and new" and other similar things. This boils down to about 800 HTTP transactions.

Percent of Requests Queued
	With Pipeline	Without Pipeline
Moderate Latency	0	54.4
Low Latency	0	39.8

Right away you can see that Amazon queues fewer requests than Facebook, so the potential improvement is less.

Average Queue Latency (ms)
	With Pipeline	Without Pipeline
Moderate Latency	116	791
Low Latency	12	136

Connection Type Pct
	Moderate Latency w/Pipeline	Moderate Latency wo/Pipeline	Low Latency w/Pipeline	Low Latency wo/Pipeline
New	12	13	6	13
Reused Idle	20	96	28	87
Pipeline	68	0	66	0

The first thing to note is that less pipelining is going on than with Facebook, so again there is less potential for improvement. How the pages are constructed has a lot to do with this (perhaps fewer images, etc.). But almost as interesting is the fact that the number of TCP connections (i.e. new connections) is halved in the low latency case. If the page can be transferred in the same amount of time using fewer connections that is still a win for the web overall.

Average Transaction Latency (ms)
	With Pipeline	Without Pipeline
Moderate Latency	635	1083
Low Latency	266	204

An interesting result - 400ms off the average transaction in the ~250ms RTT environment, but a notable loss in the low latency scenario. All of the numbers here are averages across two test runs, but just inspecting the Amazon test case in particular on some other ad-hoc runs showed quite a bit of variability. My suspicion is that server load occasionally results in a single resource taking a long time to return. In the past I have seen this disable pipelining for HTML pages but leave it enabled for images.

Pct of Responses Rcvd in < Xms
	x=1200	x=900	x=600
Moderate Latency w/Pipeline	91	79	62
Moderate Latency wo/Pipeline	77	69	61

Pct of Responses Rcvd in < Xms
	x=1000	x=700	x=400
Low Latency w/Pipeline	93	90	84
Low Latency wo/Pipeline	99	94	85

Flickr

The Flickr test is probably the simplest of the cases. It simply loads several galleries based on set names and tags. There are roughly 350 HTTP transactions in the test. Under normal conditions Flickr has a high variability in server response time.

Percent of Requests Queued
	With Pipeline	Without Pipeline
Moderate Latency	0	57
Low Latency	0	57

Average Queue Latency (ms)
	With Pipeline	Without Pipeline
Moderate Latency	43	814
Low Latency	7	211

Connection Type Pct
	Moderate Latency w/Pipeline	Moderate Latency wo/Pipeline	Low Latency w/Pipeline	Low Latency wo/Pipeline
New	10	27	12	27
Reused Idle	19	73	19	73
Pipeline	71	0	69	0

As with the other tests, more than half of the new connections have been replaced when pipelining is enabled.

Average Transaction Latency (ms)
	With Pipeline	Without Pipeline
Moderate Latency	859	1091
Low Latency	291	366

This result is more modest, but still positive, when compared to our other tests.

Pct of Responses Rcvd in < Xms
	x=2000	x=1500	x=1000
Moderate Latency w/Pipeline	95	87	67
Moderate Latency wo/Pipeline	88	74	60

Pct of Responses Rcvd in < Xms
	x=1000	x=700	x=400
Low Latency w/Pipeline	99	97	72
Low Latency wo/Pipeline	98	93	71

www.AsiaNewsPhoto.com

This test uses a photojournalism clearinghouse site located overseas, and therefore the broadband low latency case has a starting RTT of closer to 100ms, while the moderate delay case adds 200ms to that. This is the smallest test case - just 175 transactions in each run.

Percent of Requests Queued
	With Pipeline	Without Pipeline
Moderate Latency	0	44
Low Latency	0	38

Average Queue Latency (ms)
	With Pipeline	Without Pipeline
Moderate Latency	21	726
Low Latency	31	400

I am not yet certain how to explain the very modest rise in queue time for the pipeline case when the added 200ms delay is removed. It must involve an aberrant TCP connection, as that is really the only component of queue time when the requests themselves are not delayed due to connection limits.

Connection Type Pct
	Moderate Latency w/Pipeline	Moderate Latency wo/Pipeline	Low Latency w/Pipeline	Low Latency wo/Pipeline
New	16	7	11	8
Reused Idle	33	93	33	92
Pipeline	51	0	56	0

This is the first time we actually see the new connection numbers moving in the wrong direction. In this case I believe the type scheduling restrictions placed on the connection manager are generating new connections that may have been unnecessary in the non-pipelining scenario. I'm curious if the effect would fade in a test with more transactions.

Average Transaction Latency (ms)
	With Pipeline	Without Pipeline
Moderate Latency	1310	1248
Low Latency	752	689

This seems to track the changes in connection types, and maybe the test bears further examination to see if an adjustment can be made. The scheduling algorithm seems to be getting in its own way and has made performance just a tick worse than before, though not by very much. And certainly not by enough to discount the gains made in some other scenarios.

Pct of Responses Rcvd in < Xms
	x=2000	x=1500	x=1000
Moderate Latency w/Pipeline	84	82	52
Moderate Latency wo/Pipeline	82	76	54

Pct of Responses Rcvd in < Xms
	x=1500	x=1000	x=500
Low Latency w/Pipeline	88	82	46
Low Latency wo/Pipeline	84	82	57

MapQuest

This test is different in that it is driven almost exclusively through JS and XMLHttpRequest. Those elements are present in the Facebook and Amazon tests as well, but they dominate the MapQuest test. In this scenario a map is brought up on the screen and it is manipulated in the usual ways - panning in 4 directions, zooming in and out, and toggling between satellite and map mode. By the time it is done, 711 HTTP transactions have been made.

Percent of Requests Queued
	With Pipeline	Without Pipeline
Moderate Latency	0	42
Low Latency	0	44

Average Queue Latency (ms)
	With Pipeline	Without Pipeline
Moderate Latency	52	381
Low Latency	10	198

For all cases, the queue latency is pretty low for this test. That means the number of documents requested in one burst is relatively modest.

Connection Type Pct
	Moderate Latency w/Pipeline	Moderate Latency wo/Pipeline	Low Latency w/Pipeline	Low Latency wo/Pipeline
New	12	16	11	14
Reused Idle	30	84	32	86
Pipeline	58	0	57	0

Marginally fewer new connections are used with pipelining. Hurrah.

Average Transaction Latency (ms)
	With Pipeline	Without Pipeline
Moderate Latency	677	732
Low Latency	260	500

Pct of Responses Rcvd in < Xms
	x=1500	x=1200	x=900
Moderate Latency w/Pipeline	96	87	76
Moderate Latency wo/Pipeline	96	83	70

Pct of Responses Rcvd in < Xms
	x=900	x=600	x=300
Low Latency w/Pipeline	99	95	65
Low Latency wo/Pipeline	87	75	40

Wednesday, December 1, 2010
