<b>Bits Up!</b> - Real data and musings on the performance of networks, servers, protocols, and their related folks.<br />
<br />
<h2>
The Benefits of HTTPS for DNS</h2>
DNS over HTTPS (DoH) is entering the last call (right now Working Group, soon IETF wide) stage of IETF standardization. A common discussion I have about it basically boils down to "why not DNS over TLS (DoT)?" (i.e. work that has already been done by the DPRIVE WG). That does seem simpler, after all.<br />
<br />
DoH builds on the great foundation of DoT. The most important part of each protocol is that they provide encrypted and authenticated communication between clients and arbitrary DNS resolvers. DNS transport does <a href="https://www.ietf.org/proceedings/99/slides/slides-99-maprg-fingerprint-based-detection-of-dns-hijacks-using-ripe-atlas-01.pdf" target="_blank">get regularly attacked</a> and using either one of these protocols allows clients to protect <a href="https://twitter.com/bgpmon/status/445266642616868864" target="_blank">against</a> such <a href="https://bgpstream.com/event/138295" target="_blank">shenanigans</a>. <b>What DoH and DoT have in common is far more important than their differences </b>and for some use cases they will work equally well.<br />
<br />
However, I think that by integrating itself into the established HTTP ecosystem and its scale, DoH will likely gain more traction and solve a broader problem set than a DNS-specific solution will.<br />
<br />
DoH implementations, by virtue of also being HTTP applications, have easy access to a tremendous amount of commodity infrastructure with which to jump start deployments. Examples are CDNs, hundreds of programming libraries, authorization libraries, proxies, sophisticated load balancers, super high volume servers, and more than a billion deployed Javascript engines that already have HTTP interfaces (they also come with a reasonable security model (CORS) for accessing resources behind firewalls). DoH natively includes HTTP content negotiation as well - letting new expressions of DNS data (json, xml, etc..) blossom in non-traditional programming environments.<br />
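<br />
To illustrate how little client plumbing that requires, here is a minimal sketch of a DoH lookup written against the ordinary fetch API. The resolver URL and the JSON media type are assumptions for illustration - the standardized wire format is the binary application/dns-message type - but any HTTP-capable environment can issue the query the same way.<br />
<pre>
// Minimal sketch of a DoH lookup from ordinary web code.
// The resolver URL and the application/dns-json media type are
// illustrative assumptions; the standardized wire format is the
// binary application/dns-message type.
async function resolve(name) {
  const url = "https://doh.example.net/dns-query?name=" +
              encodeURIComponent(name) + "&type=AAAA";
  const response = await fetch(url, {
    headers: { "Accept": "application/dns-json" }
  });
  const result = await response.json();
  return (result.Answer || []).map(rr => rr.data);
}

resolve("example.com").then(addresses => console.log(addresses));
</pre>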
<br />
Using HTTP also has advantages at the protocol level. Both DoH and DoT use TCP and therefore need to implement a multiplexing layer to deal with head of line blocking. DoT uses the Message-ID as a key (inherited from DNS over TCP). DoH addresses this via HTTP/2's multiplexed stream approach. The latter is a richer set of functionality that deals with a wider set of edge cases. For instance, HTTP/2 includes prioritization between streams, the ability to interleave different responses on the wire (e.g. to pre-empt an AXFR transfer), and flow control to limit the number of requests in flight without resorting to head of line blocking.<br />
<br />
As HTTP/2 evolves into QUIC, DoH will benefit from being on the QUIC train without any further standardization. Likely all it will take to deploy DoH over QUIC will be a library upgrade. This will be a very important performance improvement for the protocol. Both DoH and DoT are essentially tunneling datagrams over TCP. Anybody who has run a VPN over TCP knows that's not optimal because packet loss in TCP delays the delivery of many other packets within the connection. This creates an unnecessary slowdown when unrelated things, such as different DNS responses, are carried in the same TCP flow. QUIC addresses that issue and DoH will definitely benefit from the free upgrade.<br />
<br />
The big picture potential of DoH emerges only when you zoom out from the DNS context. When you integrate DNS into HTTP you don't just use HTTP as a substrate for DNS transfers but you also integrate DNS into the gigantic pool of pre-existing HTTP traffic.<br />
<br />
HTTP has been working for the last few years to reduce the number of connections necessary to transport HTTP exchanges. Every time this happens the transport gets <b>cheaper</b> due to fewer handshakes, <b>faster</b> due to better congestion management, more <b>private</b> due to reduced use of SNI, and more <b>robust to traffic analysis</b> because the pool of mixed up encrypted data becomes more diverse.<br />
<br />
Roughly speaking, this is represented by the evolution of HTTP/2 and its extensions. Highlights include HTTP/2 itself (mixing slices of different HTTP exchanges together in real time on the wire), Connection Coalescing (allowing some classes of different origins to share the same connection), Alternative Services (allowing HTTP to control routing and therefore increase Connection Coalescing), the HTTP/2 Origin Frame (making management easier to fine tune Connection Coalescing), and the upcoming Secondary Certificates (allowing a connection to add identities post handshake and therefore, you guessed it, increase Connection Coalescing). This system forms a positive feedback loop - new data enjoys the benefits but it also increases the benefits to all of the existing data. Adding DNS to the mix is a win-win.<br />
<br />
The last important bit to touch on is more speculative. One of the newer HTTP semantics is push. Push allows an HTTP connection to send along unsolicited HTTP exchanges on existing connections. By defining an HTTP transport for DNS it can logically be pushed by servers that have reason to believe you will soon be interested in a DNS record you haven't even requested yet. This is an exciting concept, but it also comes with some tricky issues around security and tracking that need to be fully explored and standardized before it can be deployed. Nonetheless, architecting a secure DNS around HTTP enables that possibility down the road. The HTTP and DNS communities are just starting to look at the issues of "Resolverless DNS".<br />
<br />
<br />
<h2>
On The Merits of QUIC for HTTP</h2>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJizJezvpRjfTM1kwTSwVppsG4c_mL_Ly1K9gYFzsLIt_Rx-wcqYE-aJNmfvPW7Kvf-KfSIlJywtJLtziDYvZbxC2cgZ3NDaUM3ZaKv1B0yOej9HylBRJmd0XqR0z1LJ3Sqpfr1bYQgss/s1600/QUIC-q.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="257" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJizJezvpRjfTM1kwTSwVppsG4c_mL_Ly1K9gYFzsLIt_Rx-wcqYE-aJNmfvPW7Kvf-KfSIlJywtJLtziDYvZbxC2cgZ3NDaUM3ZaKv1B0yOej9HylBRJmd0XqR0z1LJ3Sqpfr1bYQgss/s320/QUIC-q.png" width="320" /></a></div>
<br />
I am often asked why the<a href="https://github.com/quicwg" target="_blank"> Internet HTTP community is working on an IETF based QUIC</a> when HTTP/2 (RFC 7540) is less than 2 years old. There
are good answers! This work is essentially part two of a planned evolution to improve speed, reliability, security, responsiveness, and
architecture. These needs were understood when HTTP/2 was
standardized but the community decided to defer larger
architectural changes to the next version. That next version is arriving now in the
form of QUIC.<br />
<br />
<h3>
Flashback to HTTP/2 Development</h3>
<br />
<div>
HTTP/2 development consciously constrained its own work in two ways. One was to preserve HTTP/1 semantics,
and the other was to limit the work to what could be done within a
traditional TLS/TCP stack.<br />
<br />
The choice to use TLS/TCP in HTTP/2 was not a forgone conclusion. During the pre-HTTP/2 <a href="https://www.extremetech.com/computing/124153-sm-vs-spdy-microsoft-and-google-battle-over-the-future-of-http-2-0" target="_blank">bakeoff </a>
phase there was quite a bit of side channel chatter about whether anyone would propose a different approach such as the SCTP/UDP/[D]TLS stack RTC DataChannels was contemporaneously considering for standardization.<br />
<br />
In the end folks felt that experience and running
code were top tier properties for the next revision of HTTP. That argued for TCP/TLS. Using the traditional stack lowered
the risk of boiling the ocean, improved the latency of getting something
deployed, and captured the meaningful low hanging fruit of multiplexing and priority that HTTP/2 was focused on.<br />
<br />
Issues
that could not be addressed with TCP/TLS 1.2 were deemed out of scope for that round of development. The mechanisms of ALPN and Alternative Services were created to facilitate successor protocols.<br />
<br /></div>
<div>
I
think our operational experience has proven HTTP/2's choices to be a good decision. It is a big win for a lot of cases, it is now the majority HTTPS protocol, and I don't think we could have done too much
better within the constraints applied.<br />
<br />
The successor protocol turns out to be QUIC. It is a good sign
for the vibrancy of HTTP/2 that the <a href="https://datatracker.ietf.org/wg/quic/charter/" target="_blank">QUIC charter</a> explicitly seeks to
map HTTP/2 into its new ecosystem. QUIC is taking on exactly the
items that were foreseen when scoping HTTP/2.<br />
<br /></div>
This post aims to highlight the benefits of QUIC as it applies to the HTTP ecosystem. I hope it is useful even for those that already understand the protocol mechanics. It, however, does not attempt to fully explain QUIC. <b><a href="https://github.com/quicwg/base-drafts" target="_blank">The IETF editor's drafts</a> </b>or<b> <a href="https://www.ietf.org/proceedings/98/slides/slides-98-edu-sessf-quic-tutorial-00.pdf" target="_blank">Jana's recent tutorial</a> </b>might be good references for that.<br />
<br />
<h3>
Fixing the TCP In-Order Penalty</h3>
<div>
<br />
The
chief performance frustration with HTTP/2 happens during higher than normal
packet loss. The in-order property of TCP spans multiplexed HTTP messages. A single packet loss in one message prevents
subsequent unrelated messages from being delivered until the loss
is repaired. This is because TCP delays received data in order to provide in-order delivery of the whole stream.<br />
<br />
For a simple example imagine images A and B each delivered in two parts in this order: A1, A2, B1, and B2. If only A1 were to suffer a packet loss under TCP that would also delay A2, B1, and B2. While image A is unavoidably damaged by this loss, image B is also impacted even though all of its data was successfully transferred the first time.<br />
<br />
This is something that was understood during the development of RFC 7540 and was correctly identified as a tradeoff favoring connections with lower loss rates. Our community has seen some good data on how this has played out "in the wild" recently from both Akamai and Fastly. For most HTTP/2 connections this strategy has been beneficial, but there is a tail population that actually regresses in performance compared to HTTP/1 under high levels of loss because of this.<br />
<br />
QUIC fixes this problem by multiplexing streams onto one connection in a way very familiar from RFC 7540, but with an added twist: it also gives each stream its own ordering context, analogous to a TCP sequence number. These streams can be delivered independently to the application because in QUIC in-order delivery applies only to each stream rather than to the whole connection.</div>
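Using the image A/B example above, the contrast looks roughly like this (an illustrative sketch):<br />
<pre>
sent on the wire:  A1  A2  B1  B2     (A1 is lost in transit)

TCP - one ordered byte stream:
  A2, B1, and B2 arrive but sit in kernel buffers; nothing is delivered
  to the application until the A1 retransmission arrives.

QUIC - ordering per stream:
  B1 and B2 are delivered immediately; stream B never lost anything.
  Only stream A waits for the A1 retransmission.
</pre>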
<div>
<br /></div>
<div>
I believe fixing this issue is the highest impact feature of QUIC.</div>
<div>
<br />
<h3>
Starting Faster</h3>
<br />
HTTP/2 starts much faster than HTTP/1 due to its ability to send multiple requests in the first round trip and because its superior connection management results in fewer connection establishments. However, new connections using the TCP/TLS stack still incur 2 or 3 round trips of delay before any HTTP/2 data can be sent.</div>
<div>
<br />
In order to address this<b> QUIC eschews layers in favor of a component architecture</b> that allows sending encrypted application data immediately. QUIC still uses TLS for security establishment and has a transport connection concept, but these components are not forced into layers that require their own round trips to be initialized. Instead the transport session, the HTTP requests, and the TLS context are all combined into the first packet flight when doing session resumption (i.e. you are returning to a server you have seen before). A key part of this is integration with TLS 1.3 and in particular the 0-RTT (aka Early Data) handshake feature.<br />
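<br />
Roughly, the handshake arithmetic for a new connection works out like this (an illustrative, loss-free sketch that ignores DNS):<br />
<pre>
TCP + TLS 1.2:             1 RTT (TCP) + 2 RTT (TLS)  = 3 RTTs before the first request
TCP + TLS 1.3:             1 RTT (TCP) + 1 RTT (TLS)  = 2 RTTs
QUIC, first contact:       combined transport + TLS   = 1 RTT
QUIC, resumption (0-RTT):  requests ride in the very first flight
</pre>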
<br />
The HTTP/2 world, given enough time, will be able to capture some of the same benefits using both TLS 1.3 and TCP Fast Open. Some of that work is made impractical by dependencies on Operating System configurations and the occasional interference from middleboxes unhappy with TCP extensions.<br />
<br />
However, even at full deployment of TLS 1.3 and TCP Fast Open, that approach will lag QUIC performance because QUIC can utilize the full flight of data in the first round trip while Fast Open limits the amount of data that can be carried to the roughly 1460 bytes available in a single TCP SYN packet. That packet also needs to include the TLS Client Hello and HTTP SETTINGS information along with any HTTP requests. That single packet runs out of room quickly if you need to encode more than one request or any message body. Any excess needs to wait a round trip.<br />
<br />
<h3>
Harmonizing with TLS</h3>
<br /></div>
When used with HTTP/1 and HTTP/2, TLS generally operates as a simple pipe. During encryption a cleartext stream of bytes goes in one side and a stream of encrypted bytes comes out the other, which is then fed to TCP. The reverse happens when decrypting. Unfortunately, the TLS layer operates internally on multi-byte records instead of a byte stream, and the mismatch creates a significant performance problem.<br />
<br />
The records can be up to 64KB and a wide variety of sizes are used in practice. In order to enforce data integrity, one of the fundamental security properties of TLS, the entire record must be received before it can be decoded. When the record spans multiple packets a problem similar to the "TCP in-order penalty" discussed earlier appears.<br />
<br />
A loss of any packet in the record delays decoding and delivery of the other correctly delivered packets while the loss is repaired. In this case the problem is actually a bit worse, as any loss impacts the whole record, not just the portion of the stream following the loss. Further, because application delivery of the first byte of the record always depends on receipt of the last byte of the record, simple serialization delays or common TCP congestion-control stalls add latency to application delivery even with 0% packet loss.<br />
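<br />
To make the cost concrete, some rough arithmetic (the record and segment sizes are illustrative assumptions):<br />
<pre>
TLS record size:        16 KB (a common maximum)
TCP segment payload:    ~1400 bytes
packets per record:     16384 / 1400  =  about 12

Losing any one of those ~12 packets stalls delivery of the entire 16 KB
record until the retransmission arrives, even though roughly 15 KB of it
was received on time.
</pre>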
<br />
The 'obvious fix' of placing an independent record in each packet turns out to work much better with QUIC than TCP. This is because TCP's API is a simple byte stream. Applications, including TLS, have no sense of where packets begin or end and have no reliable control over it. Furthermore, TCP proxies or even HTTP proxies commonly rearrange TCP packets while leaving the byte stream intact (a proof of the practical value of end to end integrity protection!).<br />
<br />
Even the reductio ad absurdum solution of 1-byte records does not work, because the record overhead creates multi-byte sequences that still span packet boundaries. Such a naive approach would also drown in its own overhead.<br />
<br />
QUIC shines here by using its component architecture rather than the traditional layers. The QUIC transport layer receives plaintext from the application and consults its own transport information regarding packet numbers, PMTU, and the TLS keying information. It combines all of this to form the encrypted packets that can be decrypted atomically with the equivalent of one record per UDP packet. Intermediaries are unable to mess with the framing because even the transport layer is integrity protected in QUIC during this same process - a significant security bonus! Any loss events will only impact delivery of the lost packet.<br />
<br />
<h3>
No More TCP RST Data Loss </h3>
<br />
<div>
As many HTTP developers will tell you, TCP RST is one of the most painful parts of the existing ecosystem. Its pain comes in many forms, but the data loss is the worst.<br />
<br />
The circumstances for an operating system generating a RST and how they respond to them can vary by implementation. One common scenario is a server close()ing a connection that has received another request that the HTTP server has not yet read and is unaware of. This is a routine case for HTTP/1 and HTTP/2 applications. Most kernels will react to the closing of a socket with unconsumed data by sending a RST to the peer. <br />
<br />
That RST will be processed out of order when received. In practice this means that if the RST has already arrived when the original client does a recv() to consume ordinary data the server sent before invoking close(), the client incurs an unrecoverable failure and that data can never be read. This is true even if the kernel has sent a TCP ack for it! The problem gets worse when combined with larger TLS record sizes, as the last bit of data is often what is needed to decode the whole record, and a substantial loss of up to 64KB of data occurs.<br />
<br />
<div>
The QUIC RST equivalent is not part of the orderly shutdown of application streams and it is not expected to ever force the loss of already acknowledged data.</div>
<br />
<h3>
Better Responsiveness through Buffer Management</h3>
</div>
<div>
<br />
The primary goal of HTTP/2 was the introduction of multiplexing into a single connection and it was understood that you cannot have meaningful multiplexing without also introducing a priority scheme. HTTP/1 illustrates the problem well - it multiplexes the path through unprioritized TCP parallelism which routinely gives poor results. The final RFC contained both multiplexing and priority mechanisms which for the most part work well.<br />
<br />
However, successful prioritization requires you to buffer before serializing the byte stream into TLS and TCP, because once sent to TCP those messages cannot be reordered when higher priority data presents itself. Unfortunately, high latency TCP requires a significant amount of buffering at the socket layer in order to run as fast as possible. These two competing interests make it difficult to judge how much buffering an HTTP/2 sender should use. While there are some Operating System specific oracles that give some clues, TCP itself does not provide any useful guidance to the application for reasonably sizing its socket buffers.<br />
<br />
This combination has made it challenging for applications to determine the appropriate level of socket buffering and in turn they sometimes have overbuffered in order to make TCP run at line rate. This results in poor responsiveness to the priority schedule and the inability for a server to recognize individual streams being canceled (which happens more than you may think) because they have already been buffered.<br />
<br />
The blending of transport and application components creates the opportunity for QUIC implementations to do a better job on priority. They do this by buffering application data with its priority information outside of the transmission layer. This allows the late binding of the packet transmission to the data that is highest priority at that moment.<br />
<br />
Relatedly, whenever a retransmission is required QUIC retransmits the original data in one or more new packets (with new packet identifiers) instead of retransmitting a copy of the lost packet as TCP does. This creates an opportunity to reprioritize, or even drop canceled streams, during retransmission. This compares favorably to TCP which is sentenced to retransmitting the oldest (and perhaps now irrelevant) data first due to its single sequence number and in-order properties.</div>
<div>
<br /></div>
<div>
<h3>
UDP means Universal DePloyment</h3>
<div>
<br />
QUIC is not inherently either a user space or kernel space protocol - it is quite possible to deploy it in either configuration. However, UDP based applications are often deployed in userspace configurations and do not require special configurations or permissions to run there. It is fair to expect a number of user space based QUIC implementations.<br />
<br />
Time will tell exactly what that looks like, but I anticipate it will be a combination of self-updating evergreen applications such as web servers and browsers and also a small set of well maintained libraries akin to the role openssl plays in distributions.<br />
<br />
Decoupling functionality traditionally performed by TCP from the operating system creates an opportunity for deploying software faster, updating it more regularly, and iterating on its algorithms in a tight loop. The long replacement and maintenance schedules of operating systems, sometimes measured in decades, create barriers to deploying new approaches to networking.<br />
<br />
This new freedom applies both to QUIC itself and to some of its embedded techniques that have equivalents in the TCP universe but have traditionally been difficult to deploy. Thanks to user space distribution, packet pacing, fast open, and loss discovery improvements like RACK will see greater deployment than ever before.<br />
<br />
Userspace will mean faster evolution in networking and greater distribution of the resulting best practices.<br />
</div>
</div>
<i><span style="font-size: xx-small;">---</span></i><br />
<br />
<span style="font-size: xx-small;">The IETF QUIC effort is informed by Google's efforts on its own preceding protocol. While it is not the same effort it does owe a debt to a number of Google folk. I'm not privy to all of the internal machinations at G but, at the inevitable risk of omitting an important contribution, it is worth calling out Jim Roskind, Jana Iyengar, Ian Swett, Adam Langley, and Ryan Hamilton both for their work and their willingness to evangelize and discuss it with me. Thanks! We're making a better Internet together.</span><br />
<br />
<i><span style="font-size: xx-small;">This post was originally drafted as a <a href="https://lists.w3.org/Archives/Public/ietf-http-wg/2017JanMar/0447.html" target="_blank">post</a> to the IETF HTTP Working Group by Patrick McManus (mcmanus at ducksong.com, pmcmanus at mozilla.com).</span></i><br />
<br />
<h2>
Cache-Control: immutable</h2>
About one year ago our friends at Facebook <a href="https://www.ietf.org/mail-archive/web/httpbisa/current/msg25463.html" target="_blank">brought</a> an interesting issue to the IETF HTTP Working Group - a <b>lot</b> (20%!) of their transactions for long lived resources (e.g. css, js) were resulting in 304 Not Modified. These are documents that have long explicit cache lifetimes of a year or more, and they were being revalidated well before they had even existed for that long - even though the content remained unchanged.<br />
<br />
After investigation this was attributed to people often hitting the reload button. That makes sense on a social networking platform - <b>show me my status updates!</b> Unfortunately, when transferring updates for the dynamic objects, browsers were also revalidating hundreds of completely static resources on the same page. While these do generate 304's instead of larger 200's, this adds up to a lot of time and significant bandwidth. It turns out it significantly delays the delivery of the minority of content that did change.<br />
Facebook, like many sites, uses versioned URLs - these URLs are never updated to have different content and instead the site changes the subresource URL itself when the content changes. This is a common design pattern, but existing caching mechanisms don't express the idea and therefore when a user clicks reload we check to see if anything has been updated.<br />
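<br />
As a hypothetical illustration of the pattern - the bytes behind a given URL never change, and an edit produces a new URL in the referencing HTML instead:<br />
<pre>
/static/main.d41d8c.css     published a year ago; its bytes will never change
/static/main.8f14e4.css     this week's edit, published under a brand new URL
</pre>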
<br />
IETF standards activity is probably premature without data or running code - so-called hope-based standardization is generally best avoided. Fortunately, HTTP already provides a mechanism for deploying experiments: <a href="https://tools.ietf.org/html/rfc7234#section-5.2.3" target="_blank">Cache-Control extensions</a>.<br />
<br />
I put together a test build of Firefox using a new extended attribute - <b>immutable</b>. immutable indicates that the response body will not change over time. It is complementary to the lifetime cacheability expressed by max-age and friends.<br />
<br />
<div style="text-align: center;">
<span style="font-family: "Courier New",Courier,monospace;">Cache-Control: max-age=365000000, immutable</span></div>
<br />
When a client supporting immutable sees this attribute it should assume that the resource, if unexpired, is unchanged on the server and therefore should not send a conditional revalidation for it (e.g. If-None-Match or If-Modified-Since) to check for updates. Correcting possible corruption (e.g. shift reload in Firefox) never uses conditional revalidation and still makes sense to do with immutable objects if you're concerned they are corrupted.<br />
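<br />
On the server side nothing beyond emitting the header on versioned responses is needed. A minimal node.js sketch (the path convention is an illustrative assumption):<br />
<pre>
const http = require('http');

http.createServer((req, res) => {
  // Only versioned, never-changing resources should carry immutable;
  // Firefox honors it only on https:// transactions, so a real
  // deployment would sit behind TLS.
  if (req.url.startsWith('/static/v/')) {
    res.setHeader('Cache-Control', 'max-age=365000000, immutable');
  }
  res.end('resource body would go here\n');
}).listen(8080);
</pre>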
<br />
<h2>
This Makes a Big Difference</h2>
The initial anecdotal results are encouraging enough to deploy the experiment. This is purely a performance mechanism - there is no web viewable semantic here - so it can be withdrawn at any time if that is the appropriate thing to do.<br />
<b><br /></b>
<b>For the reload case, immutable saves hundreds of HTTP transactions and improves the load time of the dynamic HTML by hundreds of milliseconds because that no longer competes with the multitude of 304 responses.</b><br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVvmmXahJAbKKlvQ1B9sWwNmDWDg8O3ISDc9i6nz0yhJQMjXjfx2sYG4W_sH8CiQSm631oroFCwfb-x9OGltOY2FeHGKiTKbdTfaAOMDu7UTgb5BHy2IWZscrMQjC4zm8d1lUelVmrkFk/s1600/without-immutable.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVvmmXahJAbKKlvQ1B9sWwNmDWDg8O3ISDc9i6nz0yhJQMjXjfx2sYG4W_sH8CiQSm631oroFCwfb-x9OGltOY2FeHGKiTKbdTfaAOMDu7UTgb5BHy2IWZscrMQjC4zm8d1lUelVmrkFk/s320/without-immutable.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Facebook reload without immutable</td></tr>
</tbody></table>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZ952p3r8kVYw-cFzV0PnHerz6g5JQigty84fu2-SFjr5T5CIh5ysUKNUNt3wMv1kYQNq6J-UuvgMbOWIzjcTAtfC7YxiGAp99JJrParY17Toi6zboFtoNMQOKAj_eA4aghnJg6nIDnG0/s1600/with-immutable.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZ952p3r8kVYw-cFzV0PnHerz6g5JQigty84fu2-SFjr5T5CIh5ysUKNUNt3wMv1kYQNq6J-UuvgMbOWIzjcTAtfC7YxiGAp99JJrParY17Toi6zboFtoNMQOKAj_eA4aghnJg6nIDnG0/s320/with-immutable.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Facebook reload with immutable</td></tr>
</tbody></table>
<br />
<h2>
<b>Next Steps </b></h2>
I will land immutable support in Firefox 49 (<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1267474" target="_blank">track the bug</a>). I expect Facebook to be part of the project as we move forward, and any content provider can join the party by adding the appropriate cache-control extension to the response headers of their immutable objects. If you do implement it on the server side drop me a note at mcmanus@ducksong.com with your experience. Clients that aren't aware of extensions must ignore them by HTTP specification and in practice they do - this should be safe to add to your responses. Immutable in Firefox is only honored on https:// transactions.<br />
<br />
If the idea pans out I will develop an Internet Draft and bring it back in the standards process - this time with some running code and data behind it.<br />
<br />
<h2>
Thanks Google for Open Source TCP Fix!</h2>
The Google transport networking crew (QUIC, TCP, etc..) deserve a shout out for identifying and fixing a nearly decade old Linux kernel TCP bug that I think will have an outsized impact on performance and efficiency for the Internet.<br />
<br />
Their <a href="https://github.com/torvalds/linux/commit/30927520dbae297182990bb21d08762bcc35ce1d" target="_blank">patch</a> addresses a problem with cubic congestion control, which is the default algorithm on many Linux distributions. The problem can be roughly summarized as the controller mistakenly characterizing the lack of congestion reports over a quiescent period as positive evidence that the network is not congested and therefore it should send at a faster rate when sending resumes. When put like this, it's obvious that an endpoint that is not moving any traffic cannot use the lack of errors as information in its feedback loop.<br />
<br />
The end result is that applications that oscillate between transmitting lots of data and then lying quiescent for a bit before returning to high rates of sending will transmit way too fast when returning to the sending state. The consequence of this is self-induced packet loss along with retransmissions, wasted bandwidth, out of order packet delivery, and application level stalls.<br />
<br />
Unfortunately a number of common web use cases are clear triggers here. Any use of persistent connections, where bursts of web data for two separate pages are interspersed with time for the user to interpret the data, is an obvious trigger. A far more dangerous class of triggers is likely to be the various HTTP based adaptive streaming media formats where a series of chunks of media are transferred over time on the same HTTP channel. And of course, Linux is a very popular platform for serving media.<br />
<br />
As with many bugs, it all seems so obvious afterwards - but tracking this stuff down is the work of quality, non-glamorous engineering. Remember that TCP is robust enough that it seems to work anyhow - even at the cost of reduced network throughput in this case. Kudos to the Google team for figuring it out, fixing it up, and especially for open sourcing the result. The whole web, including Firefox users, will benefit.<br />
<br />
Thanks!<br />
<br />
<h2>
Brotli Content-Encoding for Firefox 44</h2>
The best way to make data appear to move faster over the Web is to move less of it, and lossless compression has always been a core tenet of good web design.<br />
<br />
Sometimes that is done via over the top gzip of text resources (html, js, css), but other times it is accomplished via the compression inherent in the file format of media elements. Modern sites apply gzip to all of their text as a best practice.<br />
<br />
Time marches on, and it turns out we can often do a better job than the venerable gzip. Until recently, new formats struggled with matching the decoding rates of gzip, but lately a new contender named brotli has shown impressive results. It has been able to improve on gzip anywhere from 20% to 40% in terms of compression ratios while keeping up on the decoding rate. Have a look at <a href="http://www.gstatic.com/b/brotlidocs/brotli-2015-09-22.pdf" target="_blank">the author's recent comparative results</a>.<br />
<br />
The deployed <a href="http://www.w3.org/TR/WOFF2/" target="_blank">WOFF2</a> font file format already uses brotli internally. <br />
<br />
If all goes well in testing, Firefox 44 (ETA January 2016) will negotiate brotli as a content-encoding for https resources. The negotiation will be done in the usual way via the Accept-Encoding request header and the token "br". Servers that wish to encode a response with brotli can do so by adding "br" to the Content-Encoding response header. Firefox won't decode brotli outside of https - so make sure to use the HTTP content negotiation framework instead of doing user agent sniffing.<br />
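<br />
The negotiation is ordinary HTTP content negotiation; a sketch of the exchange (hostname illustrative):<br />
<pre>
GET /main.css HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip, deflate, br

HTTP/1.1 200 OK
Content-Type: text/css
Content-Encoding: br
</pre>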
<br />
[edit note - around Oct 6 2015 the token was changed to br from brotli. The token brotli was only ever deployed on nightly builds of firefox 44.]<br />
<br />
We expect Chrome will deploy something compatible in the near future.<br />
<br />
The brotli format is defined by this <a href="https://www.ietf.org/id/draft-alakuijala-brotli-05.txt" target="_blank">document</a> working its way through the IETF process. We will work with the authors to make sure the IANA registry for content codings is updated to reference it.<br />
<br />
You can get tools to create brotli compressed content <a href="https://github.com/google/brotli" target="_blank">here</a> and there is a windows executable I can't vouch for linked <a href="http://textslashplain.com/2015/09/10/brotli/" target="_blank">here</a>.<br />
<br />
<h2>
Opportunistic Encryption For Firefox</h2>
Firefox 37 brings more encryption to the web through opportunistic encryption of some http:// based resources. It will be released the week of March 31st.<br />
<br />
OE provides unauthenticated encryption over TLS for data that would otherwise be carried via clear text. This creates some confidentiality in the face of passive eavesdropping, and also provides you much better integrity protection for your data than raw TCP does when dealing with random network noise. The server setup for it is trivial.<br />
<br />
These are indeed nice bonuses for http:// - but it still isn't as nice as https://. If you can run https you should - full stop. Don't make me repeat it :) Only https protects you from active man in the middle attackers.<br />
<br />
But if you have a long tail of legacy content that you cannot yet get migrated to https, commonly due to mixed-content rules and interactions with third parties, OE provides a mechanism for an encrypted transport of http:// data. <a href="https://www.iab.org/2014/11/14/iab-statement-on-internet-confidentiality/" target="_blank">That's a strict improvement over the cleartext alternative.</a><br />
<br />
Two simple steps to configure a server for OE<br />
<ol>
<li>Install a TLS based h2 or spdy server on a separate port. 443 is a good choice :). You can use <b>a self-signed certificate</b> if you like because OE is not authenticated.</li>
<li>Add a response header <i>Alt-Svc: h2=":443"</i> or spdy/3.1 if you are using a spdy enabled server like nginx. </li>
</ol>
When the browser consumes that response header it will start to verify the fact that there is a HTTP/2 service on port 443. When a session with that port is established it will start routing the requests it would normally send in cleartext to port 80 onto port 443 with encryption instead. There will be no delay in responsiveness because the new connection is fully established in the background before being used. If the alternative service (port 443) becomes unavailable or cannot be verified Firefox will automatically return to using cleartext on port 80. Clients that don't speak the right protocols just ignore the header and continue to use port 80.<br />
<br />
This mapping is saved and used in the future. It is important to understand that while the transaction is being routed to a different port the origin of the resource hasn't changed (i.e. if the cleartext origin was http://www.example.com:80 then the origin, including the http scheme and the port 80, remains unchanged even if it is routed to port 443 over TLS). OE is not available with HTTP/1 servers because that protocol does not carry the scheme as part of each transaction, which is a necessary ingredient for the Alt-Svc approach.<br />
<br />
You can control some details about how long the Alt-Svc mappings last and some other details. The <a href="https://tools.ietf.org/html/draft-ietf-httpbis-alt-svc-04" target="_blank">Internet-Draft</a> is helpful as a reference. As the technology matures we will be tracking it; the recent HTTP working group meeting in Dallas decided this was ready to proceed to last call status in the working group.<br />
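<br />
For example, the draft's <i>ma</i> (max-age) parameter bounds how long a client may keep reusing the mapping. A lifetime of one day (the value is illustrative) would look like:<br />
<br />
<div style="text-align: center;">
<span style="font-family: "Courier New",Courier,monospace;">Alt-Svc: h2=":443"; ma=86400</span></div>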
<br />
<h2>
HTTP/2 is Live in Firefox</h2>
The Internet is chirping loudly today with news that draft-17 of the HTTP/2 specification has been anointed a proposed standard. Huzzah! Some reports talk about it as the future of the web - but the truth is that future is already here today in Firefox.<br />
<br />
<b>9% of all Firefox release channel HTTP transactions are already happening over HTTP/2. </b> There are actually more HTTP/2 connections made than SPDY ones. This is well exercised technology.<br />
<ul>
<li>Firefox 35, in current release, uses a draft ID of h2-14 and you will negotiate it with google.com today.</li>
<li>Firefox 36, in Beta to be released NEXT WEEK, supports the official final "h2" protocol for negotiation. I expect lots of new server side work to come on board rapidly now that the specification has stabilized. Firefox 36 also supports draft IDs -14 and -15. You will negotiate -15 with twitter as well as google using this channel.</li>
<li>Firefox 37 and 38 have the same levels of support - adding draft-16 to the mix. The important part is that the final h2 ALPN token remains fixed. These releases also have relevant IETF drafts for opportunistic security over h2 via the Alternate-Service mechanism implemented. A blog post on that will follow as we get closer to release - but feel free to reach out and experiment with it on these early channels.</li>
</ul>
Sometime in the near future I will remove support for the various draft levels of HTTP/2 that have been used to get us to this point and we'll just offer the "h2" of the proposed standard.<br />
<br />
For both SPDY and HTTP/2 the killer feature is arbitrary multiplexing on a single well congestion controlled channel. It amazes me how important this is and how well it works. One great metric around that which I enjoy is the fraction of connections created that carry just a single HTTP transaction (and thus make that transaction bear all the overhead). For HTTP/1 74% of our active connections carry just a single transaction - persistent connections just aren't as helpful as we all want. But in HTTP/2 that number plummets to 25%. That's a huge win for overhead reduction. Let's build the web around that.<br />
<br />
<h2>
HTTP/2 Dependency Priorities in Firefox 37</h2>
Next week Firefox 35 will be in general release, and Firefox 37 will be promoted to the Developer Edition channel (aka Firefox Aurora).<br />
<br />
HTTP/2 support will be enabled for the first time by default on a release channel in Firefox 35. Use it in good health on sites like <a href="https://twitter.com/mcmanusducksong" target="_blank">https://twitter.com</a>. It's awesome.<br />
<br />
This post is about a feature that landed in Firefox 37 - the use of HTTP/2 priority dependencies and groupings. In earlier releases prioritization was done strictly through relative weightings - similar to SPDY or UNIX nice values. H2 lets us take that a step further and say that some resources are dependent on other resources in addition to using relative weights between transactions that are dependent on the same thing.<br />
<br />
There are some simple cases where you really want a strict dependency relationship - for example two frames of the same video should be serialized on the wire rather than sharing bandwidth. More complicated relationships can be expressed through the use of grouping streams. Grouping streams are H2 streams that are never actually opened with a HEADERS frame - they exist simply to be nodes in the dependency tree that other streams depend on.<br />
<br />
The canonical use case involves better prioritization for loading html pages that include js, css, and lots of images. When doing so over H1 the browser will actually defer sending the request for the images while the js and css load - the reason is that the transfer of the js/css blocks any rendering of the page and the high byte count of the images slows down the higher priority js/css if done in parallel. The workaround, not requesting the images at all while the js/css is loading, has some downsides - it incurs at least one extra round trip delay and it doesn't utilize the available bandwidth effectively in some cases.<br />
<br />
The weighting mechanisms of H2 (and SPDY) can help here - and they are what are used prior to Firefox 37. Full parallelism is restored, but some unwanted bandwidth sharing still goes on.<br />
<br />
I've implemented a scheme for H2 using 5 fixed dependency groups (known informally as leader, follower, unblocked, background, and speculative). They are created with the PRIORITY frame upon session establishment and every new stream depends on one of them.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnAc3JF8tgvTwdgXtliGhaYpJH9F1WPOpn5b5gnSJdUptjanAQ4QgmlS6PBDe9pOZfeJgq9iB0UL-qF1FPMQrcu6IEgkekftPTJUw09uKMeU9DxrQznNz-Epi0gY5uHW1HdN6_ZriQUbE/s1600/IMG_20150107_144530.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnAc3JF8tgvTwdgXtliGhaYpJH9F1WPOpn5b5gnSJdUptjanAQ4QgmlS6PBDe9pOZfeJgq9iB0UL-qF1FPMQrcu6IEgkekftPTJUw09uKMeU9DxrQznNz-Epi0gY5uHW1HdN6_ZriQUbE/s1600/IMG_20150107_144530.jpg" height="206" width="320" /></a></div>
<br />
Streams for things like js and css are dependent on the leader group and images are dependent on the follower group. The use of group nodes, rather than being dependent on the js/css directly, greatly simplifies dependency management when some streams complete and new streams of the same class are created - no reprioritization needs to be done when the group nodes are used.<br />
<br />
This is experimental - the tree organization and its weights will evolve over time. Various types of resource loads will still have to be better classified into the right groups within Firefox and that too will evolve over time. There is an obvious implication for prioritization of tabs as well, and that will also follow over time. Nonetheless its a start - and I'm excited about it.<br />
<br />
One last note to H2 server implementors, if you should be reading this: there is a very strong implication here that you need to pay attention to the dependency information and not simply implement the individual resource weightings. Given the tree above, think about what would happen if there was a stream dependent on leaders with a weight of 1 and a stream dependent on speculative with a weight of 255. Because the entire leader group exists at the same level of the tree as background (and by inclusion speculative), the leader descendants should dominate the bandwidth allocation due to the relative weights of those group nodes - but looking only at the weights of the individual streams gives the incorrect conclusion.<br />
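<br />
To make that concrete, here is a small sketch of the RFC 7540 proportional-share rule applied to a toy tree. The weights are purely illustrative - not the values Firefox actually ships - but they show how the group nodes dominate the outcome regardless of the leaf weights:<br />
<pre>
// Toy model of RFC 7540 allocation: bandwidth at a node is divided among
// its children in proportion to their weights, and a child's share then
// flows down to its own descendants.
function allocate(node, share, out) {
  out = out || {};
  if (!node.children || node.children.length === 0) {
    out[node.name] = share;
    return out;
  }
  var total = node.children.reduce(function (sum, c) { return sum + c.weight; }, 0);
  node.children.forEach(function (c) {
    allocate(c, share * c.weight / total, out);
  });
  return out;
}

// Illustrative weights only.
var tree = {
  name: "root",
  children: [
    { name: "leader group", weight: 200, children: [
      { name: "css stream", weight: 1 }             // tiny leaf weight
    ]},
    { name: "background group", weight: 1, children: [
      { name: "speculative group", weight: 1, children: [
        { name: "speculative stream", weight: 255 } // huge leaf weight
      ]}
    ]}
  ]
};

console.log(allocate(tree, 1.0));
// roughly { 'css stream': 0.995, 'speculative stream': 0.005 }
// The css stream dominates despite its weight of 1 because its parent
// group dominates at the top of the tree.
</pre>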
<br />
<h2>
Firefox gecko API for HTTP/2 Push</h2>
HTTP/2 provides a mechanism for a server to push <b>both</b> requests and responses to connected clients. Up to this point we've used that as a browser cache seeding mechanism. That's pretty neat - it gives you the performance benefits of inlining with better cache granularity and, more importantly, improved priority handling - and it does it all transparently.<br />
<br />
However, as part of gecko 36 we added a new gecko (i.e. internal firefox and add-on) API called <a href="https://mxr.mozilla.org/mozilla-central/source/netwerk/base/public/nsIHttpPushListener.idl">nsIHttpPushListener </a>that allows direct consumption of pushes without waiting for a cache hit. This opens up programming models other than browsing.<br />
<br />
A single HTTP/2 stream, likely formed as a long lasting transaction from an XHR, can receive multiple pushed events correlated to it without having to form individual hanging polls for each event. Each event is both an HTTP request and an HTTP response and is as arbitrarily expressive as those things can be.<br />
<br />
It seems likely any implementation of a new<a href="https://datatracker.ietf.org/wg/webpush/charter/"> Web based push notification protocol</a> would be built around HTTP/2 pushes and this interface would provide the basis for subscribing and consuming those events.<br />
<br />
nsIHttpPushListener is only implemented for HTTP/2. Spdy has a compatible feature set, but we've begun transitioning to the open standard and will likely not evolve the feature set of spdy any further at this point.<br />
<br />
There is no webidl dom access to the feature set yet; that is something that should be standardized across browsers before being made available.<br />
<br />
<h2>
Proxy Connections over TLS - Firefox 33</h2>
There have been a bunch of interesting developments over the past few months in Mozilla Platform Networking that will be news to some folks. I've been remiss in not noting them here. I'll start with the proxying over TLS feature. It landed as part of Firefox 33, which is the current release.<br />
<br />
This feature is from <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=378637">bug 378637</a> and is sometimes known as HTTPS proxying. I find that naming a bit ambiguous - the feature is about connecting to your proxy server over HTTPS but it supports proxying for both http:// and https:// resources (as well as ftp://, ws://, and wss:// for that matter). https:// transactions are tunneled via end to end TLS through the proxy via the CONNECT method, in addition to the connection to the proxy being made over a separate TLS session. For https:// and wss:// that means you actually have end to end TLS wrapped inside a second TLS connection between the client and the proxy.<br />
<br />
There are some obvious and non-obvious advantages here - but proxying over TLS is strictly better than traditional plaintext proxying. One obvious reason is that it provides authentication of your proxy choice - if you have defined a proxy then you're placing an extreme amount of trust in that intermediary. It's nice to know via TLS authentication that you're really talking to the right device.<br />
<br />
Also, of course, the communication between you and the proxy is kept confidential, which is helpful to your privacy with respect to observers of the link between client and proxy, though this is not end to end if you're not accessing an https:// resource. Proxying over TLS connections also keeps any proxy specific credentials strictly confidential. There is an advantage even when accessing https:// resources through a proxy tunnel - encrypting the client to proxy hop conceals some information (at least for that hop) that https:// normally leaks, such as the hostname through SNI and the server IP address.<br />
<br />
Somewhat less obviously, HTTPS proxying is a prerequisite to proxying via SPDY or HTTP/2. These multiplexed protocols are extremely well suited for use in connecting to a proxy because a large fraction (often 100%) of a client's transactions are funneled through the same proxy and therefore only 1 TCP session is required when using a prioritized multiplexing protocol. When using HTTP/1 a large number of connections are required to avoid head of line blocking and it is difficult to meaningfully manage them to reflect prioritization. When connecting to remote proxies (i.e. those with a high latency such as those in the cloud) this becomes an even more important advantage as the handshakes that are avoided are especially slow in that environment.<br />
<br />
This multiplexing can really warp the old noodle to think about after a while - especially if you have multiple spdy/h2 sessions tunneled inside a spdy/h2 connection to the proxy. That can result in the top level multiplexing several streams with http:// transactions served by the proxy as well as connect streams to multiple origins that each contain their own end to end spdy sessions carrying multiple https:// transactions.<br />
<br />
To utilize HTTPS proxying just return the HTTPS proxy type from your FindProxyForURL() PAC function (instead of the traditional HTTP type). This is compatible with Google's Chrome, which has a similar feature.<br />
<br />
<pre>
function FindProxyForURL(url, host) {
  if (url.substring(0, 7) == "http://") {
    return "HTTPS proxy.mydomain.net:443;";
  }
  return "DIRECT;";
}
</pre>
<br />
Squid supports HTTP/1 HTTPS proxying. Spdy proxying can be done via Ilya's node.js based spdy-proxy. nghttp can be used for building HTTP/2 proxying solutions (H2 is not yet enabled by default on firefox release channels - see about:config network.http.spdy.enabled.http2 and network.http.spdy.enabled.http2draft to enable some version of it early). There are no doubt other proxies with appropriate support too.<br />
<br />
If you need to add a TOFU exception for use of your proxy it cannot be done in proxy mode. Disable proxying, connect to the proxy host and port directly from the location bar and add the exception. Then enable proxying and the certificate exception will be honored. Obviously, your authentication guarantee will be better if you use a normal WebPKI validated certificate.<br />
<br />
<h2>
On the Application of STRINT to HTTP/2</h2>
I participated for two days last week in the joint W3C/IETF (IAB) workshop on
<a href="https://www.w3.org/2014/strint/">Strengthening the Internet against Pervasive Monitoring (aka STRINT)</a>.
Now that the IETF has declared <a href="https://datatracker.ietf.org/doc/draft-farrell-perpass-attack/ballot/">pervasive monitoring of the Internet to be a technical attack</a> the
goal of the workshop was to establish the next steps to take in
reaction to the problem. There were ~100 members of the
Internet engineering and policy communities participating - HTTP/2 standardization is an important test case to see if we're serious about following through.<br />
<br />
I'm pleased that we were able to come to
some rough conclusions and actions. First a word of caution: there is no official report yet, I'm certainly not the workshop secretary, this post only reflects transport security which was a subset of the areas discussed, but I still promise I'm being faithful in reporting the events as I experienced them.<br />
<br />
<b>Internet protocols need to make better use of communications security and more encryption - even imperfect unauthenticated crypto is better than trivially snoopable cleartext.</b> It isn't perfect, but it raises the bar for the attacker. New protocol designs should use strongly authenticated mechanisms, falling back to weaker measures only as absolutely necessary, and updates to older protocols should be expected to add encryption, potentially with disabling switches if compatibility strictly requires it. A logical outcome of that discussion is the addition of these properties (probably by reference, not directly through replacement) to <a href="https://tools.ietf.org/html/bcp72">BCP 72</a> - which provides guidance for writing RFC security considerations.<br />
<br />
At a bare minimum, I am acutely concerned with making sure HTTP/2 brings more encryption to the Web. There are certainly many exposures beyond the transport (data storage, data aggregation, federated services, etc..) but in 2014 transport level encryption is a well understood and easily achievable technique that should be as ubiquitously available as clean water and public infrastructure. In the face of known attacks it is a best engineering practice and we shouldn't accept less while still demanding stronger privacy protections too. When you step back from the details and ask yourself if it is really reasonable that a human's interaction with the Web is observable to many silent and undetectable observers the current situation really seems absurd.<br />
<br />
The immediate offered solution space is complicated and incomplete.
Potential mitigations are fraught with tradeoffs and unintended
consequences. The focus here is on what happens to http:// schemed
traffic, https is comparably well taken care of. The common solution
offered in this space carries http:// over an unauthenticated TLS
channel for HTTP/2. The result is a very simple plug and play TLS capable HTTP server that is not dependent on the PKI. This provides protection against passive eavesdroppers, but not against active attacks. The cost of attacking is raised in terms of CPU, monetary cost, political implications, and risk of being discovered. In my opinion, that's a win. Encryption simply becomes the new equivalent of clear text - it doesn't promote http:// to https://, it does not produce a lock icon, and it does not grant you any new guarantees that cleartext http:// would not have. I support that approach.<br />
<br />
The IETF HTTPbis working group will test this commitment to encryption on Wednesday at the London #IETF89 meeting when <a href="https://github.com/http2/http2-spec/issues/315">http:// schemed URIs over TLS</a> is on the agenda (again). In the past, it has not been able to garner consensus. If the group is unable to form consensus around a stronger privacy approach than was done with HTTP/1.1's use of cleartext, I would hope the IESG would block the proposed RFC during last call for having insufficiently addressed the security implications of HTTP/2 on the Internet as we now know it.<br />
<br />
#ietf89 #strint <br />
<br />
<h2>
SSL Everywhere for HTTP/2 - A New Hope</h2>
Recently the IETF working group on HTTP met in Berlin, Germany and discussed the concept of <a href="http://www.ietf.org/proceedings/87/minutes/minutes-87-httpbis">mandatory to offer</a> TLS for HTTP/2, offered by Mark Nottingham. The current approach to transport security means only 1/5 of web transactions are given the protections of TLS. Currently all of the choices are made by the content owner via the scheme of the url in the markup.<br />
<br />
It is time that the Internet infrastructure simply gave users a secure by default transport environment - that doesn't seem like a radical statement to me. From FireSheep to the Google Sniff-Wifi-While-You-Map Car to PRISM there is ample evidence to suggest that secure transports are just necessary table stakes in today's Internet.<br />
<br />
This movement in the IETF working group is welcome news and I'm going to do everything I can to help iron out the corner cases and build a robust solution.<br />
<br />
My point of view for Firefox has always been that we would only implement HTTP/2 over TLS. That means https:// but it has been my hope to find a way to use it for http:// schemes on servers that supported it too. This is just transport level security - for web level security the different schemes still represent different origins. If cleartext needed to be used it would be done with HTTP/1 and someday in the distant future HTTP/1 would be put out to pasture. This roughly matches Google Chrome's public stance on the issue.<br />
<br />
Mandatory to offer does not ban the practice of cleartext - if the client did not want TLS a compliant cleartext channel could be established. This might be appropriate inside a data center for instance - but Firefox would be unlikely to do so.<br />
<br />
This approach also does not ban intermediaries completely. https:// URIs of course remain end to end and can only be tunneled (as is the case in HTTP/1), but http:// URIs could be proxied via appropriately configured clients by having the HTTP/2 stream terminated on the proxy. It would prevent "transparent proxies", which are fundamentally man-in-the-middle attacks anyhow.<br />
<br />
Comments over here: https://plus.google.com/100166083286297802191/posts/XVwhcvTyh1RPatrick McManushttp://www.blogger.com/profile/00565702107491976279noreply@blogger.comtag:blogger.com,1999:blog-8147202175434463396.post-63812388007581830162013-08-13T08:38:00.002-04:002013-08-13T08:40:10.011-04:00Go Read "Reducing Web Latency - the Virtue of Gentle Aggression"One of the more interesting networking papers I've come across lately is <a href="http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/41217.pdf">Reducing Web Latency: the Virtue of Gentle Aggression</a>. It details how poorly TCP's packet loss recovery techniques map onto HTTP and proposes a few techniques (some more backwards compatible than others) for improving things. It's a large collaborative effort from Google and USC. Go read it.<br />
<br />
The most important piece of information is that TCP flows that experienced a loss took 5 times longer to complete than flows that did not experience a loss. 10% of flows fell into the "some loss" bucket. For good analysis go read the paper, but the shorthand headline is that this is because the short "mice" flows of HTTP/1 tend to be composed mostly of tail packets, and traditionally tail loss is only recovered using horrible timeout-based strategies.<br />
<br />
This is the best summary of the problem that I've seen - but it's been understood for quite a while. It is one of the motivators of HTTP/2 and is a theme underlying several of the blog posts Google has made about QUIC.<br />
<br />
The "aggression" concept used in the paper is essentially the inclusion of redundant data in the TCP flow under some circumstances. This can, at the cost of extra bandwidth, bulk up mice flows so that their tail losses are more likely to be recovered with single RTT mechanisms such as SACK based fast recovery instead of timeouts. Conceivably this extra data could also be treated as an error correction coding which would allow some losses to be recovered independent of the RTT.Patrick McManushttp://www.blogger.com/profile/00565702107491976279noreply@blogger.comtag:blogger.com,1999:blog-8147202175434463396.post-70933506706431204292013-08-03T12:33:00.000-04:002013-08-03T12:33:32.497-04:00Internet Society Briefing Panel @ IETF 87This past week saw IETF 87, in Berlin Germany, come and go. As usual I met a number of interesting new folks, caught up with old acquaintances I respect (a number of whom now work for Google - it seems to happen when you're not looking at them :)), and got some new ideas into my head (or refined old ones).<br />
<br />
On Tuesday I was invited to be part of a panel for the ISOC lunch with Stuart Cheshire and Jason Livingood. We were led and moderated by the inimitable Leslie Daigle.<br />
<br />
The panel's topic was "Improving the Internet Experience, All Together Now." I made the case that the standards community needs to think a little less about layers and a little more about the end goal - and by doing so we can provide more efficient building blocks. Otherwise developers find themselves tempted to give up essential properties like security and congestion control in order to build more responsive applications. But <b>we don't need to make that tradeoff.</b><br />
<br />
The audio is available here<a href="http://www.internetsociety.org/internet-society-briefing-panel-ietf-87"><b> </b>http://www.internetsociety.org/internet-society-briefing-panel-ietf-87</a> .. unfortunately the audio is full of intolerable squelch until just after the 22:00 mark (I'm the one speaking at that point).. the good news is most of that is "hold music", introductions, and a recap of the panel topic that is also described on the web page.<br />
<br />Patrick McManushttp://www.blogger.com/profile/00565702107491976279noreply@blogger.comtag:blogger.com,1999:blog-8147202175434463396.post-73160916070966873142013-06-06T17:19:00.001-04:002013-06-06T17:19:18.645-04:00RTT - a lot more than time to first byteLots of interesting stuff in this paper: <a href="http://www.sigcomm.org/ccr/papers/2013/April/2479957.2479960">Understanding the latency benefits of multi-cloud webservice deployments</a>.<br />
<br />
One of the side angles I find interesting is that even if you built an idealized, fully replicated custom CDN using multiple datacenters on multiple clouds, there are still huge swaths of the Internet population with browser-to-server RTTs over 100ms. (China, India, Argentina, Israel!)<br /><br />RTT is the hidden driver in so much of Internet performance, and to whatever extent possible it needs to be quashed. It's so much more than time to first byte.<br /><br />Bandwidth, CPU, even our ability to milk more data out of radio spectrum, all improve much more rapidly than RTT can.<br /><br />Making RTT better is really a physical problem and is, in the general case at least, really really hard. Making things less reliant on RTT is a software problem and a more normal kind of hard. Or at least it seems that way if you're a software developer.<br /><br />RTT is not just time to the first byte of a web request. It's DNS. It's the TCP handshake. It's the SSL handshake. It's the time it takes to do a CORS check before an XHR. It's the time it takes to grow the congestion window (i.e. how fast you speed up from slow start) or confirm revocation status of the SSL certificate. It's the time it takes to do an HTTP redirect. And it's the time you're stalled while you recover from a packet loss. They're all some factor of RTT.<br />
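<br />
A quick back-of-envelope sketch (illustrative counts only, not measurements) of why that litany matters - each item costs at least one round trip before the first useful byte arrives, and extra bandwidth buys none of them back:<br />
<pre>
# Rough, illustrative accounting of the round trips that stand between a
# user and the first byte of a cold HTTPS fetch; exact counts vary by stack.
RTT_MS = 100   # a typical Internet round trip

steps = {
    "DNS lookup":            1,
    "TCP handshake":         1,
    "TLS handshake":         2,   # classic full handshake
    "HTTP request/response": 1,
}

total = sum(steps.values())
print("first byte arrives after roughly", total * RTT_MS, "ms")
# Redirects, OCSP checks, CORS preflights, congestion window growth, and
# loss recovery each add more round trips on top of this.
</pre>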
<br />
HTTP/2 and SPDY take the first step of dealing with this by creating a prioritized and muxed protocol, which tends to make it more often bandwidth bound than HTTP/1 is. But even that protocol still has lots of hidden RTT scaling algorithms in it.<br />
<br />
<br />Patrick McManushttp://www.blogger.com/profile/00565702107491976279noreply@blogger.comtag:blogger.com,1999:blog-8147202175434463396.post-17607746976107556552013-05-28T10:54:00.002-04:002013-05-28T10:55:48.893-04:00If a tree falls in a forest and nobody is there to hear it..Robert Koch's Masters Thesis about websockets security and adoption is <a href="http://www.ub.tuwien.ac.at/dipl/2013/AC07815487.pdf">linked</a>.<br />
<br />
One interesting tidbit - he surveyed 15K free apps from the Google Play store and saw only 14 that use websockets. Previous surveys of traditional websites have seen similarly poor uptake.<br />
<br />
WS certainly isn't optimal, but it has a few pretty interesting properties and it's sad to see it underused. I suspect its reliance on a different server and infrastructure stack is part of the reason.<br />
<br />
<br />Patrick McManushttp://www.blogger.com/profile/00565702107491976279noreply@blogger.comtag:blogger.com,1999:blog-8147202175434463396.post-40570940269476277792013-03-07T16:39:00.002-05:002013-03-08T09:31:48.827-05:00Firefox Nightly on WebPageTestThanks to Google's amazing Pat Meenan, Firefox nightly has been added to webpagetest.org - just make sure to select the Dulles VA location to be able to use it. <br />
<br />
<a href="https://twitter.com/patmeenan/status/309774450168061954?uid=112842352&iid=2a60150a-a3cb-408a-92e5-1af4af6f9f63&nid=4+246&t=1">He shared the news on twitter</a>. Thanks Pat!<br />
<br />
WPT is a gold standard tool, and to be able to see it reflect (and validate/repudiate) changes in nightly is a really terrific addition.<br />
<br />
<br />
<br />
<br />Patrick McManushttp://www.blogger.com/profile/00565702107491976279noreply@blogger.comtag:blogger.com,1999:blog-8147202175434463396.post-55199186297221472242013-01-14T17:21:00.000-05:002013-01-14T17:31:12.507-05:00On Induced Latency and LossThe estimable Mark Allman has a <a href="http://www.icir.org/mallman/papers/bufferbloat-ccr13.pdf">new paper</a> out in the latest ACM CCR dealing with data collection around buffering in the Internet. If you respect a good characterization paper as much as I do - you should read it and a couple related mailing list threads <a href="https://lists.bufferbloat.net/pipermail/bloat/2013-January/thread.html">here</a> and <a href="http://mailman.postel.org/pipermail/end2end-interest/2013-January/thread.html">here</a>. The paper is only 7 pages; go read it -- this blog post will wait for you.<br />
<br />
The paper's title, <i>Comments on Bufferbloat</i>, in my mind undersells the contributions of the paper. Colloquially I consider bufferbloat to refer specifically to deeply filled queues at or below the IP layer that create latency problems for real-time applications using different transport layer flows. (e.g. FTP/Torrent/WebBrowser creating problems for perhaps unrelated VOIP). But the paper provides valuable data on real world levels of induced latency where it has implications for TCP Initial Window (IW) sending sizes and the web-browser centric concern of managing connection parallelism in a world with diverse buffer sizes and IWs. That's why I'm writing about it here.<br />
<br />
Servers inducing large latency and loss through parallelized sending is right now a bit of a corner case problem in the web space that seems to be growing. My chrome counterpart Will Chan talks a bit about it too over <a href="https://insouciant.org/tech/some-more-reasons-hacking-around-http-bottlenecks-sucks-and-we-should-fix-http/">here</a>. <br />
<br />
For what it's worth, I'm still looking for some common amounts of network buffering to use as configurations in simulations on the downstream path. The famous Netalyzer paper has some numbers on upstream. Let me know if you can help.<br />
<br />
The traffic Mark looked at was not all web traffic (undoubtedly lots of P2P), but I think there are some interesting takeaways for the web space. These are my interpretations - the paper is linked above, so don't let me put words in the author's mouth:<br />
<br />
<ul>
<li>There is commonly some induced delay due to buffering.. the data in the paper shows residential RTTs of 78ms typically bloated to ~120ms. That's not going to mess with real time too badly, but it's an interesting data point.</li>
<li>The data set shows bufferbloat at the far end of the distribution: at the 99th percentile of round trips on residential connections you see over 900ms of induced latency.</li>
<li>The data set shows that bufferbloat is not a constant condition - the amount of induced latency is commonly changing (for better and for worse) and most of the time doesn't fluctuate into dangerous ranges.</li>
<li>There are real questions of whether other data sets would show the same thing, and even if they did it isn't clear to me what the acceptable frequency of these blips would be to a realtime app like VOIP or RTC-Web.</li>
<li>IW 10 seems reasonably safe from the data in Mark's paper, but his characterizations don't necessarily match what we see in the web space. In particular our flows are not as often under 3 packets in size as the flows in the paper's data (which is not all web traffic). There are clearly deployed corner cases on the web where servers send way too much data (typically through sharding) and induce train wrecks on the web transfer (a back-of-envelope sketch of such a burst follows this list). How do we deal with that gracefully in a browser without sharding or IW=10 for hosts that use them in a more coordinated way? That's a big question for me right now.</li>
</ul>
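<br />
Since the IW and sharding concern is ultimately arithmetic, here is a back-of-envelope sketch (purely illustrative numbers, not figures from the paper) converting a burst of packets into the queueing delay it induces on a given downstream link:<br />
<pre>
# Back-of-envelope conversion from a burst of queued packets to induced
# latency on the downstream link: delay = queued bits / link rate.
def induced_delay_ms(queued_bytes, link_mbps):
    return 1000.0 * (queued_bytes * 8) / (link_mbps * 1000000.0)

# e.g. 10 freshly sharded connections each bursting IW=10 full-size packets
burst_bytes = 10 * 10 * 1500
print(round(induced_delay_ms(burst_bytes, link_mbps=4), 1), "ms of standing queue")
</pre>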
<i><br /></i>
<i>[ Comments on this are best done at <a href="https://plus.google.com/100166083286297802191/posts/6XB59oaQzDL">https://plus.google.com/100166083286297802191/posts/6XB59oaQzDL</a>]</i>Patrick McManushttp://www.blogger.com/profile/00565702107491976279noreply@blogger.comtag:blogger.com,1999:blog-8147202175434463396.post-43123537121675785052012-12-08T23:32:00.005-05:002012-12-08T23:32:55.702-05:00Managing Bandwidth Priorities In Userspace via TCP RWIN ManipulationThis post is a mixture of research and speculation. The speculative thinking is about how to manage priorities among a large number of parallel connections. The research is a survey of how different OS versions handle attempts by userspace to actively manage the SO_RCVBUF resource.<br />
<br />
Let's consider the case where a receiver needs to manage multiple incoming TCP streams and the aggregate sending windows exceed the link capacity. How do they share the link?<br />
<br />
There are lots of corner cases, but it basically falls into 2 categories: longer term flows try to balance themselves more or less evenly with each other, while new flows (or flows that have been quiescent for a while) blindly inject data into the link based on their Initial Window configuration (3, 10, or even 40 [yes, 40] packets).<br />
<br />
This scenario is a pretty common one for a web browser on a site that does lots of sharding.. It is common these days to see 75, 100, or even 200 parallel connections used to download a multi megabyte page. Those different connections have an internal priority to the web browser, but that information is more or less lost when HTTP and TCP take over - the streams compete with each other blind to what they carry or how large that content is. If you have 150 new streams sending IW=10 that means 1500 packets can be sent more or less simultaneously, even if that content is low priority. That kind of data is going to dominate most links and cause losses on others. Of course, unknown to the browser, those streams might only have 1 packet worth of data to send (small icons, or 304's) - it would be a shame to dismiss parallelism in those cases out of fear of sending too fast. Capping parallelism at a lower number is a crude approach that will hurt many use cases and is very hard to tune in any event.<br />
<br />
The receiver does have one knob to influence the relative weights of the incoming streams: the TCP Receive Window (rwin). The sender generally determines the rate a stream can transfer at (via the CWND calculation), but that rate is limited to no more than rwin allows.<br />
<br />
I'm sure you see where this is going - if the browser wants to squelch a particular stream (because the others are more important) it can reduce the rwin of the stream to below the Bandwidth Delay Product of the link - effectively slowing just that one stream. Voila - crude priorities for the HTTP/TCP connections! (RTT is going to matter a lot at this point for how fast they run - but if this is just a problem about sharding then RTTs should generally be similar amongst the flows so you can at least get their relative priorities straight).<br />
<br />
setsockopt(SO_RCVBUF) is normally how userspace manipulates the buffers associated with the receive window. I set out to survey common OS platforms to see how they handled dynamic tuning of that parameter in order to manipulate the receiver window.<br />
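<br />
For concreteness, here is roughly what that knob looks like in portable terms - a hedged Python sketch rather than anything browser-specific; the hostname is a placeholder, and whether the later increase actually takes effect is exactly the per-OS question the survey below examines.<br />
<pre>
import socket

# Sketch of the receive-window squelching idea via setsockopt().
def open_squelched(host, port, small_buf=4096):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set the buffer BEFORE connect so the initially advertised window
    # (and, on some platforms, the window scale option) starts small.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, small_buf)
    s.connect((host, port))
    return s

def unsquelch(s, big_buf=64 * 1024):
    # Try to raise rwin later, when this stream's relative priority improves;
    # platforms differ on whether this actually works mid-connection.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, big_buf)

conn = open_squelched("shard1.example.com", 80)
unsquelch(conn)
</pre>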
<br />
<b>WINDOWS 7</b><br />
Win7 does the best job; it allows SO_RCVBUF to both dynamically increase and decrease rwin (decreases require the WsaReceiveBuffering socket ioctl). Increasing a window is instantaneous and a window update is generated for the peer right away. However, when decreasing a window (and this is true of all platforms that allow decreasing), the window is not slammed shut - it naturally degrades with data flow and is simply not recharged as the application consumes the data, which results in the smaller window.<br />
<br />
For instance, a stream has an rwin of 64KB and decreases it to 2KB. A window update is not sent on the wire even if the connection is quiescent.. Only after the 62KB of data has been sent does the window naturally shrink to 2KB, even if the application has consumed the data from the kernel - future reads will recharge rwin back to a max of 2KB. But this process is not accelerated in any way by the decrease of SO_RCVBUF, no matter how long it takes. Of course there would always be 1 RTT of time between the decrease of SO_RCVBUF and the time it takes effect (where the sender could send larger amounts of data), but by not "slamming" the value down with a window update (which I'm not even certain would be TCP compliant) that period of time is extended from 1 RTT to indefinite. Indeed, the application doesn't need SO_RCVBUF at all to achieve this limited definition of decreasing the window - it can simply stop consuming data from the kernel (or pretend to do so by using MSG_PEEK) and that would be no more or less effective.<br />
<br />
If we're trying to manage a lot of parallel busy streams this strategy for decreasing the window will work ok - but if we're trying to protect against a new/quiescent stream suddenly injecting a large amount of data it isn't very helpful. And that's really the use case I have in mind.<br />
<br />
The other thing to note about windows 7 is that the window scale option (which effectively controls how large the window can get and is not adjustable after the handshake) is set to the smallest possible value for the SO_RCVBUF set before the connect. If we agree that starting with a large window on a squelched stream is problematic because decreases don't take effect quickly enough, that implies the squelched stream needs to start with a small window. Small windows will not need window scaling. This isn't a problem for the initial squelched state - but if we want to free the stream up to run at a higher speed (perhaps because it now has the highest relative priority of active streams after some of the old high priority ones completed) the maximum rwin is going to be 64KB - going higher than that requires window scaling. A 64KB window can support significant data transfer (e.g. 5 megabit/sec at 100ms RTT) but certainly doesn't cover all the use cases of today's Internet.<br />
<br />
<b>WINDOWS VISTA</b><br />
When experimenting with Vista I found behavior similar to Windows 7. The only difference I noted was that it always used a window scale in the handshake even if initially a rwin &lt; 64KB was going to be used. This allows connections with small initial windows to be raised to greater sizes than is possible on windows 7 - which for the purposes of this scheme is a point in vista's favor.<br />
<br />
I speculate that the change was made in windows 7 to increase interoperability - window scale is sometimes an interop problem with old nats and firewalls and the OS clearly doesn't anticipate the rwin range to be actively manipulated while the connection is established.. therefore if window scale isn't needed for the initial window size you might as well omit it out of an abundance of caution.<br />
<br />
<b>WINDOWS XP</b><br />
By default XP does not do window scaling at all - limiting us to 64KB windows and therefore multiple connections are required to use all the bandwidth found in many homes these days. It doesn't allow shrinking rwin at all (but as we saw above the vista/win-7 approach to shrinking the window isn't more useful than one that can be emulated through non-SO_RCVBUF approaches).<br />
<b> </b><br />
XP does allow raising the rwin. So a squelched connection could be set up with a low initial window and then raised up to 64KB when its relative ranking improved. The sticky wicket here is that attempts to set SO_RCVBUF below 16KB don't appear to work. 16KB maps to a new connection with IW=10 - having a large set of new squelched connections all with the capacity to send 10 packets probably doesn't meet the threshold of squelched.<br />
<br />
<b>OS X (10.5)</b><br />
The latest version of OS X I have handy is dated. Nothing in a google search leads me to believe this has changed since 10.5, but I'm happy to take updates.<br />
<b> </b><br />
OS X, like XP, does not allow decreasing a window through SO_RCVBUF.<br />
<br />
It does allow increasing one if the window size was set before the connection - otherwise "auto tuning" is used and cannot be manipulated while the connection is established.<br />
<br />
Like vista, the initial window determines the scaling factor, and assuming a small window on a squelched stream that means window scaling is disabled and the window can only be increased to 64KB for the life of the connection.<br />
<br />
<b>LINUX/ANDROID</b><br />
Linux can decrease rwin, but as with other platforms requires data transfer to do it instead of a 1-RTT slamming shut of the window.<b> </b>Linux does not allow increasing the window past the size it has when the connection is established. So you can start a window large, slowly decrease it, and then increase it back to where it started.. but you can't start a window small and then increase it as you might want to do with a squelched stream.<br />
<br />
It pains me to say this, and it is so rarely true, but this makes Linux the least attractive development platform for this approach.<br />
<br />
<b>CONCLUSIONS</b><br />
From this data it seems viable to attempt a strategy for {windows >= vista and OS X} that mixes 3 types of connections:<br />
<ol>
<li>Full Autotuned connections. These are unsquelched, can never be slowed down, generally use window scaling and are capable of running at high speeds</li>
<li> Connections that begin with small windows and are currently squelched
to limit the impact of new low priority streams in highly parallel
environments </li>
<li> Connections that were previously squelched but have now been upgraded to 64KB windows.. "almost full" (1) if you will.</li>
</ol>
Patrick McManushttp://www.blogger.com/profile/00565702107491976279noreply@blogger.comtag:blogger.com,1999:blog-8147202175434463396.post-87536257889183794292012-12-05T10:47:00.000-05:002012-12-05T10:47:21.900-05:00Smarter Network Page Load for FirefoxI just landed some interesting code for bug <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=792438">792438</a>. It should be in the December 6th nightly, and if it sticks it will be part of Firefox 20.<br />
<br />
The new feature keeps the network clear of media transfers while a page's basic html, css, and js are loaded. This results in a faster first paint as the elements that block initial rendering are loaded without competition. Total page load is potentially slightly regressed, but the responsiveness more than makes up for it. Similar algorithms report first paint improvements of 30% with pageload regressions around 5%. There are some cases where the effect is a lot more obvious than 30% - try <a href="http://pinterest.com/">pinterest.com</a>, <a href="http://163.com/">163.com</a>, <a href="http://www.skyrock.com/">www.skyrock.com</a>, or <a href="http://www.tmz.com/">www.tmz.com</a> - all of these sites have large amounts of highly parallel media on the page that, in previous versions, competed with the required css.<br />
<br />
Any claim of benefit is going to depend on bandwidth, latency, and the contents of your cache (e.g. if you've already got the js and css cached, this is moot.. or if you have a localhost connection like talos tp5 uses it is also moot because bandwidth and latency are essentially ideal there).<br />
<br />
Before I get to the nitty gritty I think it's worth a paragraph of Inside-Mozilla-Baseball to mention what a fun project this was. I say that in particular because it involved a <b>lot</b> of cross team participation on many different aspects (questions, advice, data, code in multiple modules, and reviews). I think we can all agree that when many different people are involved on the same effort, efficiency is typically the first casualty. Perhaps this project is the exception needed to prove the rule - it went from a poorly understood use case to committed code very quickly. Ehsan, Boris, Dave Mandelin, Josh Aas, Taras, Henri, Honza Bambas, Joe Drew, and Chromium's Will Chan helped one and all - it's not too often you get the rush of everyone rowing in sync, but it happened here and was awesome to behold and good for the web.<br />
<br />
In some ways this feature is not intuitive. Web performance is generally improved by adding parallelization due to large amounts of unused bandwidth left over by the single session HTTP/1 model. In this case we are actually reducing parallelization which you wouldn't think would be a good thing. Indeed that is why it can regress total page load time for some use cases. However, parallelization is a double edged sword.<br />
<br />
When parallelization works well it is because it helps cover up moments of idle bandwidth in a single HTTP transaction (e.g. latency in the handshake, latency during the header phase, or even latency pauses involved in growing the TCP congestion window) and this is critically important to overall performance. Servers engage in hostname sharding just to opt into dozens of parallel connections for performance reasons.<br />
<br />
On the other hand, when it works poorly parallelization kills you in a couple of different ways. I'll be talking about the more spectacular failure mode in a different post (bug <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=813715">813715</a>), but briefly, oversubscription of the link creates induced queues and TCP losses that interact badly with TCP congestion control and recovery. The issue at hand here is more pedestrian - if you share the link 50 ways and all 50 sessions are busy then each one gets just 2% of the bandwidth. They are inherently fair with each other and without priority, even though that doesn't reflect their importance to your page.<br />
<br />
If the required js and css is only getting 10% of the bandwidth while the images are getting 90% then the first meaningful paint is woefully delayed. The reason you do the parallelism at all is because many of those connections will be going through one of the aforementioned idle moments and aren't all simultaneously busy - so it's a reasonable strategy as long as maximizing total bandwidth utilization is your goal.. but in the case of an HTML page load some resources are more important than others and it isn't worth sacrificing that ordering to perfectly optimize total page load. So this patch essentially breaks page load into two phases to sort out that problem.<br />
<br />
The basic approach here is the same as used by Webkit's ResourceLoadScheduler. So Webkit browsers <a href="https://insouciant.org/tech/resource-prioritization-in-chromium/">already do the basic idea</a> (and have validated it). I decided we wanted to do this at the network layer instead of in content or layout to enable a couple extra bonuses that Webkit can't give because it operates on a higher layer: <br />
<ol>
<li>If we know a priori that a piece of work is highly sensitive to latency but is very low in bandwidth then we can avoid holding it back even if that work is just part of an HTTP transaction. As part of this patchset I have included the ability to preconnect a small quota of 6 media TCP sessions at a time while css/js is loading. More than 6 can complete, it's just limited to 6 outstanding at any one instant to bound the bandwidth consumption. This results in a nice little hot pool of connections all ready for use by the image transfers when they are cleared for takeoff (a small sketch of the overall scheme follows this list). You could imagine this being slightly expanded to a small number of parallel HTTP media transfers that were bandwidth throttled in the future.</li>
<li>The decision on holding back can be based on whether or not SPDY is in use if you wait until you have a connection to make the decision - spdy is a prioritized and muxed-on-one-tcp-session protocol that doesn't need to use this kind of workaround to do the right thing. In its case we should just send the requests as soon as possible with appropriate priorities attached to them and let the server do the right thing. The best of both worlds!</li>
</ol>
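<br />
As promised above, here is a deliberately tiny sketch of the two-phase idea plus a bounded preconnect quota - the names are invented for illustration, and none of this is Necko or WebKit code:<br />
<pre>
from collections import deque

# Toy model of a two-phase page load with a bounded preconnect pool.
class TwoPhaseLoader:
    PRECONNECT_QUOTA = 6   # speculative connections warmed at once

    def __init__(self, preconnect):
        self.blocking = deque()    # html, css, js: gate first paint
        self.media = deque()       # images etc.: held back
        self.preconnect = preconnect
        self.warming = 0

    def submit(self, request, blocks_rendering):
        if blocks_rendering:
            self.blocking.append(request)
            return
        self.media.append(request)
        # Cheap in bandwidth, saves handshake RTTs later: warm a socket
        # for the media phase, but bound how many we open speculatively.
        if self.warming != self.PRECONNECT_QUOTA:
            self.warming += 1
            self.preconnect(request.host)

    def next_request(self):
        if self.blocking:
            return self.blocking.popleft()   # phase 1: render-blocking
        if self.media:
            return self.media.popleft()      # phase 2: held-back media
        return None
</pre>
A real implementation also needs the SPDY escape hatch described in item 2, priorities within each phase, and bookkeeping for completed preconnects; those are omitted here for brevity.<br />
<br />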
This issue illustrates again how dependent HTTP/1 is on TCP behavior and how that is a shortcoming of the protocol. Reasonable performance demands parallelism, but how much is dependent on the semantics of the content, the conditions of the network, and the size of the content. These things are essentially unknowable and even characterizations of the network and typical content change very quickly. It's essential for the health of the Internet that we migrate away from this model onto HTTP/2.<br />
<br />
Comments over here please: <a href="https://plus.google.com/100166083286297802191">https://plus.google.com/100166083286297802191</a>Patrick McManushttp://www.blogger.com/profile/00565702107491976279noreply@blogger.comtag:blogger.com,1999:blog-8147202175434463396.post-23340026926751330162012-11-14T10:47:00.001-05:002012-11-14T10:47:35.012-05:00A brief note on pipelines for firefoxA brief note on http pipeline status - you may have read on mozilla planet yesterday that Chrome has this technology enabled. That's not true - I can't say why the author wrote it - but Chrome has the same status (implemented but configured off by default) as Firefox, and largely for the same reasons.<br />
<br />
This is a painful note for me to write, because I'm a fan of HTTP pipelines.<br />
<br />
But, despite some obvious wins with pipelining, it remains disabled as a high risk / high maintenance item. I use it and test with it every day with success, but much of the risk is tied up in a user's specific topology - intermediaries (including virus checkers - an oft ignored but very common intermediary) are generally the problem with 100% interop. I encourage readers of planet moz to test it too - but that's a world apart from turning it on by default - the test matrix of topologies is just very different.<br />
<br />
See bugzilla 716840 and 791545 for some current problems caused by unknown forces.<br />
<br />Also see telemetry for TRANSACTION_WAIT_TIME_HTTP and TRANSACTON_WAIT_TIME_HTTP_PIPELINES - you'll see that pipelines do marginally reduce queuing time, but not by a heck of a lot in practice. (~65% of transactions are sent within 50ms using straight HTTP, ~75% with pipelining enabled). If we don't see more potential than that, it isn't worth taking the pain of interop troubles. It is certainly possible that those numbers can be improved.<br />
<br />
The real road forward though is HTTP/2 - a fully multiplexed and pipelined protocol without all the caveats and interop problems. It's better to invest in that. Check out TRANSACTON_WAIT_TIME_SPDY and you'll see that 93% of all transactions wait less than 1ms in the queue!<br />
<br />Patrick McManushttp://www.blogger.com/profile/00565702107491976279noreply@blogger.comtag:blogger.com,1999:blog-8147202175434463396.post-82759662700681627402012-08-06T09:25:00.000-04:002012-08-06T09:25:19.222-04:00The Road to HTTP/2I <a href="http://lists.w3.org/Archives/Public/ietf-http-wg/2012JulSep/0156.html">have been working on HTTP/2.0 standardization efforts in the IETF</a>. Lots has been going on lately, lots of things
are still in flux, and many things haven't even been started yet. I
thought an update for Planet Mozilla might be a useful complement to
what you will <a href="http://www.mnot.net/blog/2012/08/04/http_vancouver">read about it in other parts of the Internet</a>. The bar for changing a deeply entrenched architecture like HTTP/1 is very high - but the time has come to make that change.<br />
<br />
<b>Motivation For Change - More than page load time</b><br />
<br />
HTTP has been a <a href="https://codebits.eu/intra/s/session/220">longtime abuser of TCP and the Internet</a>. It uses
mountains of different TCP sessions that are often only a few packets
long. This triggers lots of overhead and results in common stalling due
to bad interaction with the way TCP was envisioned to be deployed. The
classic TCP model pits very large FTP flows against keystrokes of a
telnet session - HTTP doesn't map to either of those very well. The
situation is so bad that over <a href="http://www.ietf.org/proceedings/84/slides/slides-84-tcpm-14.pdf">2/3rds of all TCP packet losses are repaired via the slowest possible mechanism (timer expiration)</a>, and more than 1 in 20 transactions experience a loss event. That's a sign that TCP interactions are the bottleneck in web scalability.<br />
<br />
Indeed, after a certain modest amount of bandwidth is used <a href="http://www.igvita.com/slides/2012/html5devconf/#10">additional bandwidth barely moves the needle on HTTP performance</a> at all - the only
thing that matters is connection latency. Latency isn't getting better -
if anything it is getting worse with the transition to more and more
mobile networks. This is the quagmire we are in - we can keep adding more bandwidth, clearer radios, and faster processors in everyone's pocket as we've been doing but that won't help much any more. <br />
<br />
These problems have all been understood for almost 20 years and many
parties have tried to address them over time. However, "good enough" has generally
carried the day due to legitimate concerns over backward compatibility and the presence of other lower hanging fruit.
We've been in the era of transport protocol stagnation for a while now.
Only recently has the problem been severe enough to see real movement on
a large scale in this space with the deployment of SPDY by google.com,
Chrome, Firefox, Twitter, and F5 among others. Facebook has indicated
they will be joining the party soon as well along with nginx. Many
smaller sites also participate in the effort, and Opera has another
browser implementation available for preview.<br />
<br />
SPDY uses a set of time proven techniques. Those are primarily
multiplexed transactions on the same socket, some compression,
prioritization, and good integration with TLS. The results are strong:
page load times are improved, TCP interaction improves (including fewer
connections to manage, less dependency on RTT as a scaling factor, and
improved "buffer bloat" properties), and incentives are created to give
users the more secure browsing experience they deserve.<br />
<br />
<b> Next Steps</b><br />
<br />
Based on that
experience, the IETF HTTP-bis working group has reached consensus to
start working on HTTP/2.0 based on the SPDY work of Mike Belshe and
Roberto Peon. Mike and Roberto initially did this work at Google, Mike
has since left Google and is now the founder of Twist. While the
standards effort will be SPDY based, that does not mean it will be
backwards compatible (it almost certainly won't be) nor does it mean it
will have the exact same feature set - but for it to be a success the basic properties will have to persevere, and we'll end up with a faster, more scalable, and more secure Web.<br />
<br />
Effectively taking this work into an open standardization forum means
ceding control of the protocol revision process to the IETF and agreeing to
implement the final result as long as it doesn't get screwed up too
badly. That is a best effort process and we'll just have to participate
and then wait to see what becomes of it. I am optimistic - all the right
players are in the room and want to do the right things. Just as we
have implemented SPDY as an experiment in the past in order to get data
to inform its evolution, you can expect Mozilla to vigorously fight for
the people of the web to have the best protocol possible and it seems
likely we will experiment with implementations of some Internet-draft level revisions of
HTTP/2.0 along the way to validate those ideas before a full standard is
reached. (We'll also be deprecating all of these interim versions along the way as they are replaced with newer ideas - we don't want to create de facto protocols by mistake.) The period of stagnant protocols is ending.<br />
<br />
Servers should come along too - use (and update) mod_spdy and nginx with spdy support; or get hosting from an integrated spdy services provider like CloudFlare or Akamai.<br />
<br />
Exciting times.<br />
<br />
<br />Patrick McManushttp://www.blogger.com/profile/00565702107491976279noreply@blogger.comtag:blogger.com,1999:blog-8147202175434463396.post-89190815218346623682012-04-25T09:56:00.000-04:002012-04-25T10:10:14.533-04:00Making Firefox Search SnappierThe Firefox 15 development window just opened and I checked into inbound a cool
feature that had been sitting in my queue for a little while. Let's see if it sticks!<br />
<br />
The feature basically lets non-network code hint to the networking layer that it will probably send an HTTP transaction to a particular site soon, but that it isn't ready to do so yet. The network code can take the hint and begin to set up a TCP and (if necessary) SSL handshake right away, because these are high latency operations.<br />
<br />
The primary initial user of this is the search box in Firefox. When you focus on that box we will probably make a connection to the search provider right away. While that happens in the background you type your search terms - when your search is ready (or partly ready if you are using search suggestions) it can be submitted to the search service without any waiting for network setup.<br />
<br />
This can be a significant win. The average Internet round trip time is about 100ms (there is a lot of variation in this). It takes 1 RTT to set up TCP and (likely) 2 more for SSL. 300ms is a notable pause, but overlapped in the background it essentially becomes free, resulting in snappier searches that still use a secured transport. If you're using a SPDY enabled search provider such as google or twitter, this is done over SPDY, so the one TCP session now established will be able to carry all of your search results - no more setup overhead to worry about with additional parallel connections etc..<br />
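<br />
For illustration only - this is a generic Python sketch of the idea, not the nsISpeculativeConnect IDL, and the host name is made up - the handshakes are started in the background on focus and the warm socket is reused when the query is submitted:<br />
<pre>
import socket, ssl, threading

_warm = {}

def speculative_connect(host, port=443):
    # Start TCP+TLS in the background so the handshakes overlap with typing.
    def dial():
        ctx = ssl.create_default_context()
        raw = socket.create_connection((host, port))
        _warm[host] = ctx.wrap_socket(raw, server_hostname=host)
    threading.Thread(target=dial, daemon=True).start()

def submit_search(host, query):
    s = _warm.pop(host, None)
    if s is None:   # hint never fired, or the handshake isn't done yet
        ctx = ssl.create_default_context()
        s = ctx.wrap_socket(socket.create_connection((host, 443)),
                            server_hostname=host)
    request = "GET /search?q=" + query + " HTTP/1.1\r\nHost: " + host + "\r\n\r\n"
    s.sendall(request.encode())
    return s

# on focus:  speculative_connect("search.example.com")
# on submit: submit_search("search.example.com", "necko")
</pre>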
<br />
The other user of this feature that got checked in as part of this merge is actually internal to the networking code just before the cache I/O is done. The amount of time it takes to check the disk cache is extremely variable - afaict generally it's pretty fast but the tail can be really awful depending on hardware, other system activity, OS, etc.. So we overlap the network handshakes with this activity that figures out the values of the If-Modified-Since (etc..) request headers.<br />
<br />
There is an IDL for providing the hint (nsISpeculativeConnect) - so if you can think of other areas ripe for this kind of optimization let's get to it!<br />
<br />
[The best place for comments is probably here: https://plus.google.com/100166083286297802191/posts ]<br />
<br />
<br />Patrick McManushttp://www.blogger.com/profile/00565702107491976279noreply@blogger.comtag:blogger.com,1999:blog-8147202175434463396.post-70926462686546169102012-04-23T22:47:00.000-04:002012-04-23T22:47:00.456-04:00Welcome Jiten - Building a Networking DashboardThere are many awesome things about the Firefox networking layer. However, realtime visibility into what it's doing is not on that list.<br />
<br />
Thanks to <a href="http://www.google-melange.com/gsoc/projects/list/google/gsoc2012">Google Summer of Code</a>, Jiten Thakkar is going to work on that problem this summer. Jiten is already a card carrying Mozillian with code commits in several parts of Gecko and I'm excited to have him focused on necko and an add-on for the next few months.
<br />
<br />
We hope to learn what connections are being made, how fast they are running, what the DNS cache looks like, how often SPDY is used, and all manner of other information that will aid debugging and inform optimization choices. Maybe even some icing on the cake, such as in-browser diagnostic tools (e.g. Can I TCP handshake with an arbitrary hostname? How about HTTP? How about SSL?) to round things out.<br />
<br />
Good Stuff!Patrick McManushttp://www.blogger.com/profile/00565702107491976279noreply@blogger.comtag:blogger.com,1999:blog-8147202175434463396.post-49514014961305616312012-03-08T10:39:00.000-05:002012-03-08T10:39:11.999-05:00Twitter, SPDY, and FirefoxWell - look at what this morning brings: some Twitter.com enabled SPDY goodness in my firefox nightly! (below)
<P>
SPDY is currently enabled in Firefox nightly, and you can turn it on in FF 11 and FF 12 through network.http.spdy.enabled in about:config
<P>
There is not yet any spdy support for the images on the Akamai CDN that Twitter uses, and that's obviously a big part of performance. But still, real deployed users of this include Twitter, Google Web, Firefox, Chrome, Silk, node, etc.. This really has momentum because it solves the right problems.
<P>
Big pieces still left are a popular CDN, open standardization, an http<>spdy gateway like nginx, a big, stable standalone server like Apache, and support in a load balancing appliance like F5 or Citrix. And the wind is blowing the right way on all of those things. This is happening very fast.
<P>
<pre>
https://twitter.com/account/available_features
GET /account/available_features HTTP/1.1
Host: twitter.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120308 Firefox/13.0a1
[..]
X-Requested-With: XMLHttpRequest
Referer: https://twitter.com/
[..]
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
Content-Length: 3929
Content-Type: text/javascript; charset=utf-8
Date: Thu, 08 Mar 2012 15:20:56 GMT
Etag: "8f2ef94f3149553a2c68e98a8df04425"
Expires: Tue, 31 Mar 1981 05:00:00 GMT
Last-Modified: Thu, 08 Mar 2012 15:20:56 GMT
Pragma: no-cache
Server: tfe
[..]
x-revision: DEV
x-runtime: 0.01763
[..]
x-xss-protection: 1; mode=block
X-Firefox-Spdy: 1
</pre>Patrick McManushttp://www.blogger.com/profile/00565702107491976279noreply@blogger.com