- It hides network idleness during TCP handshakes, though persistent connections help with this too.
- It hides network idleness during the first-byte phase of a transaction, though pipelining can address this too.
- It hides network idleness during TCP slow-start wait-for-ack periods. This is a big one.
- It provides a mechanism to prioritize transactions and to avoid head-of-line blocking problems.
- It steals bandwidth from competing "TCP friendly" flows simply by increasing the number of flows one application uses. That's an arms race most people think should be avoided.
Parallelism hurts when:
- It increases the number of TCP handshakes, which are both slow and CPU intensive to execute (at least compared to regular data packets); this assumes persistent connections are the alternative.
- It increases the overhead of normal data processing because more flows have to be considered, typically via longer hash chains.
- It increases memory overhead and processor cache pollution by increasing the number of simultaneous TCP control blocks that have to be managed on both the client and the server.
- The resulting reduction in data per flow makes it harder to fully open the sender's congestion window.
- Packet loss is increased by the uncorrelated fluctuations in the amount of data each parallel connection has to send. Two competing flows that both send from infinite data sources will quickly adapt to share the bandwidth, but two flows with fluctuating demand (e.g. parallel persistent HTTP connections that periodically go idle and then active again) will inherently alternate between underutilizing and overutilizing the path. Overutilization results in either packet loss or excess buffering in the network, which leads to poor interactive response times. The toy simulation after this list illustrates the effect.
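To make that last point concrete, here is a toy simulation (not Necko code, and the rates and 60% duty cycle are invented) of two on/off flows sharing a link. Even though their average combined demand fits the pipe, the uncorrelated fluctuations produce both overutilized intervals (loss or buffering) and underutilized ones (wasted capacity):

```c
/* Toy sketch: two on/off flows with uncorrelated demand will sometimes
 * sum to more than the link rate and sometimes to less, even though
 * their average demand fits the pipe. All numbers are made up. */
#include <stdio.h>
#include <stdlib.h>

#define TICKS     10000
#define LINK_RATE 10      /* units the link can carry per tick */
#define FLOW_RATE 8       /* what each flow sends when active */

int main(void)
{
    int over = 0, under = 0;
    srand(42);
    for (int t = 0; t < TICKS; t++) {
        /* each flow is independently active ~60% of the time */
        int demand = 0;
        if (rand() % 100 < 60) demand += FLOW_RATE;
        if (rand() % 100 < 60) demand += FLOW_RATE;
        if (demand > LINK_RATE)      over++;   /* loss or buffering */
        else if (demand < LINK_RATE) under++;  /* wasted capacity */
    }
    printf("overutilized %d%% of intervals, underutilized %d%%\n",
           over * 100 / TICKS, under * 100 / TICKS);
    return 0;
}
```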
That suggests opening a new connection only:
- when I don't have an idle connection and I need the answer with minimum latency, or
- when I expect the existing connections are experiencing idleness and therefore not using all of the available bandwidth (a sketch of that decision follows this list).
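In code, that policy looks roughly like the sketch below. The enum and the input flags are hypothetical names for state a connection manager would already track; this illustrates the decision, it is not the Necko API:

```c
#include <stdbool.h>

enum action { REUSE_IDLE, OPEN_NEW, QUEUE };

/* Inputs are hypothetical flags a connection manager would maintain. */
enum action dispatch_decision(bool have_idle_conn,
                              bool low_latency,
                              bool existing_flows_underutilized)
{
    if (have_idle_conn)
        return REUSE_IDLE;  /* no handshake cost at all */
    if (low_latency)
        return OPEN_NEW;    /* minimum latency trumps queueing */
    if (existing_flows_underutilized)
        return OPEN_NEW;    /* parallelism can cover the idleness */
    return QUEUE;           /* the pipe is already full; another
                               flow would just steal from ourselves */
}
```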
I wonder if we can't do better in Firefox. First, let's deal with the case of a low-latency request. Right now all we do with such a request is put it at the top of the waiting queue if it cannot be dispatched immediately (because the limit of 6 connections has already been reached). But there are really two cases to consider:
- What to do when the network is not already saturated
- What to do when the network is saturated
In the case of a saturated network, some of the existing parallel connections should be stalled while the low-latency request is satisfied, in order to give that important transaction the most bandwidth. We can do this by temporarily slamming their recv windows down to something close to 1 packet of data, which will slow them to a trickle. This can be done at the same time as the transmission of the prioritized request, as it should take 1/2 RTT for the window change to reach the sender.
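A rough sketch of that throttle using the standard sockets API follows. SO_RCVBUF is the closest portable knob to the advertised receive window, but note the caveats: a real kernel will not renege on window space it has already advertised, so the clamp only takes effect as the outstanding window drains, and the buffer sizes here are guesses rather than tuned values:

```c
/* Sketch: clamp a competing connection's receive buffer to roughly one
 * packet so its sender is forced down to a trickle, then restore it
 * once the prioritized response has landed. */
#include <sys/socket.h>
#include <stdio.h>

/* Throttle a competing connection to ~one MSS worth of window. */
int throttle_connection(int fd)
{
    int tiny = 1500;  /* assumption: ~one packet of data */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &tiny, sizeof(tiny)) < 0) {
        perror("setsockopt(SO_RCVBUF)");
        return -1;
    }
    return 0;
}

/* Restore a normal-sized buffer afterward. */
int unthrottle_connection(int fd)
{
    int normal = 256 * 1024;  /* assumption: a typical default */
    return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &normal, sizeof(normal));
}
```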
But what about the more common case, where all transactions are of equal priority? How do we then decide between opening a new connection and queueing the new transaction? Assuming we aren't concerned about head-of-line blocking issues (which we should be able to wrap up in a definition of priority somehow), we want to open a new connection only when there is network idleness that parallelism can cover up. This approach is radically different from "open up to N connections."
It isn't obvious exactly how to determine that in Necko. But then again, you are looking for data bursts followed by idleness, and it's pretty obvious when you see it graphed out. This is the transfer pattern of a single HTTP response I looked at a couple of weeks ago; it could happily overlap with another flow in order to utilize the whole pipe more effectively. (Of course, if the server used a larger initial CWND, the problem would be massively reduced.)
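For what it's worth, one way to approximate that by eye-on-a-graph detection in code would be to sample per-connection byte counts on a timer and flag flows that alternate between bursts and silence. Everything below is a sketch: the 100ms interval, the thresholds, and the struct are invented, not existing Necko machinery:

```c
/* Sketch: classify each sampling interval of a flow as idle or bursty,
 * and call the flow "bursty" once it has shown plenty of both. Such a
 * flow is leaving the pipe idle between bursts, so overlapping another
 * transfer with it should improve utilization. */
#include <stdint.h>
#include <stdbool.h>

#define IDLE_THRESHOLD_BYTES  0     /* nothing arrived this interval */
#define BURST_THRESHOLD_BYTES 8192  /* "real" data arrived */

struct flow_stats {
    uint64_t bytes_this_interval; /* reset by the sampler each ~100ms */
    unsigned idle_intervals;
    unsigned burst_intervals;
};

/* Called once per sampling interval per connection. */
void sample_interval(struct flow_stats *fs)
{
    if (fs->bytes_this_interval <= IDLE_THRESHOLD_BYTES)
        fs->idle_intervals++;
    else if (fs->bytes_this_interval >= BURST_THRESHOLD_BYTES)
        fs->burst_intervals++;
    fs->bytes_this_interval = 0;
}

/* Heuristic: a flow that has seen several bursts and several idle
 * gaps matches the burst-then-idle pattern described above. */
bool looks_bursty(const struct flow_stats *fs)
{
    return fs->idle_intervals >= 3 && fs->burst_intervals >= 3;
}
```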