- A lot of my mail is spam. Much of the spam comes from botnet owned hosts on consumer internet connections. Because of this I am not so much measuring latency from my edge to core Internet services, but instead to other edges. If this is true, it actually has some interesting peer to peer insights.
- Because we are likely dealing with "owned" botnets sending the spam, those hosts are distributed more uniformly across the world than the services I actually choose to use day to day which exhibit greater locality to my part of the world. Therefore I am getting real data, but maybe not data that is especially insightful to my day to day network usage.
- My results may not be reflective of general edge connectivity, I might just have lousy service.
- The handshake latency should be dominated by the network, but the application at the other end plays a role too. Owned spam generators may not be running up to commerical grade mail client standards generally expected of Internet infrastructure.
I have now built a packet trace that covers 6 days and about 130,000 DNS request/response pairs. I was astonished there were so many in less than a week from a home LAN. The trace was taken upstream from my home LAN caching recursive resolver - so the redundancy was removed from the data where TTL based caching could do so. The 130,000 transactions were done across 8854 different servers - this was also an astonishing amount of diversity. It comes down to just 14 per server on average, I would have expected a lot more server reuse.
The results are indeed better than the SMTP handshakes. We are now dealing with infrastructure class servers and it shows. But frankly, the latency numbers are still surprisingly high - 41% of all lookups still take a very noticeable 100ms. Remember, also, that starting many webpages requires at least two uncached lookups (one from the root name servers and one from the zone's name server itself) - that can be a really long lag.
Here are the numbers, ranging from a best of 24ms, to a worst of 17 minutes. That latter number is certainly an outlier. 99+% of all transactions were complete in 401ms or less.
percentile latency (ms)
worst 1,039,356 (17 minutes!)
Any lookup that did not complete was not included in the dataset. That 17 minute one is fascinating - it is a genuine reply to the original lookup request of kvack.org (probably generated by the spam filtering software who received a legitimate message from someone @kvack.org but was seeing if the name resolved as part of its spam scoring system) - it was not a reply to a client generated retransmission or anything like that. That request must have been buried in quite a queue somewhere! It is hard to imagine that the DNS client hadn't timed out the transaction by the time the response arrived, but the packet trace does not give any insight into that. These very long transactions are exceptionally rare - only 46 of the 130,000 transactions took more than 3 seconds.
- As you see, the median is 77ms
- The mean is 112ms (104 removing the 17 minute outlier from the data)
- 41 percent of all transactions took over 100ms
What does it Mean?
I can draw some weak conclusions:
- DNS latency is much better than TCP handshake latency on my mail server - indeed almost twice as good. It seems likely that is because DNS is dealing with infrastructure class servers (both in terms of location and function), whereas much of the email traffic was probably botnet generated spam out at the edges of the network - just like my host. So much for net neutrality eh, there are already multiple tiers of service in full effect!
- Latency still sucks. 100ms round trips are deadly and common.