Bits Up!: recommendations

Showing posts with label recommendations. Show all posts

Tuesday, March 4, 2008

Calgary IOMMU - At What Price?

The Calgary IOMMU is a feature of most IBM X-Series (i.e. X86_64) blades and motherboards. If you aren't familiar with an IOMMU, it is strongly analogous to a regular MMU but applied to a DMA context. Their original primary use was for integrating 32 bit hardware into 64 bit systems. But another promising use for them is enforcing safe access in the same way an MMU can.

In normal userspace, if a program goes off the rails and accesses some memory it does not have permissions for a simple exception can be raised. This keeps the carnage restricted to the application that made the mistake. But if a hardware device does the same thing through DMA, whole system disaster is sure to follow as nothing prevents the accesses from happening. The IOMMU can provide that safety.

An IOMMU unit lets the kernel setup mappings much like normal memory page tables. Normal RAM mappings are cached with TLB entries, and IOMMU maps are cached TCE entries that play largely the same role.

By now, I've probably succeeded in rehashing what you already knew. At least it was just three paragraphs (well, now four).

The pertinent bit from a characterization standpoint is a paper from the 2007 Ottawa Linux Symposium. In The Price of Safety: Evaluating IOMMU Performance Muli Ben-Yehuda of IBM and some co-authors from Intel and AMD do some measurements using the Calgary IOMMU, as well as the DART (which generally comes on Power based planers).

I love measurements! And it takes guts to post measurements like this - in its current incarnation on Linux the cost of safety from the IOMMU is a 30% to 60% increase in CPU! Gah!

Some drill down is required, and it turns out this is among the worst cases to measure. But still - 30 to 60%! The paper is short and well written, you should read it for yourself - but I will summarize the test more or less as "measure the CPU utilization while doing 1 Gbps of netperf network traffic - measure with and without iommu". The tests are also done with and without Xen, as IOMMU techniques are especially interesting to virtualization, but the basic takeaways are the same in virtualized or bare metal environments.

The "Why so Bad" conclusion is management of the TCE. The IOMMU, unlike the TLB cache of an MMU, only allows software to remove entries via a "flush it all" instruction. I have certainly measured that when TLBs need to be cleared during process switching that can be a very measurable event on overall system performance - it is one reason while an application broken into N threads runs faster than the same application broken into N processes.

But overall, this is actually an encouraging conclusion - hardware may certainly evolve to give more granular access to the TCE tables. And there are games that can be played on the management side in software that can reduce the number of flushes in return for giving up some of the safety guarantees.

Something to be watched.

Thursday, February 28, 2008

Appliances and Hybrids

In my last post, which was about the fab book Network Algorithmics, I mentioned intelligent in-network appliances. I also mentioned that "the world doesn't need to be all hardware or all software".

I believe this to be an essential point - blending rapidly improving commodity hardware with a just a touch of custom ASIC/FPGA components, glued together by a low level architecture aware systems programming approach makes unbelievably good products that can be made an produced relatively inexpensively.

These appliances and hardware are rapidly showing up everywhere - Tony Bourke of Load Balancing digest makes a similar (why choose between ASIC or Software?) post today. Read it.

Wednesday, February 27, 2008

Network Algorithmics

In late 2004 George Varghese published an amazing book on the design and implementation of networking projects: Network Algorithmics

You won't find the boilerplate "this is IP, this is TCP, this is layer blah" crapola here. What makes this special is that it describes the way the industry really builds high end switches, routers, high end server software and an ever growing array of intelligent in-network appliances. The way that is really done, is very different than presented in a classic networking text book.

The material is not algorithms, rather it is a way of looking at things and breaking down abstraction barriers through techniques like zero copy, tag hinting, lazy evaluation, etc.. It makes the point that layers are how you describe protocols - not how you want to implement them. The world doesn't have to be all hardware or all software, this helps train your mind on how to write systems that harmonize them both by taking the time to really understand the architecture.

To me, this book is a fundamental bookshelf item. You don't hear it often mentioned with the likes of Stevens, Tanenbaum, and Cormen - but amongst folks in the know, it is always there. More folks ought to know.

Thursday, January 31, 2008

AjaxScope Paper Provides Javascript Characterization

Emre Kıcıman and Benjamin Livshits from Microsoft Research present some interesting data in their SOPS paper - AjaxScope: A Platform for Remotely Monitoring the Client-Side Behavior of Web 2.0 Applications.

The paper is mostly about AjaxScope, a neat insturmentation and profiling tool for JavaScript.

What I want to highlight, though, are some of the measurements they present for both IE (6 and 7) as well as Firefox (2). This is real data in a refereed journal of the ACM, not a Gartner-style whitepaper.

Among the interesting nuggets are IE's 35x slower performance in String cat operations, and Firefox's 4x slower Array join execution time. The authors also put the intrinsics into context by measuring the performance of common portal pages - IE beats Firefox on msn.com, but Firefox turns the tables on Yahoo!.

Lots more interesting data, and a useful tool, in the paper. Read it.