Monday, November 25, 2013

A golden rule of performance optimization

Of course, one shouldn't optimize what doesn't need optimization, so one should use profiler to see what's slow. At least YourKit allows for two kinds of performance profiling: sampling and tracing. The first one reports which code takes most time, and the second one also includes invocation counts for each method. And the rule is:

One should look at both sampling profiles and the ones with invocation counts.

Sampling profiles show you where a program is slow. Tracing profiles show why it's slow. It's quite often that I see some innocent things in the snapshots, like getting a value from a hash table. This doesn't mean that I should optimize hash table, it means that it's called zillions of times. So I should step back several stack frames and look carefully at the algorithm, trying to decrease its time complexity. Invocation count information helps to find the exact abstraction level I should pay attention to.

Sadly one can't use only tracing profiles, even if they also report the time each method takes. It's like quantum mechanics: the tracing overhead affects the time reported too significantly, and one can't trust it anymore.

In some cases it's obvious where the algorithm is inefficient, and sampling snapshots are enough. But increasingly often I find it to be not the case.

Yes, it's all well-known. And it's been about 15 years ago that I first read (and agreed) that algorithm optimization gives much better results than low-level one. But to learn something and to realize it, understand it deeply, are completely different things.