This event has ended. View the official site or create your own event → Check it out
This event has ended. Create your own
View analytic
Friday, March 18 • 09:30 - 10:10
Towards ameliorating measurement bias in evaluating performance of generate code

Sign up or log in to save this to your schedule and see who's attending!

To make sure LLVM continues to optimize code well, we use both post-commit performance tracking and pre-commit evaluation of new optimization patches. As compiler writers, we wish that the performance of code generated could be characterized by a single number, making it straightforward to decide from an experiment whether code generation is better or worse. Unfortunately, performance of generated code needs to be characterized as a distribution, since effects not completely under control of the compiler, such as heap, stack and code layout or initial state in the processors prediction tables, have a potentially large influence on performance. For example, it's not uncommon when benchmarking a new optimization pass that clearly makes code better, the performance results do show some regressions. But are these regressions due to a problem with the patch, or due to noise effects not under the control of the compiler? Often, the noise levels in performance results are much larger than the expected improvement a patch will make. How can we properly conclude what the true effect of a patch is when the noise is larger than the signal we're looking for?

When we see an experiment that shows a regression while we know that on theoretical grounds the generated code is better, we see a symptom of only measuring a single sample out of the theoretical space of all not-under-the-compiler's-control factors, e.g. code and data layout variation.

In this presentation I'll explain this problem in a bit more detail; I'll summarize suggestions for solving this problem from academic literature; I'll indicate what features in LNT we already have to try and tackle this problem; and I'll show the results of my own experiments on randomizing code layout to try and avoid measurement bias.

avatar for Kristof Beyls

Kristof Beyls

ARM, AArch64, AArch32, benchmarking & testing automation.

Friday March 18, 2016 09:30 - 10:10

Attendees (14)