Aggregate database measurements to increase reliability of results
With #3 (closed), we will cache benchmarks' IPC in a database for future use.
While doing that, we might as well require each measurement to be repeated a minimum number of times (two, three, or some other threshold). Then, when fetching a cached result:
- either we don't have enough data points yet, in which case we take another measurement and commit it to the database;
- or we have enough measurements, in which case we aggregate them into a single value.
The main benefit is that we could check the variance across the repeated measurements and decide whether they are reliable.
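
A minimal sketch of the fetch logic described above, to make the proposal concrete. Everything named here is an assumption made for illustration: the `ipc_samples` table, the `fetch_ipc` function, the `measure` callback, the threshold `MIN_SAMPLES = 3`, and the 5% relative standard deviation cutoff. The mean is used as the aggregate, but a median would work just as well.

```python
import sqlite3
import statistics

MIN_SAMPLES = 3        # assumed threshold; must be >= 2 to compute a variance
MAX_REL_STDDEV = 0.05  # assumed cutoff: stddev above 5% of the mean is "noisy"


def open_db(path=":memory:"):
    """Open the cache and create the (hypothetical) samples table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ipc_samples ("
        "benchmark TEXT NOT NULL, ipc REAL NOT NULL)"
    )
    return conn


def fetch_ipc(conn, benchmark, measure):
    """Return (ipc, status) for a benchmark.

    status is "pending" while fewer than MIN_SAMPLES samples were stored
    before the call (a new sample is then measured, committed, and returned
    as-is), and "reliable" or "noisy" once the stored samples are aggregated.
    """
    rows = conn.execute(
        "SELECT ipc FROM ipc_samples WHERE benchmark = ?", (benchmark,)
    ).fetchall()
    samples = [row[0] for row in rows]

    if len(samples) < MIN_SAMPLES:
        # Not enough data points yet: take one more and commit it.
        value = measure(benchmark)
        conn.execute(
            "INSERT INTO ipc_samples (benchmark, ipc) VALUES (?, ?)",
            (benchmark, value),
        )
        conn.commit()
        return value, "pending"

    # Enough data points: aggregate them and check their spread.
    mean = statistics.mean(samples)
    stddev = statistics.stdev(samples)  # sample standard deviation
    reliable = mean != 0 and stddev / abs(mean) <= MAX_REL_STDDEV
    return mean, "reliable" if reliable else "noisy"
```

With this shape, the caller decides what to do with a "noisy" result, for example discarding the stored samples and measuring again.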