Characterizing the Performance of Modern Architectures Through Opaque Benchmarks: Pitfalls Learned the Hard Way
This repository is the companion of our article presented at the RepPar 2017 workshop. It contains all the source code and data we used for the study of memory caches.
The org-mode source of this article makes it possible to relate the figures to the corresponding data. You should open this file with Emacs to access all the hidden sections that detail how the figures are built from the data and what the software requirements are.
For any further information, please contact Arnaud Legrand and Luka Stanisic.
Disclaimer
When starting this study in 2012 (five years ago), we were still searching for the optimal methodology, file organization, and naming conventions. Hence, many source code files, functions, and data outputs evolved considerably during the study. This makes it somewhat difficult to link older data outputs with the current scripts and source code files, although the history of changes is preserved in the SVN history of the initial projects.
Here are a few notes that should help in understanding the code and data organization:
- File run.sh is the main script: it captures the metadata, compiles the source code, generates the design of experiments (DoE), runs the experiments, and finally registers the results. It takes Parameters.txt as input and produces a single output data file in a specified folder.
- File Parameters.txt is the input file, which is edited manually by the user. The comments next to each option should aid in understanding it.
- File src/program.c (previously kernel.c) contains the main program for running the benchmark. The core algorithm of this code resembles the MultiMAPS benchmark, with a lot of additional wrappers that allow numerous input configurations (a minimal sketch of such a traversal is given after this list).
- File src/inputGenerator/InputGenerator.c is used to produce a proper and well-controlled DoE. It takes simple inputs from file Parameters.txt and generates the plan of experiments, randomizing the order of measurements with similar inputs (see the shuffling sketch below).
- Other scripts in the root directory are mostly for deploying to and gathering the data from the experimental machines, variants of the main run.sh for a specific architecture+system combination, or scripts for producing default analysis figures.
- Folder analysis/ contains Sweave/knitr scripts for analyzing the data using literate programming, combining textual comments with executable R code blocks.
- All other folders contain the data from a wide range of experiments, some of them not even presented in the final report/paper. The name of each file is related to the source code version (more precisely, to the name of the project), not to the input parameters. Indeed, multiple projects with different naming conventions were developed during this study, and due to the lack of centralized documentation of the experiments, the data is hard to explore. If one still aims at understanding and possibly comparing the data, we suggest writing simple scripts based on regular expressions (a hypothetical sketch is given below), exploiting the fact that all data is stored in plain text format.
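For reference, here is a minimal sketch of a MultiMAPS-like traversal such as the one implemented in src/program.c: the buffer size and the access stride vary, the buffer is read repeatedly, and the achieved bandwidth is reported. This is only an illustration of the general idea; the sizes, strides, and repetition count below are hypothetical, and the actual program adds many more input configurations and wrappers.

#+BEGIN_SRC C
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock time in seconds. */
static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    /* Hypothetical buffer sizes (bytes) and strides (in doubles); the real
     * values come from the generated plan of experiments. */
    const size_t sizes[]   = { 1 << 12, 1 << 16, 1 << 20, 1 << 24 };
    const size_t strides[] = { 1, 4, 16 };
    const int    reps      = 64;
    volatile double sink   = 0.0;   /* keeps the compiler from removing the loads */

    for (size_t s = 0; s < sizeof sizes / sizeof sizes[0]; s++) {
        size_t n = sizes[s] / sizeof(double);
        double *buf = malloc(n * sizeof(double));
        for (size_t i = 0; i < n; i++) buf[i] = (double)i;

        for (size_t t = 0; t < sizeof strides / sizeof strides[0]; t++) {
            size_t stride = strides[t];
            double acc = 0.0;
            double t0 = now_sec();
            for (int r = 0; r < reps; r++)              /* strided read traversal */
                for (size_t i = 0; i < n; i += stride)
                    acc += buf[i];
            double t1 = now_sec();
            sink += acc;
            size_t bytes = ((n + stride - 1) / stride) * reps * sizeof(double);
            printf("size=%zu stride=%zu bandwidth=%.2f MB/s\n",
                   sizes[s], stride, bytes / (t1 - t0) / 1e6);
        }
        free(buf);
    }
    return 0;
}
#+END_SRC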
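Similarly, the following hypothetical sketch illustrates what randomizing the measurements with similar inputs means for the plan of experiments: all combinations of the input factors are enumerated and then shuffled, so that the order in which the experiments run is random. The factor names and values are made up; the actual logic lives in src/inputGenerator/InputGenerator.c and is driven by Parameters.txt.

#+BEGIN_SRC C
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* One line of the plan of experiments: a (buffer size, stride) pair. */
typedef struct { size_t size; size_t stride; } experiment;

#define N_SIZES   3
#define N_STRIDES 3
#define N_EXP     (N_SIZES * N_STRIDES)

int main(void) {
    const size_t sizes[N_SIZES]     = { 4096, 65536, 1048576 };  /* hypothetical values */
    const size_t strides[N_STRIDES] = { 1, 4, 16 };
    experiment plan[N_EXP];

    /* Full factorial design: every combination appears once. */
    size_t k = 0;
    for (size_t i = 0; i < N_SIZES; i++)
        for (size_t j = 0; j < N_STRIDES; j++)
            plan[k++] = (experiment){ sizes[i], strides[j] };

    /* Fisher-Yates shuffle, so that measurements with similar inputs are not
     * executed back to back. */
    srand((unsigned)time(NULL));
    for (size_t i = N_EXP - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        experiment tmp = plan[i];
        plan[i] = plan[j];
        plan[j] = tmp;
    }

    /* The randomized plan is printed one experiment per line. */
    for (size_t i = 0; i < N_EXP; i++)
        printf("%zu %zu\n", plan[i].size, plan[i].stride);
    return 0;
}
#+END_SRC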
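Finally, since all data files are plain text, a simple filter based on regular expressions is often enough to start exploring them. The sketch below is purely hypothetical (the pattern merely keeps lines starting with a digit); the actual field layout differs between projects and must be checked in each data folder.

#+BEGIN_SRC C
/* Purely hypothetical filter over plain text data files: print the lines
 * matching a regular expression, prefixed with the file name. */
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s datafile...\n", argv[0]);
        return EXIT_FAILURE;
    }
    /* Example pattern: keep lines that start with a number (measurement rows),
     * skipping metadata or comment lines. */
    regex_t re;
    if (regcomp(&re, "^[0-9]", REG_EXTENDED | REG_NOSUB) != 0) {
        fprintf(stderr, "bad regex\n");
        return EXIT_FAILURE;
    }
    char line[4096];
    for (int f = 1; f < argc; f++) {
        FILE *fp = fopen(argv[f], "r");
        if (!fp) { perror(argv[f]); continue; }
        while (fgets(line, sizeof line, fp))
            if (regexec(&re, line, 0, NULL, 0) == 0)
                printf("%s: %s", argv[f], line);   /* keep the matching line */
        fclose(fp);
    }
    regfree(&re);
    return EXIT_SUCCESS;
}
#+END_SRC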