Draft: RFC benchmark tool to track performance regressions
Here is a proposal for a benchmarking tool whose goal is to track performance regressions.
It uses a homemade tool called pynchmark (heavily inspired by https://gitlab.inria.fr/delamare/pynchmark).
The benchmarked functions are written in `benchmark/pynchmark/bench.py`.
To start a benchmark, you can for instance run `pynchmark -f bench.py -b benchmarks -o results.csv --max_repeat_duration 5`.
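For illustration, here is a minimal sketch of what an entry in `bench.py` might look like. The function name, the module-level setup, and the `benchmarks` collection are assumptions inferred from the `-f`/`-b` flags above; the actual pynchmark conventions may differ.

```python
import numpy as np
from lazylinop import aslazylinop

# Setup at module level so that (presumably) only the matvec
# itself ends up being timed. Hypothetical convention.
L = aslazylinop(np.random.rand(2048, 2048))
x = np.random.rand(2048)

def bench_matvec_aslazylinop():
    # One matrix-vector product on the wrapped array.
    return L @ x

# '-b benchmarks' presumably selects this collection.
benchmarks = [bench_matvec_aslazylinop]
```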
I open this MR for discussion. In particular:
- Is there more to test than what is currently in bench.py? (My initial aim is to not go beyond basicops.)
- There are two "kinds" of lazylinop tested: one is an array wrapped with `aslazylinop`, and the other represents a lazy composition of krons with butterfly support (see the first sketch after this list). Are they representative enough?
- I ran the benchmark on a Grid'5000 node with 64 cores, which causes a large difference between multithreaded and non-multithreaded operations. As this is a synthetic benchmark, we could restrict the number of cores (maybe 4 or 8?); see the second sketch below.
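For context, a minimal sketch of the two kinds of operators, assuming `aslazylinop` and `kron` are importable from the top-level `lazylinop` package (adjust if they live under `basicops`). The sizes are arbitrary and the single kron pair only hints at the butterfly-like structure actually benchmarked:

```python
import numpy as np
from lazylinop import aslazylinop, kron

# Kind 1: a dense array wrapped as a LazyLinOp.
A = aslazylinop(np.random.rand(1024, 1024))

# Kind 2: a lazy Kronecker product, the building block of the
# butterfly-structured operator mentioned above.
B = kron(np.random.rand(32, 32), np.random.rand(32, 32))

x = np.random.rand(1024)
y = A @ x  # evaluated through the wrapped array
z = B @ x  # evaluated lazily through the kron structure
```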
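On restricting cores, one option that does not require changing the Grid'5000 reservation is to cap the BLAS/OpenMP thread pools from Python with threadpoolctl; a sketch (the 4-thread limit is just the value floated above):

```python
import numpy as np
from threadpoolctl import threadpool_limits

A = np.random.rand(4096, 4096)
x = np.random.rand(4096)

# Limit the BLAS/OpenMP pools to 4 threads for this region only,
# instead of letting them grab all 64 cores of the node.
with threadpool_limits(limits=4):
    y = A @ x
```

Exporting `OMP_NUM_THREADS` / `OPENBLAS_NUM_THREADS` before launching pynchmark would achieve the same effect globally.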
I've provided two benchmark results, for versions 1.12.0 and 1.14.1. They can be compared with:
`pynchmark -i 20250201_1.14.1.csv --compare 20250201_1.12.0.csv`
(The results are given only as examples; the benchmark did not run under good conditions.)
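If pynchmark's `--compare` output is not enough, the two CSVs could also be diffed by hand; a sketch assuming hypothetical `name` and `time` columns (the actual pynchmark CSV schema may differ):

```python
import pandas as pd

# 'name' and 'time' are assumed column names, not the actual
# pynchmark CSV schema.
old = pd.read_csv("20250201_1.12.0.csv")
new = pd.read_csv("20250201_1.14.1.csv")

m = old.merge(new, on="name", suffixes=("_old", "_new"))
m["ratio"] = m["time_new"] / m["time_old"]  # >1 means slower in 1.14.1
print(m.sort_values("ratio", ascending=False).head(10))
```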