Commit 9187da26 authored by Philippe SWARTVAGHER

Improve README

parent 51b78f39
# Impact of memory contention on communications
Evaluate the impact of memory contention on communications (and vice-versa).
Benchmark suite to evaluate the impact of memory contention on communications
(and vice-versa).
## Building
## Requirements
When used with NewMadeleine, use a version of NewMadeleine without PIOman (profile `puk+madmpi-mini.conf` for instance), so that there are no additional threads to bind correctly.
- Communications are made with the MPI API, so you need an MPI library. If you
use MadMPI (from [NewMadeleine](http://pm2.gforge.inria.fr/newmadeleine/)),
build NewMadeleine with the profile `pukabi+madmpi-mini.conf`.
- If you want to measure frequencies,
[LIKWID](https://hpc.fau.de/research/tools/likwid/) can be used, but there is
also a version of our code which does not need it.
- If you want to measure the impact of a task-based runtime system,
[StarPU](https://starpu.gitlabpages.inria.fr/) is required.
- Some computing benchmarks use the Intel MKL library.
- [hwloc](https://www.open-mpi.org/projects/hwloc/) is used to bind threads.
## Available programs
- `bench_openmp` measures interference between communications and computations
when OpenMP is used to parallelize computations.
- `bench_openmp_likwid`: same as `bench_openmp`, but also measures frequencies
with LIKWID.
- `bench_openmp_freq`: same as `bench_openmp`, but also measures frequencies
by reading the contents of `/proc` files.
- `bench_starpu`: same as `bench_openmp`, but uses StarPU to parallelize
computations.
- `uncore_get` and `uncore_set` use LIKWID to respectively get and set uncore
frequencies of sockets.
Build each program with `make`:
```bash
make
make <program>
```
This command generates two binaries: one built with OpenMP and the other with StarPU.
## Main program
There is one main source file for both OpenMP and StarPU. The runtime is chosen at compile time, with a define (see the `Makefile`).
## Benchmarking
You can then choose the computing benchmark (`--compute_bench={stream,prime,cholesky}`) and the communication benchmark (`--bench={bandwidth,latency}`). All available options are listed in the help of the program (`--help`).
You can then choose the computing benchmark
(`--compute_bench={stream,prime,cursor,scalar,scalar_avx}`) and the
communication benchmark (`--bench={bandwidth,latency}`). All available options
are listed in the help of the program (`--help`).
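
As a minimal sketch of how these two options combine, the following dry run prints one command line per pair of computing and communication benchmarks, without executing anything. The helper name `list_commands` is hypothetical, and the choice of `./bench_openmp` is only an example; whether every computing benchmark is available in every program is an assumption to check with `--help`.

```bash
#!/usr/bin/env bash
# Dry run: print one command line per (computing, communication) benchmark pair.
# Nothing is executed; the binary name passed below is only an example.

# Values taken from the option lists in this README.
compute_benches=(stream prime cursor scalar scalar_avx)
comm_benches=(bandwidth latency)

# list_commands is a hypothetical helper, not part of the benchmark suite.
list_commands() {
    local binary=$1 compute comm
    for compute in "${compute_benches[@]}"; do
        for comm in "${comm_benches[@]}"; do
            echo "${binary} --compute_bench=${compute} --bench=${comm}"
        done
    done
}

list_commands ./bench_openmp
```

Each printed line still needs the `mpirun` and binding prefix used in the real executions before it can be run.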
Examples of executions:
```bash
ncores=$(hwloc-calc all -N core)
nbnuma=$(hwloc-calc all -N node)
last_numa_node=$(($nbnuma-1))
for i in $(seq 1 $((ncores-1)));
do
mpirun -DOMP_NUM_THREADS=$i -DOMP_PROC_BIND=true -DOMP_PLACES=cores hwloc-bind --cpubind core:0-$((i-1)) ./bench_openmp --compute_bench=prime --ping_thread=last &> latency_thread_last_$((i))_threads.out
done
for i in $(seq 1 $((ncores-1)));
do
mpirun -DOMP_NUM_THREADS=$i -DOMP_PROC_BIND=true -DOMP_PLACES=cores hwloc-bind --cpubind core:0-$((i-1)) ./bench_openmp --compute_bench=prime --ping_thread=last --bench=bandwidth &> bandwidth_thread_last_$((i))_threads.out
done
for i in $(seq 2 $((ncores-2)));
do
mpirun -DSTARPU_NCPU=$i -DSTARPU_MAIN_THREAD_BIND=1 -DSTARPU_WORKERS_CPUID=1- -DSTARPU_MAIN_THREAD_CPUID=$((ncores-2)) -DSTARPU_MPI_THREAD_CPUID=$((ncores-1)) ./bench_starpu --compute_bench=stream --no_stream_add --no_stream_scale --ping_thread=first &> latency_thread_first_main_last_mpi_last_$((i-1))_threads.out;
done
```
See `bench_suite.example.sh` to see how all combinations of parameters can be launched.
Environment variables and `hwloc-bind` are used to correctly bind threads to
cores. See `bench_suite.example.sh` to see how all combinations of parameters
can be launched.
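
To make the binding arithmetic of the OpenMP loops above easier to check, here is a small dry-run sketch that prints the core range given to `hwloc-bind` for each thread count. The helper name `core_range` and the default of 8 cores are assumptions made for illustration; on a real machine the core count would come from `hwloc-calc`, as in the examples.

```bash
#!/usr/bin/env bash
# Dry run: show the core range given to hwloc-bind for each thread count.
# ncores defaults to a placeholder value (assumption); on a real machine,
# set it with: ncores=$(hwloc-calc all -N core)
ncores=${ncores:-8}

# core_range is a hypothetical helper: n computing threads use cores 0..n-1.
core_range() {
    local nthreads=$1
    echo "core:0-$((nthreads - 1))"
}

# Same bounds as the OpenMP loops: from 1 to ncores-1 computing threads.
for i in $(seq 1 $((ncores - 1))); do
    echo "${i} threads -> hwloc-bind --cpubind $(core_range "$i")"
done
```

Because the loop stops at `ncores-1` threads, the last core is never part of the computing range.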
......@@ -31,7 +80,14 @@ See `bench_suite.example.sh` to see how all combinations of parameters can be la
Scripts are in the `plot` folder and require Python with Matplotlib.
`plot_comm_stream_nb_threads.py` is the main script. It plots computing benchmark results and communication performances on the same graph, according to the number of cores.
`plot_comm_stream_nb_threads.py` is the main script. It plots computing
benchmark results and communication performances on the same graph, mainly
according to the number of cores.
For instance:
```bash
python3 plot_comm_stream_nb_threads.py bandwidth_thread_last_* --per-core --top=10000 --stream-top=15000 --o=bandwidth_thread_last.png --title="Network Bandwidth and STREAM Benchmark"
```
......