Configuring libKOMP to enable the tracing tools
The libKOMP tracing tool is based on OMPT support and generates KAAPI events into trace files. The events captured and recorded are controlled by environment variables.
Assuming you have downloaded the library, the following lines create a new build directory and configure the library with the tracing tools plus affinity support based on the T.H.E protocol with aggregation.
> mkdir build_release_trace
> cd build_release_trace
> cmake ../openmp-kaapi \
-DLIBOMP_OMPT_SUPPORT=on -DLIBOMP_KAAPI_TRACING=on \
-DLIBOMP_USE_HWLOC=true \
-DCMAKE_INSTALL_PREFIX=<your prefix> \
-DCMAKE_BUILD_TYPE=release
You should also consider the capability to attach hardware performance counters to tasks, parallel regions, or loops. To do that, configure the library with PAPI support:
> cmake ../openmp-kaapi \
-DLIBOMP_OMPT_SUPPORT=on -DLIBOMP_KAAPI_TRACING=on \
-DLIBOMP_USE_HWLOC=true \
-DCMAKE_INSTALL_PREFIX=<your prefix> \
-DCMAKE_BUILD_TYPE=release \
-DLIBOMP_USE_PAPI=true
Running your OpenMP code
Once your OpenMP code has been downloaded and built, you can run it with environment variables that select which events you want to capture.
To run the binary, preload the library trace-libomp.so, which is built during compilation when the tracing tool is configured.
The following lines run the KASTORS Cholesky factorization on 96 cores, capturing events for the work and time counters plus all events related to computation and OMP-specific features.
> N=96; PREFIX=<your prefix>/lib; OMP_NUM_THREADS=$N \
OMP_PLACES="cores($N)" \
LD_LIBRARY_PATH=$PREFIX \
OMP_TOOL=enabled \
KAAPI_RECORD_TRACE=1 \
KAAPI_RECORD_MASK=compute,omp \
KAAPI_TASKPERF_EVENTS=work \
LD_PRELOAD=$PREFIX/trace-libomp.so \
./dpotrf_taskdep -n 16384 -b 256 -i 3
You can find details about the events and performance counters here.
How to attach PAPI hardware performance counters?
If you have configured the library with PAPI support, you can record hardware performance counters attached to each task at run time. For instance, you could specify (ellipses stand for the same command line as above):
> N=96; PREFIX=<your prefix>/lib; <...> \
KAAPI_RECORD_MASK=compute,omp,perfctr \
KAAPI_TASKPERF_EVENTS=work,PAPI_TOT_CYC,PAPI_TOT_INS \
<...> \
./dpotrf_taskdep -n 16384 -b 256 -i 3
Do not forget to add `perfctr` to the set of events in KAAPI_RECORD_MASK.
Processing generated data
All events and performance counters are recorded per thread into files located in /tmp. The typical output of the previous command is:
[OMP-TRACE] ompt-trace ompt_tool initialized
[OMP-TRACE] kaapi tracing version: Git last commit:624e6eb649329445+
...
##Progname Size Blocksize Iterations Threads Gflops(Mean) Stddev
dpotrf_taskdep 16384 256 3 192 1714.021480 47.314433
#Experience summarry : avg : 1714.021480 :: std : 47.314433 :: min : 1653.340947 :: max : 1768.782805 :: median : 1719.940687
[OMP-TRACE] kaapi tracing tool closed.
All events and performance counters are recorded per thread into the files:
/tmp/event.$USER.<pid>.<tid>.evt
where <pid> is the process id of the spawned process and <tid> is the thread identifier, ranging from 0 to $OMP_NUM_THREADS-1, where $OMP_NUM_THREADS is the maximum number of threads used at runtime.
These files contain all the useful information needed to generate:
- a DAG of the dependencies between tasks. One graph is generated per parallel region.
- a Gantt chart of the thread activities from the start of the program until its end.
- a CSV of tasks' executions and threads' activities
- a dump of performance counters.
All these features are available through katracereader, installed in the bin directory of the installation prefix.
Generating a CSV from internal trace files
CSV generation is based on katracereader with the --csv option. For instance:
> katracereader --csv /tmp/events.gautier.175535.*
...
*** File 'parallels.csv' generated
*** File 'threads.csv' generated
*** File 'tasks.csv' generated
Each CSV file contains information about a specific feature of the OpenMP program. Here, tasks.csv records the start and end of each task, its name, and the value of each attached counter. Each CSV file begins with a header that can be parsed by the R function 'read.csv'.
Plotting a Gantt chart with R
The format of the file 'tasks.csv' is illustrated by the following R function that reads it:
> library(dplyr);
> readtrace <- function (filename)
{
df <- read.csv(filename, header=TRUE, sep=",", strip.white=TRUE);
df <- df %>% filter((Explicit==1)) %>% as.data.frame();
df$Start <- df$Start*1e-9; # Convert ns to second
df$End <- df$End*1e-9;
df$Duration <- df$Duration*1e-9;
df;
}
> df <- readtrace("/Users/thierry/tasks.csv");
> head(df);
Resource Numa Start End Duration Explicit Aff Strict Tag Name
1 46 1 1499947765 1499947765 0.002393115 1 2 1 1 func: dplgsy file: unknown line: 0
2 15 0 1499947765 1499947765 0.002733835 1 2 1 0 func: dplgsy file: unknown line: 0
3 8 0 1499947765 1499947765 0.002776491 1 2 1 0 func: dplgsy file: unknown line: 0
4 43 1 1499947765 1499947765 0.002839405 1 2 1 1 func: dplgsy file: unknown line: 0
5 51 2 1499947765 1499947765 0.002407908 1 2 1 2 func: dplgsy file: unknown line: 0
6 58 2 1499947765 1499947765 0.002131242 1 2 1 2 func: dplgsy file: unknown line: 0
TaskId Work PAPI_TOT_CYC PAPI_TOT_INS Origin
1 75264 0.0023888 1681121 1390710 /Users/thierry/tmp/csv/tasks.csv
2 75008 0.0027351 1003768 890397 /Users/thierry/tmp/csv/tasks.csv
3 76032 0.0027742 1585495 1390256 /Users/thierry/tmp/csv/tasks.csv
4 76288 0.0028371 1420764 1388990 /Users/thierry/tmp/csv/tasks.csv
5 75520 0.0024026 1357263 1390318 /Users/thierry/tmp/csv/tasks.csv
6 76544 0.0021288 1554343 1390194 /Users/thierry/tmp/csv/tasks.csv
With the following meaning:
- Resource: the id of the thread executing the task
- Numa: the numa node attached to the thread
- Start,End,Duration: of the task
- Explicit==0 if implicit OpenMP task else 1
- Aff, Strict, Tag: affinity given to the task
- Name: the task's name
- TaskId: a system wide task identifier
- Work, PAPI_TOT_CYC, PAPI_TOT_INS: the performance counters specified in the variable KAAPI_TASKPERF_EVENTS
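As a quick illustration of how these counters can be used (a minimal sketch, assuming the df loaded with readtrace above and that PAPI_TOT_CYC and PAPI_TOT_INS were recorded), you can aggregate them per task name with dplyr, for instance to estimate instructions per cycle (IPC):
library(dplyr);
# Per task name: number of tasks, mean duration and mean IPC
# (IPC = PAPI_TOT_INS / PAPI_TOT_CYC).
df %>%
  group_by(Name) %>%
  summarise(ntasks   = n(),
            avg_time = mean(Duration),
            avg_ipc  = mean(PAPI_TOT_INS / PAPI_TOT_CYC)) %>%
  arrange(desc(avg_time));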
Once the CSV file is loaded, try:
library(ggplot2);
# helper: convert s to the date
date<-function(d) { as.POSIXct(d, origin="1970-01-01"); }
# theplot
ggplot() +
theme_bw(base_size=16) +
xlab("Time [s]") +
ylab("Thread Identification") +
scale_fill_brewer(palette = "Set1") +
theme (
plot.margin = unit(c(0,0,0,0), "cm"),
legend.spacing = unit(.1, "line"),
panel.grid.major = element_blank(),
panel.spacing=unit(0, "cm"),
panel.grid=element_blank(),
legend.position = "bottom",
legend.title = element_text("Helvetica")
) +
guides(fill = guide_legend(nrow = 1)) +
geom_rect(data=df, alpha=1, aes(fill=Name,
xmin=date(Start),
xmax=date(End),
ymin=Resource,
ymax=Resource+0.9)) +
scale_y_reverse();
You should obtain a Gantt chart of the task executions (one row per thread), where:
* by zooming, you can see tasks other than dgemm (red)!
* there are 4 computations while only 3 iterations are specified, because the benchmark performs one extra warm-up computation
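If you want to keep the figure as an image file, ggsave from ggplot2 writes the last displayed plot to disk; the file name below is an arbitrary example:
# Save the last displayed plot (here, the Gantt chart) to a PNG file.
ggsave("gantt.png", width=12, height=6, dpi=150);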
Generating a Paje file format
You can also generate a Paje trace file using katracereader and use the Vite application to display the Gantt chart.
For instance:
> katracereader --vite /tmp/events.gautier.175535.*
The output file vite-gantt.trace may then be visualized through Vite.
One step further: plotting the distribution of task execution times
df %>%
  ggplot() +
  geom_histogram(aes(x=Duration, fill=Name), bins=100) +
  facet_wrap(~Name, nrow=1, scales="free") +
  theme_bw(base_size=12);
You should obtain the following plot:
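Beyond the histogram, the same data frame lends itself to simple numeric summaries. The following sketch (again assuming the df loaded with readtrace above) sums the task execution time per thread, a quick way to check load balance:
library(dplyr);
# Total number of tasks and busy time per thread (Resource id).
df %>%
  group_by(Resource) %>%
  summarise(ntasks = n(), busy_time = sum(Duration)) %>%
  arrange(desc(busy_time)) %>%
  head();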
Acknowledgment
We thank Lucas Schnorr and Arnaud Legrand for providing us with their R code.