Features list
- T.H.E. work stealing for tasks
- OMPT extension for tasks with dependencies
- Trace library (generating dot files and statistics about task execution, perf counters, etc.)
- affinity clause support (can be used to improve data locality)
- support for the concurrent write extension in cw dependency definitions
- support for critical within concurrent write, also called commute
- support variable length dependencies between tasks through runtime function call
- adaptive implementation of OpenMP Task Loop
- compatibility with GCC libGOMP runtime
Supporting the new clauses requires an extended compiler. You can check out our Clang extension, which handles them.
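For readers less familiar with the kind of code these features target, here is a minimal sketch of tasks with dependencies. It uses only standard OpenMP depend clauses (in, out), so it should build with any OpenMP compiler. The function name chained_sum is purely illustrative and not part of libKOMP. The extended clauses (such as cw) are deliberately not shown, since their syntax depends on the extended compiler.

```c
#include <stddef.h>

/* Illustrative helper (not part of libKOMP): sums the two halves of `data`
 * in independent tasks, then combines them in a third task that depends
 * on both partial results. */
long chained_sum(const int *data, size_t n)
{
    long left = 0, right = 0, total = 0;
    const size_t half = n / 2;

    #pragma omp parallel
    #pragma omp single
    {
        /* Producer tasks: no dependency between them, so they may run
         * concurrently on different threads. */
        #pragma omp task shared(left) depend(out: left)
        {
            for (size_t i = 0; i < half; ++i)
                left += data[i];
        }

        #pragma omp task shared(right) depend(out: right)
        {
            for (size_t i = half; i < n; ++i)
                right += data[i];
        }

        /* Consumer task: becomes ready only after both producers finish. */
        #pragma omp task shared(left, right, total) depend(in: left, right) depend(out: total)
        {
            total = left + right;
        }
    }   /* implicit barrier: all tasks have completed here */

    return total;
}
```

The runtime has to track the addresses named in the depend clauses to build the task graph, which is the part of the work that the dependency-related options of this library aim to improve.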
Clone and configuration
The code variant is selected with macros when configuring the repository with cmake. The following table summarizes the added macros and their impact.
macro | impact |
---|---|
LIBOMP_USE_THEQUEUE | activates the Cilk work stealing protocol T.H.E. |
LIBOMP_USE_THE_AGGREGATION | adds request combining to the T.H.E. protocol |
LIBOMP_USE_AFFINITY | defines the affinity scheduler; requires LIBOMP_USE_NUMA |
LIBOMP_USE_NUMA | uses libnuma to get access to the NUMA topology |
LIBOMP_USE_DYNHASH | adds the extension for a dynamically resizable hash map |
LIBOMP_USE_CONCURRENT_WRITE | activates support for CW dependencies; otherwise they are treated as inout |
LIBOMP_KAAPI_TRACING | defines tracing facilities; requires LIBOMP_HAVE_OMPT_SUPPORT |
LIBOMP_USE_PAPI | compiles tracing facilities with PAPI; requires LIBOMP_KAAPI_TRACING and an installed version of PAPI |
LIBOMP_USE_VARDEP | adds the extension for variable length dependencies |
How to choose configuration options?
There are no absolute rules, but:
- If your programs have lots of fine grain (recursive) tasks: prefer the T.H.E. protocol and queues with request aggregation. Note that the T.H.E. option 'LIBOMP_USE_THEQUEUE' also provides unbounded queues, which may have better scheduling properties.
- If your programs have a lot of dependencies (in the OpenMP sense, i.e. generated by the same task region): use option 'LIBOMP_USE_DYNHASH', which has a better hash function (on our examples) and a dynamically resizable hash table that reduces collisions and makes find operations faster.
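To make the first rule of thumb concrete, the sketch below shows what a program with lots of fine grain recursive tasks typically looks like: the classic recursive Fibonacci written with standard OpenMP tasks. The function names are illustrative and no libKOMP-specific API is involved.

```c
/* Illustrative only: each call spawns two very small tasks, so the cost of
 * creating, queueing and stealing tasks dominates the useful work. */
static long fib_task(int n)
{
    long a, b;

    if (n < 2)
        return n;

    #pragma omp task shared(a) firstprivate(n)
    a = fib_task(n - 1);

    #pragma omp task shared(b) firstprivate(n)
    b = fib_task(n - 2);

    /* Wait for the two child tasks before combining their results. */
    #pragma omp taskwait
    return a + b;
}

/* Entry point: one thread creates the root of the task tree, the other
 * threads of the team obtain work by executing (or stealing) tasks. */
long fib(int n)
{
    long result = 0;

    #pragma omp parallel
    #pragma omp single
    result = fib_task(n);

    return result;
}
```

Because almost all the run time of such a program goes into task management, it is a reasonable micro-benchmark for comparing a build with and without the T.H.E. options.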
Once the configuration options are selected, you need to compile and install the library. Typical output when configured with the tracing tools is the following.
> make -j
Scanning dependencies of target omptarget
Scanning dependencies of target omptarget.rtl.x86_64
[ 4%] Generating kmp_i18n_id.inc
[ 6%] Generating kmp_i18n_default.inc
[ 6%] Generating git_hash.h
[ 6%] Generating hw_count.h
[ 9%] Building CXX object libomptarget/CMakeFiles/omptarget.dir/src/omptarget.cpp.o
[ 9%] Building CXX object libomptarget/plugins/x86_64/CMakeFiles/omptarget.rtl.x86_64.dir/__/generic-elf-64bit/src/rtl.cpp.o
[ 9%] Built target libomp-needed-headers
Scanning dependencies of target katracereader
Scanning dependencies of target omp
[ 14%] Building CXX object runtime/src/CMakeFiles/katracereader.dir/kaapi_trace_reader.cpp.o
[ 15%] Building CXX object runtime/src/CMakeFiles/katracereader.dir/kaapi_trace_simulator.cpp.o
[ 15%] Building CXX object runtime/src/CMakeFiles/katracereader.dir/katracereader.cpp.o
[ 17%] Building C object runtime/src/CMakeFiles/katracereader.dir/kaapi_trace_rt.c.o
[ 23%] Building C object runtime/src/CMakeFiles/katracereader.dir/kaapi_parser.c.o
[ 23%] Building C object runtime/src/CMakeFiles/katracereader.dir/kaapi_rt.c.o
[ 23%] Building C object runtime/src/CMakeFiles/katracereader.dir/poti/src/poti.c.o
[ 23%] Building C object runtime/src/CMakeFiles/katracereader.dir/poti/src/poti_events.c.o
[ 23%] Building C object runtime/src/CMakeFiles/katracereader.dir/poti/src/poti_header.c.o
[ 26%] Building C object runtime/src/CMakeFiles/omp.dir/kaapi_sched_ccsync.c.o
[ 30%] Building C object runtime/src/CMakeFiles/omp.dir/kaapi_rt.c.o
[ 33%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_alloc.cpp.o
[ 34%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_atomic.cpp.o
[ 36%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_csupport.cpp.o
[ 39%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_debug.cpp.o
[ 41%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_itt.cpp.o
[ 44%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_environment.cpp.o
[ 44%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_error.cpp.o
[ 46%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_global.cpp.o
[ 50%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_i18n.cpp.o
[ 53%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_io.cpp.o
[ 53%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_runtime.cpp.o
[ 79%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_utility.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_affinity.cpp.o
[ 69%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_taskq.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_hws.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/ompt-general.cpp.o
[ 57%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_settings.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_wait_release.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_version.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_cancel.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_sched.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_threadprivate.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_ftn_extra.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_queues.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_gsupport.cpp.o
[ 66%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_tasking.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_dispatch.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_lock.cpp.o
[ 65%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_str.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_ftn_cdecl.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_taskdeps.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/z_Linux_util.cpp.o
[ 80%] Building CXX object runtime/src/CMakeFiles/omp.dir/kmp_barrier.cpp.o
[ 80%] Building C object runtime/src/CMakeFiles/omp.dir/z_Linux_asm.s.o
[ 82%] Linking CXX shared library libomptarget.rtl.x86_64.so
[ 82%] Built target omptarget.rtl.x86_64
[ 84%] Linking C shared library libomp.so
[ 84%] Built target omp
Scanning dependencies of target omp-trace
[ 87%] Building C object runtime/src/CMakeFiles/omp-trace.dir/kaapi_ompt.c.o
[ 90%] Building C object runtime/src/CMakeFiles/omp-trace.dir/kaapi_trace_rt.c.o
[ 90%] Building C object runtime/src/CMakeFiles/omp-trace.dir/kaapi_rt.c.o
[ 95%] Building C object runtime/src/CMakeFiles/omp-trace.dir/kaapi_trace_lib.c.o
[ 95%] Building C object runtime/src/CMakeFiles/omp-trace.dir/kaapi_hashmap.c.o
[ 95%] Building C object runtime/src/CMakeFiles/omp-trace.dir/kaapi_parser.c.o
[ 95%] Building C object runtime/src/CMakeFiles/omp-trace.dir/kaapi_recorder.c.o
[ 96%] Linking CXX shared library libomptarget.so
[ 96%] Built target omptarget
[ 98%] Linking C shared library trace-libomp.so
[ 98%] Built target omp-trace
[100%] Linking CXX executable katracereader
[100%] Built target katracereader
> make install
Install the project...
-- Install configuration: "debug"
-- Installing: /home/tgauti01/local/libkomp-release/lib/libomp.so
-- Installing: /home/tgauti01/local/libkomp-release/lib/trace-libomp.so
-- Set runtime path of "/home/tgauti01/local/libkomp-release/lib/trace-libomp.so" to ""
-- Installing: /home/tgauti01/local/libkomp-release/bin/katracereader
-- Up-to-date: /home/tgauti01/local/libkomp-release/include/omp.h
-- Up-to-date: /home/tgauti01/local/libkomp-release/include/ompt.h
-- Installing: /home/tgauti01/local/libkomp-release/lib/libomptarget.so
-- Installing: /home/tgauti01/local/libkomp-release/lib/libomptarget.rtl.x86_64.so
Details about each extension
Activate protocol T.H.E. for stealing tasks.
The true Cilk T.H.E. work stealing protocol is integrated in the source code. The code comes from the Kaapi library. To activate it:
cmake -DLIBOMP_USE_THEQUEUE=true
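For readers who have not met the T.H.E. protocol before, the following is a minimal sketch of the idea as published for Cilk-5. It is not the code used by libKOMP or Kaapi: all names (the_deque, the_push, the_pop, the_steal, THE_CAPACITY) are hypothetical, the array is bounded and never recycled, and the lock is a simple spin lock. The owner works on the tail (T) without taking the lock in the common case, thieves advance the head (H) under the lock, and the owner only takes the lock when the two indices may have crossed (E, the exception case).

```c
#include <stdatomic.h>
#include <stddef.h>

#define THE_CAPACITY 1024   /* assumption: fixed capacity, for brevity */

typedef struct {
    atomic_long head;       /* H: index of the next task a thief takes */
    atomic_long tail;       /* T: index of the next free slot (owner side) */
    atomic_flag lock;       /* serializes thieves and conflict resolution */
    void *tasks[THE_CAPACITY];
} the_deque;

static void the_lock(the_deque *q)   { while (atomic_flag_test_and_set(&q->lock)) { /* spin */ } }
static void the_unlock(the_deque *q) { atomic_flag_clear(&q->lock); }

void the_init(the_deque *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
    atomic_flag_clear(&q->lock);
}

/* Owner only. Returns 1 on success, 0 if the sketch's array is full. */
int the_push(the_deque *q, void *task)
{
    long t = atomic_load(&q->tail);
    if (t >= THE_CAPACITY)
        return 0;
    q->tasks[t] = task;
    atomic_store(&q->tail, t + 1);       /* publish the task */
    return 1;
}

/* Owner only. Lock-free unless a thief may be racing for the last task. */
void *the_pop(the_deque *q)
{
    long t = atomic_load(&q->tail) - 1;
    atomic_store(&q->tail, t);           /* optimistic T-- */
    if (atomic_load(&q->head) > t) {     /* possible conflict with a thief */
        atomic_store(&q->tail, t + 1);   /* undo, then retry under the lock */
        the_lock(q);
        t = atomic_load(&q->tail) - 1;
        atomic_store(&q->tail, t);
        if (atomic_load(&q->head) > t) { /* the deque really is empty */
            atomic_store(&q->tail, t + 1);
            the_unlock(q);
            return NULL;
        }
        the_unlock(q);
    }
    return q->tasks[t];
}

/* Any thief. Always takes the lock. */
void *the_steal(the_deque *q)
{
    the_lock(q);
    long h = atomic_load(&q->head);
    atomic_store(&q->head, h + 1);       /* optimistic H++ */
    if (h + 1 > atomic_load(&q->tail)) { /* nothing to steal, or owner won */
        atomic_store(&q->head, h);
        the_unlock(q);
        return NULL;
    }
    the_unlock(q);
    return q->tasks[h];
}
```

In this sketch the owning thread calls the_push and the_pop, while any other thread calls the_steal. The default sequentially consistent atomics stand in for the memory fences the protocol relies on; a production implementation would normally tune these and handle queue growth.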
Activate request combining with T.H.E. protocol
Aggregates steal requests so that the overhead of stealing is concentrated on the combiner thread. The protocol is based on CCSync from Fatourou et al.
cmake -DLIBOMP_USE_THE_AGGREGATION=true
Activate the dynamically resizable hash table
By default, the dependencies between tasks are computed with a fixed size hash table, which is subject to many collisions when the number of dependencies is high. To use a dynamically resizable hash table instead, configure libKOMP with the following option:
cmake -DLIBOMP_USE_DYNHASH=true
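To illustrate why resizing helps, here is a generic sketch of a chained hash table keyed by dependency address that doubles its bucket array once it holds as many entries as buckets. It is not libKOMP's implementation: the names (dep_map, dep_entry, dep_map_get), the hash function, the growth policy and the last_writer payload are all assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct dep_entry {
    const void *addr;           /* address named in a depend clause */
    int last_writer;            /* hypothetical payload: id of last writer task */
    struct dep_entry *next;     /* collision chain */
} dep_entry;

typedef struct {
    dep_entry **buckets;
    size_t nbuckets;            /* always a power of two */
    size_t count;               /* number of stored entries */
} dep_map;

/* Mixes the address bits, then masks to the table size. */
static size_t dep_hash(const void *addr, size_t nbuckets)
{
    uintptr_t x = (uintptr_t)addr;
    x ^= x >> 16;
    x *= (uintptr_t)0x9E3779B1u;
    x ^= x >> 16;
    return (size_t)x & (nbuckets - 1);
}

/* nbuckets must be a power of two. Returns 1 on success, 0 on failure;
 * the map must not be used if initialization failed. */
int dep_map_init(dep_map *m, size_t nbuckets)
{
    m->buckets = calloc(nbuckets, sizeof *m->buckets);
    m->nbuckets = m->buckets ? nbuckets : 0;
    m->count = 0;
    return m->buckets != NULL;
}

/* Doubles the bucket array and relinks every entry into its new chain. */
static void dep_map_grow(dep_map *m)
{
    size_t new_n = m->nbuckets * 2;
    dep_entry **nb = calloc(new_n, sizeof *nb);
    if (!nb)
        return;                 /* keep the old table if allocation fails */
    for (size_t i = 0; i < m->nbuckets; ++i) {
        dep_entry *e = m->buckets[i];
        while (e) {
            dep_entry *next = e->next;
            size_t h = dep_hash(e->addr, new_n);
            e->next = nb[h];
            nb[h] = e;
            e = next;
        }
    }
    free(m->buckets);
    m->buckets = nb;
    m->nbuckets = new_n;
}

/* Returns the entry for addr, creating it (and growing the table when the
 * load factor reaches 1) if it does not exist yet. NULL on allocation failure. */
dep_entry *dep_map_get(dep_map *m, const void *addr)
{
    size_t h = dep_hash(addr, m->nbuckets);
    for (dep_entry *e = m->buckets[h]; e; e = e->next)
        if (e->addr == addr)
            return e;

    if (m->count >= m->nbuckets) {
        dep_map_grow(m);
        h = dep_hash(addr, m->nbuckets);
    }
    dep_entry *e = malloc(sizeof *e);
    if (!e)
        return NULL;
    e->addr = addr;
    e->last_writer = -1;
    e->next = m->buckets[h];
    m->buckets[h] = e;
    m->count++;
    return e;
}

void dep_map_free(dep_map *m)
{
    for (size_t i = 0; i < m->nbuckets; ++i) {
        dep_entry *e = m->buckets[i];
        while (e) {
            dep_entry *next = e->next;
            free(e);
            e = next;
        }
    }
    free(m->buckets);
    m->buckets = NULL;
    m->nbuckets = m->count = 0;
}
```

With a fixed table the chains grow linearly with the number of distinct dependency addresses; with doubling, the average chain length stays bounded, at the price of an occasional rehash.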
Activate the affinity scheduler
Support for affinity on tasks requires activating both the affinity and hwloc switches:
cmake -DLIBOMP_USE_AFFINITY=true -DLIBOMP_USE_HWLOC=true
In order to be able to put affinity clauses on task constructs, you will also need to compile LLVM/Clang with our extended clang frontend.
Activate the perf counters and the trace library
They are part of the OMPT switch:
cmake -DLIBOMP_OMPT_SUPPORT=true -DLIBOMP_KAAPI_TRACING=true
Activate the variable length extension for task dependencies
cmake -DLIBOMP_USE_VARDEP=true
Compilation of programs
Compiling your OpenMP programs to use libkomp is very simple:
- with clang
Executing programs
If you don't want to use any tracing or performance counters, you just need to run your program as you would with any other runtime/compiler.
You can check which runtime you're using by looking at the dynamically linked libraries, e.g. by using ldd ./program.
With tracing and performance counters
The following environment variables are useful:
- KAAPI_RECORD_TRACE: activates the tracing
- KAAPI_RECORD_MASK: determines what to trace
- KAAPI_PERF_EVENTS: what kind of counters to look at
- KAAPI_DISPLAY_PERF: when to display the counters
Here is a sample execution that collects the performance counters related to tasks and records basic information about the execution in the traces:
KAAPI_RECORD_TRACE=1 KAAPI_RECORD_MASK=compute,omp,perfctr KAAPI_PERF_EVENTS=task KAAPI_TASKPERF_EVENTS=work,time KAAPI_DISPLAY_PERF=final LD_PRELOAD=$HOME/local/lib/trace-libomp.so ./dpotrf_taskdep -n 1024 -b 256
Note that you will have to adjust LD_PRELOAD to the actual location of the tracing library compiled with the runtime.
It should create a stats.<pid> file, as well as one /tmp/events.[...] file per CPU used during the execution.
How to manipulate traces
The runtime is shipped with katracereader, a tool to manipulate the aforementioned trace files.
Using katracereader -h will give you all the options.
For example, if you want the DAGs for the parallel regions of your program, you can use something like:
katracereader --dot /tmp/events.user.pid.*
(adjusting your user name and the pid)
Please see the following Wiki pages:
- katracereader: the libKOMP tool to convert execution traces to CSV (Gantt), dot (graph), etc.
- more information about event performance counters