diff --git a/documentation/README.rst b/documentation/README.rst index d9e48540490c986d487bd97e2d5b1b13ed189779..e94929b99dbeabfb5a60206dfa02dfa726a3db0d 100644 --- a/documentation/README.rst +++ b/documentation/README.rst @@ -98,3 +98,10 @@ Add these lines to your `.gdbinit`: # for debugging set python print-stack full + +Generate mcGDB archive +---------------------- + +.. code-block:: bash + + git archive --format=tar.gz -o /path/to/mcgdb.tgz --prefix=$PREFIX/mcgdb/ master diff --git a/documentation/dummy_modules/gdb.py b/documentation/dummy_modules/gdb.py index 05cff8ded1b2b68815d4d22f00d44af7777fb295..5635e4b3497068c4c2deb3db6e4175ada31a6f79 100644 --- a/documentation/dummy_modules/gdb.py +++ b/documentation/dummy_modules/gdb.py @@ -10,3 +10,5 @@ class Parameter: pass class frames(): class FrameDecorator(): pass frame_filters = {} + +class Function: pass diff --git a/documentation/index.rst b/documentation/index.rst index 93d8ba11495f8ee0529ed081e6ea3258be755965..f18903ff55cc3015509feab548f0b7020a43f7a5 100644 --- a/documentation/index.rst +++ b/documentation/index.rst @@ -4,6 +4,22 @@ A Programming Model-Centric Debugger: `mcGDB` `mcGDB` is a GDB+Python implementation of `Programming-Model Centric Debugging`_, from Kevin Pouget's PhD thesis work. +Last Developments +----------------- + +.. toctree:: + :maxdepth: 2 + + openmp + +.. toctree:: + :maxdepth: 3 + + perf_debugging + +Modular Architecture +-------------------- + `mcGDB` has a modular architecture, that can be extended to support new programming models and environments: @@ -13,10 +29,10 @@ new programming models and environments: - `mcgdb.model` holds the model-specific submodules. 
Currently, we provide: - * mcgdb.model.task.environments.openmp: our current target (still in - development): + * mcgdb.model.task.environments.openmp: our current target (still + under development): - - `OpenMP` programming environment + - `OpenMP`_ programming environment * mcgdb.model.gpu_: module for kernel-based GPU programming, with two environment support: @@ -56,6 +72,8 @@ new programming models and environments: Content ------- +This documentation is organized as follows: + .. toctree:: README diff --git a/documentation/openmp.rst b/documentation/openmp.rst index bc21aada1c4978787fa2a857b1308d62a3c422c4..830f7b323ba82f5e25ba208e40ec219922afa1ac 100644 --- a/documentation/openmp.rst +++ b/documentation/openmp.rst @@ -1,5 +1,5 @@ ============================= - `mcGDB` Module for `OpenMP` + `OpenMP` Module for `mcGDB` ============================= As part as the first deliverable of the Nano2017 DEMA project, we diff --git a/documentation/perf_debugging.rst b/documentation/perf_debugging.rst new file mode 100644 index 0000000000000000000000000000000000000000..7c7254cbfae07365b9101d1bdf1785ba4d9e2488 --- /dev/null +++ b/documentation/perf_debugging.rst @@ -0,0 +1,181 @@ +=================================== +Performance Debugger Implementation +=================================== + +As part of Deliverables 3 and 4 of the Nano2017/Dema project, we +worked on a performance debugger / interactive profiler. + +In this document, we present the organization of its source code, +located in `mcgdb.model.profiling`. + + +Interactive Profiling +===================== + +The foundations of the interactive profiler are described in +Deliverable D2. They are independent of the programming model. + + +Loading the Profiler +-------------------- + +The interactive profiler does not require exotic Python packages, nor +does it affect the execution without a user request, so it is +automatically loaded. 
+ +See `mcgdb/model/profiling/interaction/__init__.py` for disabling some +interaction commands. + +See `mcgdb.model.numa.cmd_numa` and `mcgdb.model.numa.on_activated` +for on-demand activation. + +Profiling Counters +------------------ + +As described in the report, the execution profiling is delegated to +actual execution profilers. These profilers are accessed through a +common interface that allows starting and stopping the profiling, and +querying the counter values. + +See `mcgdb.model.profiling.info.generic` for an example of a generic +counter. + +The list of counters provided by *a given source file* is exported in +the `__COUNTERS__` list. + +The list of source files to load is defined in +`mcgdb.model.profiling.info.info_counters`. In the current +implementation, there is no way to control the list of active +counters from GDB. This should not be hard to implement, but at the moment, you +have to manually set the list of enabled modules. + +`perf stat` Counters +-------------------- + +`mcgdb.model.profiling.info.perf[_standalone]` is the most useful +counter, but also the most complicated implementation. It is split into +two modules ... because I never took the time to properly merge them +back :(. The difference between them is that `perf` relies on +`libmcgdb_perf_stat.preload.so`, and `perf_standalone` doesn't. + +Natively, `perf stat` can only be attached to (or detached from) a process to +perform its profiling. Doing it often is time-consuming. We found a +way to improve that, but it only works on modern versions of `perf +stat`. Unfortunately, the `idchire` system doesn't offer such a recent +version ... + + +`perf` counter and `libmcgdb_perf_stat.preload.so` +__________________________________________________ + +To finely control the profiling periods, we implemented a library +(`mcgdb/model/profiling/perf_preload.c`) that should be preloaded into +`perf`'s memory space. 
During the initialization of the library +(function `init`, with the `constructor` attribute), we read the +`PROCESS_INTERVAL_ADDR` environment variable. + +In this variable, we expect to find the address of `perf`'s +`process_interval` function. This address can be obtained with this command: + +.. code-block:: bash + + nm -a /usr/bin/perf | grep process_interval | cut -d" " -f1 + +but in the interactive profiler, this is automatically handled in the +Python code. + +With this preloaded library, `perf` will listen to the `SIGUSR2` +signal, and dump its counters upon reception. + +In the debugger, we start this extended version of `perf` at the +program launch, and read the counters at the beginning and end +of the profiling regions. The region profile corresponds to the +difference between these values. + +The problem with this approach arises with multithreaded applications, as +`perf` is attached to the process. One solution is to enable GDB's +`scheduler-locking` during the profiling; the other would be to +attach `perf` to each of the threads. I didn't try this second +solution. + + +`perf_standalone` counter +_________________________ + +This is a fork of the previous code, for old `perf` versions, such as +the one running on the `idchire` machine (`perf version 3.0.76`). These +versions do not have an equivalent of the `process_interval` function, +so the process has to be started and stopped for each profiling +region. + +Both modules should be merged, but I did not have the time ... + + +OpenMP loop counters +-------------------- + +.. seealso:: + `mcgdb.model.profiling.info.omp` + +These counters are not linked with an execution profiler; they store +OpenMP loop boundaries. They work this way: + +* `profiling.info.omp.loop` is a two-element list (it could be more + structured, sorry): `(start, length)`. +* in `openmp.interaction.loop.loop_aspects()`, I register a callback + (aspect) on `ForLoop.start_work_on`. 
+* I know that in the current implementation, the loop profiling is + sequential, so in the aspect, I can save the lower and upper bounds + of **the last loop that started**. The loop profiling should start + soon and save these values in the counter. + +.. Caution:: + OpenMP loop counters only work with forced sequentiality. It + would not be hard to improve, though. + +I also implemented a GDB-cli Python function; see +`mcgdb.model.task.environment.openmp.interaction.fct_omp_loop_start`. +This function allows the user to get the loop start index from the +command line: + + +.. code-block:: bash + + (gdb) print $omp_loop_start() + +This function returns `None` if there is no loop currently active for +the current thread, and the loop's lower iteration bound otherwise. This implementation +looks better (= more reliable) than what I did for the counter above. + +NUMA counters +------------- + +.. seealso:: + `mcgdb.model.profiling.info.numa` + +The NUMA counters also come from the debugger and not from an +execution profiler. The implementation of `numa node` and `numa core` +just consists in calling the corresponding command-line functions: + +.. code-block:: bash + (gdb) numa current_node -raw + (gdb) numa current_core -raw + +The `-raw` flag skips the user-friendly text that comes with the +answer. + +Class `numa_location_info` implements an experimental feature that +consists in checking the location of a memory address at the beginning +of each profiling region. There is no user interface to provide the address; +it is hard-coded (and commented out) in +`mcgdb.model.profiling.info.new_infoset`: + + +.. code-block:: python + if info_counter_class.__name__ == "numa_location_info": + counters.append(info_counter_class("&r[$omp_loop_start()][0][0]")) + +The counter worked as expected, but it was way too slow. As presented +in the deliverable, a solution could be to load/implement `pagemap` +directly inside GDB, to avoid paying the process creation and +initialization time for each lookup.
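
The in-process lookup suggested above could start from something like the following. This is only a minimal sketch, and it assumes that `pagemap` refers to a tool built on the Linux `/proc/<pid>/pagemap` interface (one 64-bit entry per virtual page, with the frame number in bits 0-54, a swapped flag in bit 62 and a present flag in bit 63). The names `pagemap_offset`, `decode_pagemap_entry` and `lookup_page` are hypothetical; they are not part of mcGDB. Mapping the frame number to a NUMA node is not covered here.

```python
import struct

PAGEMAP_ENTRY_SIZE = 8        # one 64-bit entry per virtual page
PFN_MASK = (1 << 55) - 1      # bits 0-54: page frame number
SWAPPED_BIT = 1 << 62         # bit 62: page is swapped out
PRESENT_BIT = 1 << 63         # bit 63: page is present in RAM


def pagemap_offset(vaddr, page_size=4096):
    """Byte offset, inside the pagemap file, of the entry covering vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_SIZE


def decode_pagemap_entry(raw):
    """Decode one 8-byte pagemap entry (native byte order)."""
    (entry,) = struct.unpack("=Q", raw)
    present = bool(entry & PRESENT_BIT)
    swapped = bool(entry & SWAPPED_BIT)
    # The frame number is only meaningful for pages resident in RAM.
    pfn = (entry & PFN_MASK) if present and not swapped else None
    return {"present": present, "swapped": swapped, "pfn": pfn}


def lookup_page(pid, vaddr, page_size=4096):
    """Hypothetical helper: read and decode the entry for vaddr in process pid.

    No external process is spawned, which is the cost the text above
    wants to avoid.
    """
    with open("/proc/%d/pagemap" % pid, "rb") as pagemap:
        pagemap.seek(pagemap_offset(vaddr, page_size))
        raw = pagemap.read(PAGEMAP_ENTRY_SIZE)
    if len(raw) != PAGEMAP_ENTRY_SIZE:
        raise ValueError("no pagemap entry for address 0x%x" % vaddr)
    return decode_pagemap_entry(raw)
```

Inside GDB, the process id could be taken from `gdb.selected_inferior().pid`. Note that recent kernels hide the frame number from unprivileged readers, so the debugger may need extra privileges for this to return useful values.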