Add OpenMP runtime for codelets
This is a placeholder for my ongoing work to bring OpenMP to Chameleon as a backend runtime.
The current implementation of the codelets uses dependent task constructs; the plan is also to add a target version and see how it behaves with respect to the existing runtimes.
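For readers less familiar with dependent tasks, here is a minimal sketch of the idea, not the actual Chameleon code. The helper names (scale_tile, add_tile, submit_tile_tasks, run_tile_example) are hypothetical stand-ins for codelets. The depend clauses let the OpenMP runtime order the second task after the first. The sketch makes no OpenMP library calls, so it also builds (and runs serially) without -fopenmp.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for codelets; these are NOT Chameleon kernels. */
static void scale_tile(double *tile, size_t n, double alpha)
{
    for (size_t i = 0; i < n; i++)
        tile[i] *= alpha;
}

static void add_tile(const double *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Submit two tasks: a = 2*a, then b += a.
 * The depend clauses tell the runtime that the second task reads what
 * the first one writes, so it must run after it. */
void submit_tile_tasks(double *a, double *b, size_t n)
{
    assert(a != NULL && b != NULL);

    #pragma omp task depend(inout: a[0:n]) firstprivate(a, n)
    scale_tile(a, n, 2.0);

    #pragma omp task depend(in: a[0:n]) depend(inout: b[0:n]) firstprivate(a, b, n)
    add_tile(a, b, n);
}

/* Small driver: one thread submits the tasks, the team executes them. */
double run_tile_example(void)
{
    double a[4] = {1.0, 2.0, 3.0, 4.0};
    double b[4] = {10.0, 10.0, 10.0, 10.0};

    #pragma omp parallel
    #pragma omp master
    {
        submit_tile_tasks(a, b, 4);
        #pragma omp taskwait   /* wait for both child tasks */
    }

    return b[0] + b[1] + b[2] + b[3];
}
```

Here a becomes {2, 4, 6, 8} and b becomes {12, 14, 16, 18}, so run_tile_example() returns 60 regardless of how many threads are in the team.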
I've "openmp-ified" all codelets, though they have definitely not been extensively tested (or tested at all, for some).
OpenMP doesn't have a "runtime init" or "runtime finalize" like the other runtimes, so we need to put a #pragma omp parallel / #pragma omp master somewhere to create a team of threads that will execute our tasks.
The semantics of the parallel region impose some constraints (such as no return or jump out of the region from within it), and I haven't found a great place for these pragmas yet, so currently they sit directly in the timing files (see timing/time_zpotrf_tile.c for an example).
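To illustrate the constraint, here is a rough sketch of how such a region can wrap task submission. It is an assumption about the general shape, not a copy of the timing files, and submit_work and run_in_parallel_region are hypothetical names. Since returning from inside the region is not allowed, the status is stored in a variable and returned only after the region has ended.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the code that submits the tasks. */
static int submit_work(int ntasks, int *done)
{
    if (ntasks < 0)
        return -1;               /* fine: this only leaves the function */

    for (int i = 0; i < ntasks; i++) {
        #pragma omp task firstprivate(done)
        {
            #pragma omp atomic
            done[0] += 1;        /* each task records its completion */
        }
    }
    return 0;
}

/* Creates the thread team, lets one thread submit the tasks, and
 * returns the status only once the parallel region is over. */
int run_in_parallel_region(int ntasks, int *done)
{
    int status = 0;

    assert(done != NULL);
    *done = 0;

    #pragma omp parallel
    #pragma omp master
    {
        /* No "return" here: branching out of the region is not allowed. */
        status = submit_work(ntasks, done);
        #pragma omp taskwait
    }

    return status;
}
```

Only the master thread writes status, and the implicit barrier at the end of the parallel region guarantees that all tasks have completed before the function returns.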
I'll focus on running the first experiments on a small subset of kernels; meanwhile, feel free to give me feedback on this!