Try to optimize GivensFGFTParallel::update_L() by using OpenMP to parallelize...
Tried to optimize GivensFGFTParallel::update_L() by using OpenMP to parallelize the product of each parallel Givens matrix with L, but it was unfruitful. The idea was to give each concurrent thread one submatrix of the Givens matrix (a rotation part) to multiply, and to continue until all submatrices were exhausted, so that the whole product is computed in parallel. The OpenMP directives are kept in the code but disabled. The compilation constant OPT_UPDATE_L_OMP must be defined to enable the use of OpenMP. In addition, the -fopenmp flag has to be passed to cmake (or set in CMakeCache.txt for both the compile and linker flags), likewise to setup.py when compiling the Python wrapper (extra_compile_flags, extra_link_flags), and something similar is needed when compiling the MATLAB wrapper. The probable cause of OpenMP's inefficiency here is the memory workload of each thread (non-contiguous accesses), which outweighs the gain from parallelizing the matrix product.
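For illustration only, the idea described above can be sketched as follows. This is not the code of GivensFGFTParallel::update_L(): the Rotation struct, the apply_rotations function, the dense row-major storage and the sign convention of the rotation are all assumptions made for the example. Only the OPT_UPDATE_L_OMP guard comes from this commit. Each rotation acts on its own pair of rows, so the rotations can be distributed over threads as long as no row index appears in two rotations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one 2x2 rotation block of a parallel Givens
// matrix: it mixes rows p and q with cosine c and sine s.
struct Rotation {
    std::size_t p, q;
    double c, s;
};

// Left-multiply the dense row-major n x n matrix L, in place, by a set of
// rotations. Assumes no row index is shared between two rotations, so the
// iterations of the outer loop are independent of each other.
void apply_rotations(std::vector<double>& L, std::size_t n,
                     const std::vector<Rotation>& rots)
{
#ifdef OPT_UPDATE_L_OMP
#pragma omp parallel for schedule(static)
#endif
    for (long k = 0; k < static_cast<long>(rots.size()); ++k) {
        const Rotation& r = rots[static_cast<std::size_t>(k)];
        assert(r.p < n && r.q < n && r.p != r.q);
        double* row_p = &L[r.p * n];
        double* row_q = &L[r.q * n];
        // Each thread streams through two rows that may be far apart in
        // memory; the arithmetic per loaded element is very small.
        for (std::size_t j = 0; j < n; ++j) {
            const double a = row_p[j];
            const double b = row_q[j];
            row_p[j] = r.c * a - r.s * b;
            row_q[j] = r.s * a + r.c * b;
        }
    }
}
```

Without -fopenmp and without OPT_UPDATE_L_OMP defined, the pragma is skipped and the loop runs sequentially with the same result. In this sketch each element costs only a few floating-point operations, which is consistent with the commit's guess that memory traffic, rather than computation, limits the speed-up.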
Showing
- CMakeLists.txt: 2 additions, 1 deletion
- src/algorithm/factorization/faust_GivensFGFT.h: 2 additions, 2 deletions
- src/algorithm/factorization/faust_GivensFGFT.hpp: 4 additions, 4 deletions
- src/algorithm/factorization/faust_GivensFGFTParallel.hpp: 55 additions, 22 deletions