Add the option to use z/cgemm3m for complex matrix-matrix products
This routine, available in MKL, computes a complex matrix-matrix product in about 6n^3 real flops instead of 8n^3, but it only pays off for "large enough" matrices (the threshold is still to be determined by testing). Potentially, this saves up to 25% of the flops in all complex computations. It could also be worth looking for an equivalent in CUDA, or implementing one.

!!! Note that the flop counters are not updated !!!
!!! In C/Z precision, most flop counts should be multiplied by 0.75 !!!

It is OFF by default.
It is activated with MORSE_Enable(MORSE_GEMM3M).
In the timing routines, it is activated with --gemm3m.
Changed files:
- control/context.c: 13 additions, 0 deletions
- coreblas/compute/core_zgemm.c: 12 additions, 0 deletions
- include/morse_constants.h: 1 addition, 0 deletions
- include/morse_struct.h: 1 addition, 0 deletions
- timing/timing.c: 9 additions, 0 deletions
- timing/timing.h: 1 addition, 0 deletions