timing/timing.c · 747c7935567f2c953a39b224cb116c9a924544eb · AGULLO Emmanuel / Chameleon

Add possibility to use z/cgemm3m for complex mat-mat products · 747c7935

Guillaume Sylvand authored Sep 20, 2016

This routine, available in MKL, computes a complex matrix-matrix product in
6n^3 real operations instead of 8n^3, but it only pays off for "large enough"
matrices (the threshold remains to be tested).
Potentially, this saves up to 25% in all complex computations.
It could be interesting to look for an equivalent in CUDA, or to implement one.
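
To illustrate where the 6n^3 figure comes from, here is a minimal C sketch of the 3M scheme, which uses three real products instead of four. It is not MKL's z/cgemm3m and not Chameleon code: real_gemm and gemm3m_sketch are hypothetical names, and the matrices are assumed to be square, row-major, with real and imaginary parts stored in separate arrays.

```c
#include <stddef.h>
#include <stdlib.h>

/* Naive real n x n product, c = a * b (row-major), about 2*n^3 flops.
 * Hypothetical helper, standing in for a tuned real GEMM. */
static void real_gemm(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
    }
}

/* 3M scheme for C = A * B with A = ar + i*ai and B = br + i*bi:
 *   t1 = ar * br
 *   t2 = ai * bi
 *   t3 = (ar + ai) * (br + bi)
 *   cr = t1 - t2
 *   ci = t3 - t1 - t2
 * Three real products (about 6*n^3 flops) plus O(n^2) additions,
 * versus four real products (about 8*n^3 flops) for the usual scheme.
 * Returns 0 on success, -1 if the workspace cannot be allocated. */
static int gemm3m_sketch(size_t n,
                         const double *ar, const double *ai,
                         const double *br, const double *bi,
                         double *cr, double *ci)
{
    size_t nn = n * n;
    double *w = malloc(5 * nn * sizeof *w);
    if (w == NULL)
        return -1;

    double *t1 = w;
    double *t2 = w + nn;
    double *t3 = w + 2 * nn;
    double *sa = w + 3 * nn;   /* ar + ai */
    double *sb = w + 4 * nn;   /* br + bi */

    for (size_t i = 0; i < nn; i++) {
        sa[i] = ar[i] + ai[i];
        sb[i] = br[i] + bi[i];
    }

    real_gemm(n, ar, br, t1);
    real_gemm(n, ai, bi, t2);
    real_gemm(n, sa, sb, t3);

    for (size_t i = 0; i < nn; i++) {
        cr[i] = t1[i] - t2[i];
        ci[i] = t3[i] - t1[i] - t2[i];
    }

    free(w);
    return 0;
}
```

The extra O(n^2) additions and the workspace are part of the reason the scheme is only expected to win for large enough matrices. The imaginary part is also obtained by subtraction, so its rounding behaviour can differ from the conventional product.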

!!! Note that the flop counters are not updated        !!!
!!! In C/Z precision, most flop counts should be x0.75 !!!
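
As a sketch of what the x0.75 correction amounts to, the two helpers below compare the nominal flop count of a complex product with the count under the 3M scheme. Both function names are hypothetical and are not the flop macros used by the timing routines. The 8*m*n*k figure is the usual convention of counting each complex multiply-add as 8 real flops, and the O(n^2) extra additions of the 3M scheme are ignored.

```c
/* Nominal real-flop count of a complex (m x k) by (k x n) product:
 * each complex multiply-add is counted as 8 real flops. */
static double zgemm_flops_nominal(double m, double n, double k)
{
    return 8.0 * m * n * k;
}

/* Count to report when the 3M scheme is enabled: three real products
 * instead of four, i.e. the nominal count scaled by 0.75. */
static double zgemm_flops_3m(double m, double n, double k)
{
    return 0.75 * zgemm_flops_nominal(m, n, k);
}
```

Until the counters are corrected in this way, Gflop/s figures reported with the option enabled are computed from the nominal count and so overstate the arithmetic actually performed.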

It is OFF by default
It is activated with MORSE_Enable(MORSE_GEMM3M)
In the timing routines, it is activated with --gemm3m

