    Add possibility to use z/cgemm3m for complex mat-mat products · 747c7935
    Guillaume Sylvand authored
    This routine, available in MKL, computes a complex matrix product in 6n^3 flops
    instead of 8n^3 (three real matrix products instead of four), but it pays off
    only for "large enough" matrices (threshold still to be tested).
    Potentially, we gain up to 25% on all complex computations.
    It could be worth looking for an equivalent in CUDA, or implementing one.
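
The 6n^3 versus 8n^3 count can be illustrated with a minimal sketch of the "3M" idea. It assumes split real/imaginary storage and a fixed small size, and is not MKL's actual implementation. All names here (`real_gemm`, `cgemm_4m`, `cgemm_3m`, `max_abs_diff_3m_vs_4m`, `N`) are hypothetical and exist only for this example. Each real product costs 2n^3 flops, so four products give 8n^3 and three give 6n^3, plus O(n^2) extra additions.

```c
#include <stddef.h>

#define N 3                 /* illustrative size, not a realistic one */
int real_gemm_calls = 0;    /* counts real matrix products (2*N^3 flops each) */

/* Real N x N product c = a * b, row-major storage. */
static void real_gemm(const double *a, const double *b, double *c)
{
    real_gemm_calls++;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            double s = 0.0;
            for (size_t k = 0; k < N; k++)
                s += a[i * N + k] * b[k * N + j];
            c[i * N + j] = s;
        }
}

/* Conventional complex product: four real products (8n^3 flops).
 * (Ar + i Ai)(Br + i Bi) = (Ar Br - Ai Bi) + i (Ar Bi + Ai Br) */
void cgemm_4m(const double *ar, const double *ai,
              const double *br, const double *bi,
              double *cr, double *ci)
{
    double t1[N * N], t2[N * N], t3[N * N], t4[N * N];
    real_gemm(ar, br, t1);
    real_gemm(ai, bi, t2);
    real_gemm(ar, bi, t3);
    real_gemm(ai, br, t4);
    for (size_t p = 0; p < N * N; p++) {
        cr[p] = t1[p] - t2[p];
        ci[p] = t3[p] + t4[p];
    }
}

/* 3M variant: three real products (6n^3 flops) plus O(n^2) additions.
 * Imaginary part: (Ar + Ai)(Br + Bi) - Ar Br - Ai Bi = Ar Bi + Ai Br */
void cgemm_3m(const double *ar, const double *ai,
              const double *br, const double *bi,
              double *cr, double *ci)
{
    double as[N * N], bs[N * N], t1[N * N], t2[N * N], t3[N * N];
    for (size_t p = 0; p < N * N; p++) {
        as[p] = ar[p] + ai[p];
        bs[p] = br[p] + bi[p];
    }
    real_gemm(ar, br, t1);
    real_gemm(ai, bi, t2);
    real_gemm(as, bs, t3);
    for (size_t p = 0; p < N * N; p++) {
        cr[p] = t1[p] - t2[p];
        ci[p] = t3[p] - t1[p] - t2[p];
    }
}

/* Runs both variants on small integer data and returns the largest
 * entrywise difference between their results. */
double max_abs_diff_3m_vs_4m(void)
{
    double ar[N * N], ai[N * N], br[N * N], bi[N * N];
    double cr4[N * N], ci4[N * N], cr3[N * N], ci3[N * N];
    double worst = 0.0;
    for (size_t p = 0; p < N * N; p++) {
        ar[p] = (double)p + 1.0;
        ai[p] = (double)p - 4.0;
        br[p] = 2.0 * (double)p - 3.0;
        bi[p] = 5.0 - (double)p;
    }
    cgemm_4m(ar, ai, br, bi, cr4, ci4);
    cgemm_3m(ar, ai, br, bi, cr3, ci3);
    for (size_t p = 0; p < N * N; p++) {
        double dr = cr4[p] - cr3[p], di = ci4[p] - ci3[p];
        if (dr < 0.0) dr = -dr;
        if (di < 0.0) di = -di;
        if (dr > worst) worst = dr;
        if (di > worst) worst = di;
    }
    return worst;
}
```

Calling `max_abs_diff_3m_vs_4m()` runs both variants on the same input, and `real_gemm_calls` then shows the four-versus-three split in real products. The saving only matters once the n^3 term dominates the extra O(n^2) additions and temporaries, which is why the routine is interesting only for large enough matrices.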
    
    !!! Note that the flop counters are not updated:                    !!!
    !!! in C/Z precision, most flop counts should be multiplied by 0.75 !!!
    
    It is OFF by default
    It is activated with MORSE_Enable(MORSE_GEMM3M)
    In the timing routines, it is activated with --gemm3m
timing.h 8.84 KB