Files · 747c7935567f2c953a39b224cb116c9a924544eb · solverstack / Chameleon

Add possibility to use z/cgemm3m for complex mat-mat products

Guillaume Sylvand authored 8 years ago

This routine, available in MKL, computes a complex matrix product in about
6n^3 flops instead of 8n^3, but it only pays off for "large enough" matrices
(the threshold remains to be tested).
Potentially, we gain up to 25% in all complex computations.
It could be interesting to look for it, or implement it, in CUDA.
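
For readers unfamiliar with the 6n^3 figure: the "3M" scheme replaces the four real products of a complex multiplication with three, at the cost of a few extra O(n^2) additions. The sketch below illustrates that idea only and is not the MKL or Chameleon implementation. The names gemm3m_sketch and real_matmul are hypothetical, matrices are assumed square and row-major, and the naive triple loop stands in for a real dgemm call.

```c
#include <complex.h>
#include <stdlib.h>

/* Real n x n product C = A * B (row-major). Stand-in for a real dgemm call. */
static void real_matmul(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}

/* Hypothetical sketch: complex C = A * B using three real products.
 * With A = Ar + i*Ai and B = Br + i*Bi:
 *   T1 = Ar*Br, T2 = Ai*Bi, T3 = (Ar + Ai)*(Br + Bi)
 *   Re(C) = T1 - T2, Im(C) = T3 - T1 - T2
 * Returns 0 on success, -1 if the workspace cannot be allocated. */
int gemm3m_sketch(int n, const double complex *A, const double complex *B,
                  double complex *C)
{
    if (n <= 0)
        return 0;

    size_t m = (size_t)n * (size_t)n;
    double *w = malloc(7 * m * sizeof *w);
    if (w == NULL)
        return -1;

    double *Ar = w,         *Ai = w + m;
    double *Br = w + 2 * m, *Bi = w + 3 * m;
    double *T1 = w + 4 * m, *T2 = w + 5 * m, *T3 = w + 6 * m;

    /* Split the operands into real and imaginary parts. */
    for (size_t e = 0; e < m; e++) {
        Ar[e] = creal(A[e]); Ai[e] = cimag(A[e]);
        Br[e] = creal(B[e]); Bi[e] = cimag(B[e]);
    }

    real_matmul(n, Ar, Br, T1);          /* first real product  */
    real_matmul(n, Ai, Bi, T2);          /* second real product */

    /* Reuse Ar and Br to hold the sums Ar + Ai and Br + Bi. */
    for (size_t e = 0; e < m; e++) {
        Ar[e] += Ai[e];
        Br[e] += Bi[e];
    }
    real_matmul(n, Ar, Br, T3);          /* third real product  */

    /* Recombine the three products into the complex result. */
    for (size_t e = 0; e < m; e++)
        C[e] = (T1[e] - T2[e]) + (T3[e] - T1[e] - T2[e]) * I;

    free(w);
    return 0;
}
```

Each real product costs 2n^3 flops, so three of them give 6n^3 against 8n^3 for the conventional complex product. The extra splitting and recombination passes are O(n^2), which is one reason the scheme only pays off for large enough matrices.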

!!! Note that the flop counters are not updated                     !!!
!!! In C/Z precision, most flop counters should be scaled by 0.75   !!!

It is OFF by default.
It is activated with MORSE_Enable(MORSE_GEMM3M).
In the timing routines, it is activated with --gemm3m.

Name
cmake_modules
compute
control
coreblas
cudablas
docs
example
include
lib/pkgconfig
plasma-conversion
runtime
simucore
testing
timing
.dir-locals.el
CMakeLists.txt
CTestConfig.cmake
ChangeLog
INSTALL.txt
LICENCE.txt