- Jul 06, 2017
-
-
BOUCHERIE Raphael authored
-
BOUCHERIE Raphael authored
-
BOUCHERIE Raphael authored
-
BOUCHERIE Raphael authored
-
BOUCHERIE Raphael authored
-
BOUCHERIE Raphael authored
-
BOUCHERIE Raphael authored
-
BOUCHERIE Raphael authored
-
BOUCHERIE Raphael authored
-
BOUCHERIE Raphael authored
-
BOUCHERIE Raphael authored
-
BOUCHERIE Raphael authored
-
- Apr 14, 2017
-
-
Add MORSE_Desc_Create_OOC, which is like MORSE_Desc_Create but does not actually allocate a matrix, letting the runtime allocate the tiles on demand and possibly push them to disk. Add a --ooc option to the tests to enable this.
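The on-demand allocation described above can be illustrated with a small standalone sketch. This is not the MORSE or StarPU implementation: the type lazy_desc_t and the functions lazy_desc_create, lazy_tile_get and lazy_desc_destroy are hypothetical names introduced only for illustration, and pushing tiles to disk is left out. The descriptor stores only tile pointers, and a tile gets memory the first time it is requested.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical descriptor: tile pointers only, no matrix storage up front. */
typedef struct {
    int mt, nt;      /* number of tile rows and tile columns */
    int mb, nb;      /* dimensions of one tile */
    double **tiles;  /* mt * nt slots, NULL until the tile is first used */
    int allocated;   /* number of tiles that currently own memory */
} lazy_desc_t;

lazy_desc_t *lazy_desc_create(int mt, int nt, int mb, int nb)
{
    assert(mt > 0 && nt > 0 && mb > 0 && nb > 0);
    lazy_desc_t *d = malloc(sizeof *d);
    if (d == NULL)
        return NULL;
    size_t count = (size_t)mt * (size_t)nt;
    d->tiles = malloc(count * sizeof *d->tiles);
    if (d->tiles == NULL) {
        free(d);
        return NULL;
    }
    for (size_t k = 0; k < count; k++)
        d->tiles[k] = NULL;  /* nothing allocated yet */
    d->mt = mt; d->nt = nt; d->mb = mb; d->nb = nb;
    d->allocated = 0;
    return d;
}

/* Return tile (i, j), allocating and zeroing it on first access. */
double *lazy_tile_get(lazy_desc_t *d, int i, int j)
{
    assert(d != NULL && i >= 0 && i < d->mt && j >= 0 && j < d->nt);
    double **slot = &d->tiles[(size_t)j * (size_t)d->mt + (size_t)i];
    if (*slot == NULL) {
        *slot = calloc((size_t)d->mb * (size_t)d->nb, sizeof **slot);
        if (*slot != NULL)
            d->allocated++;
    }
    return *slot;
}

void lazy_desc_destroy(lazy_desc_t *d)
{
    if (d == NULL)
        return;
    size_t count = (size_t)d->mt * (size_t)d->nt;
    for (size_t k = 0; k < count; k++)
        free(d->tiles[k]);  /* free(NULL) is a no-op for untouched tiles */
    free(d->tiles);
    free(d);
}
```

With this layout, a 4 x 4 grid of tiles costs only sixteen pointers until lazy_tile_get is called, which is the property that lets a runtime decide later where each tile lives.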
-
- Mar 14, 2017
-
-
Mathieu Faverge authored
-
- Mar 09, 2017
-
-
Mathieu Faverge authored
-
- Mar 06, 2017
-
-
Mathieu Faverge authored
-
- Feb 14, 2017
-
-
COJEAN Terry authored
-
- Dec 24, 2016
-
-
Mathieu Faverge authored
-
- Dec 09, 2016
-
-
Mathieu Faverge authored
-
- Dec 01, 2016
-
-
PRUVOST Florent authored
- use the (starpu_cpu_func_t) 1 trick, same as for cuda_func
- CPU functions are no longer defined, avoiding the dependency on coreblas
- add #if !defined(CHAMELEON_SIMULATION) where it is needed
- remove the dependency on the coreblas library (it becomes useless)
- remove the useless simucblas and simulapacke libraries
- remove the CHAMELEON_SIMULATION_MAGMA cmake variable and definition
- keep using CHAMELEON_USE_CUDA and CHAMELEON_USE_MAGMA to consider CUDA kernels; this avoids introducing useless new variables
- work on messages
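The placeholder-pointer idea behind the (starpu_cpu_func_t) 1 trick can be sketched in isolation. The code below is not StarPU code: kernel_func_t, codelet_sketch_t, codelet_has_cpu_impl and codelet_run are hypothetical names, and the sketch assumes that a simulated run only checks that the pointer is non-NULL and never calls it. Under that assumption no real kernel (and so no kernel library) has to be linked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel signature, loosely modeled on a task-runtime codelet. */
typedef void (*kernel_func_t)(void *buffers[], void *arg);

/* Non-NULL placeholder: marks "an implementation exists" but must never be called. */
#define KERNEL_PLACEHOLDER ((kernel_func_t)(uintptr_t)1)

typedef struct {
    kernel_func_t cpu_func;  /* real kernel, placeholder, or NULL */
    double modeled_seconds;  /* duration used when the run is simulated */
} codelet_sketch_t;

/* A worker is considered able to run the codelet if the pointer is non-NULL. */
int codelet_has_cpu_impl(const codelet_sketch_t *cl)
{
    assert(cl != NULL);
    return cl->cpu_func != NULL;
}

/* Simulated run: return the modeled time without touching cpu_func.
 * Real run: call the kernel and return 0.0. */
double codelet_run(const codelet_sketch_t *cl, void *buffers[], void *arg,
                   int simulated)
{
    assert(codelet_has_cpu_impl(cl));
    if (simulated)
        return cl->modeled_seconds;
    assert(cl->cpu_func != KERNEL_PLACEHOLDER);  /* placeholder is not callable */
    cl->cpu_func(buffers, arg);
    return 0.0;
}
```

Converting the integer 1 to a function pointer is implementation-defined in C, so the placeholder is only ever compared, never dereferenced.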
-
- Oct 12, 2016
-
-
Guillaume Sylvand authored
-
Guillaume Sylvand authored
timing: add option --bigmat to choose whether we allocate one big 'mat' array or the runtime allocates the tiles one by one
-
- Sep 20, 2016
-
-
Guillaume Sylvand authored
This routine, available in MKL, computes a product in 6n^3 operations instead of 8n^3, but is interesting only for "large enough" matrices (to be tested...). Potentially, we gain 25% in all complex computations. It could be interesting to look for it or implement it in CUDA.
!!! Note that the flop counters are not updated !!!
!!! In C/Z precision, most flop counters should be multiplied by 0.75 !!!
It is OFF by default. It is activated with MORSE_Enable(MORSE_GEMM3M). In the timing routines, it is activated with --gemm3m.
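The 6n^3 versus 8n^3 count follows from replacing four real matrix products (2n^3 flops each) with three, at the cost of a few extra O(n^2) additions. Below is a minimal sketch of that scheme in C. It is not the MKL routine: gemm3m_sketch and real_gemm are hypothetical names, the naive triple loop stands in for an optimized real GEMM, and storing real and imaginary parts in separate row-major arrays is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

/* Real n x n product C = A * B (row-major), about 2n^3 flops. */
static void real_gemm(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
    }
}

/* Hypothetical helper: (Cr + i Ci) = (Ar + i Ai) * (Br + i Bi)
 * using three real products instead of four:
 *   T1 = Ar*Br, T2 = Ai*Bi, T3 = (Ar+Ai)*(Br+Bi)
 *   Cr = T1 - T2, Ci = T3 - T1 - T2
 * Cr and Ci must not alias the inputs.
 * Returns 0 on success, -1 if the workspace cannot be allocated. */
int gemm3m_sketch(int n, const double *Ar, const double *Ai,
                  const double *Br, const double *Bi,
                  double *Cr, double *Ci)
{
    assert(n > 0);
    size_t sz = (size_t)n * (size_t)n;
    double *work = malloc(4 * sz * sizeof *work);
    if (work == NULL)
        return -1;
    double *As = work, *Bs = work + sz;
    double *T1 = work + 2 * sz, *T2 = work + 3 * sz;

    for (size_t k = 0; k < sz; k++) {
        As[k] = Ar[k] + Ai[k];
        Bs[k] = Br[k] + Bi[k];
    }
    real_gemm(n, Ar, Br, T1);
    real_gemm(n, Ai, Bi, T2);
    real_gemm(n, As, Bs, Ci);  /* T3 is stored directly in Ci */

    for (size_t k = 0; k < sz; k++) {
        Cr[k] = T1[k] - T2[k];
        Ci[k] = Ci[k] - T1[k] - T2[k];
    }
    free(work);
    return 0;
}
```

The subtractions in the last loop are where the scheme can lose some accuracy compared with the four-product form, which is one reason such a variant is typically opt-in.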
-
Guillaume Sylvand authored
It is OFF by default. It is activated with MORSE_Enable(MORSE_PROGRESS). In the timing routines, it is activated with --progress. No progress is printed for tasks faster than 10 seconds.
-
- Sep 09, 2016
-
-
PRUVOST Florent authored
-
- Sep 07, 2016
-
-
Guillaume Sylvand authored
-
- Oct 05, 2015
-
-
Mathieu Faverge authored
-
- Sep 29, 2015
-
-
PRUVOST Florent authored
-
- Sep 28, 2015
-
-
THIBAULT Samuel authored
-
- Sep 17, 2015
-
-
THIBAULT Samuel authored
instead of introducing RUNTIME_distributed_barrier
-
- Sep 16, 2015
-
-
THIBAULT Samuel authored
MORSE_Distributed_size, MORSE_Distributed_rank so that applications do not hardcode the use of MPI. Introduce RUNTIME_distributed_rank, RUNTIME_distributed_size, RUNTIME_distributed_barrier, so that MORSE does not hardcode the use of MPI either. This makes it possible to use simgrid-mpi.
-
- Jul 28, 2015
-
-
PRUVOST Florent authored
-
- Feb 05, 2015
-
-
PRUVOST Florent authored
-
PRUVOST Florent authored
Change the way we include our own header files: paths are now relative to the root. When PLASMA is in the same environment, Chameleon can pick up headers that do not belong to it (e.g. #include descriptor.h, a file that also exists in the PLASMA install directory), which causes compilation errors.
-
- Nov 19, 2014
-
-
PRUVOST Florent authored
- change copyright
- correct whitespace
- place cmake modules depending on chameleon in cmake_modules and no longer in cmake_modules/morse
-
- Nov 16, 2014
-
-
PRUVOST Florent authored
-
PRUVOST Florent authored
-
PRUVOST Florent authored
-
PRUVOST Florent authored
-
PRUVOST Florent authored
-