Replace tile pointers by CHAM_tile data structure
This PR replaces the (A, lda) pairs with a CHAM_tile_t data structure that can hide different kinds of tiles:
- Full-rank matrices
- Low-rank matrices in any format
- A descriptor itself, to enable recursive algorithms
typedef struct chameleon_tile_s {
    int   m, n, ld; /* tile dimensions and leading dimension */
    void *mat;      /* pointer to the tile data */
} CHAM_tile_t;
In the future, a format field will be added, to switch between formats, and the runtime handle will be stored within this structure instead of within the matrix descriptor.
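To illustrate how a tile replaces an (A, lda) pair, here is a minimal sketch. It assumes double-precision, full-rank tiles stored in column-major order; `sketch_tile_t`, `sketch_tile_wrap` and `sketch_tile_at` are hypothetical names used only for this example and are not part of the PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CHAM_tile_t (same fields as in the PR). */
typedef struct sketch_tile_s {
    int   m, n, ld; /* rows, columns, leading dimension */
    void *mat;      /* pointer to the tile data */
} sketch_tile_t;

/* Wrap an existing (A, lda) pair into a tile without copying the data. */
static sketch_tile_t sketch_tile_wrap( double *A, int m, int n, int lda )
{
    sketch_tile_t tile = { m, n, lda, A };
    return tile;
}

/* Address of element (i, j), assuming column-major storage (an assumption). */
static double *sketch_tile_at( const sketch_tile_t *tile, int i, int j )
{
    assert( i >= 0 && i < tile->m );
    assert( j >= 0 && j < tile->n );
    return (double *)tile->mat + (size_t)j * (size_t)tile->ld + (size_t)i;
}
```

Because the leading dimension travels with the tile, callers no longer have to pass it alongside the pointer.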
The INSERT_TASK functions change accordingly. For example, the former prototype:
void
INSERT_TASK_zgemm( const RUNTIME_option_t *options,
                   cham_trans_t transA, cham_trans_t transB,
                   int m, int n, int k, int nb,
                   CHAMELEON_Complex64_t alpha, const CHAM_desc_t *A, int Am, int An, int lda,
                                                const CHAM_desc_t *B, int Bm, int Bn, int ldb,
                   CHAMELEON_Complex64_t beta,  const CHAM_desc_t *C, int Cm, int Cn, int ldc );
is replaced by the following, where the lda, ldb and ldc arguments are dropped:
void
INSERT_TASK_zgemm( const RUNTIME_option_t *options,
                   cham_trans_t transA, cham_trans_t transB,
                   int m, int n, int k, int nb,
                   CHAMELEON_Complex64_t alpha, const CHAM_desc_t *A, int Am, int An,
                                                const CHAM_desc_t *B, int Bm, int Bn,
                   CHAMELEON_Complex64_t beta,  const CHAM_desc_t *C, int Cm, int Cn );
This change comes with another one in the kernels. An extra layer has been added in coreblas/compute/core_ztile.c to switch between the different kinds of tiles when needed. For now, only the classic full-rank interface is available, but the layer is already called in order to simplify the upcoming work on hierarchical descriptors and H-matrices.
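The idea of such a layer can be sketched as follows. This is not the code from core_ztile.c: the format tags, the `sketch_` names and the scaling kernel are all hypothetical, and the example assumes double-precision, column-major tiles that carry a format field. It only shows the pattern of a tile-level wrapper that checks the format and forwards to a pointer-level kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format tags; the real names in the library may differ. */
typedef enum {
    SKETCH_TILE_FULLRANK = 1,
    SKETCH_TILE_OTHER    = 2
} sketch_format_t;

/* Hypothetical tile type with a format field. */
typedef struct sketch_tile_s {
    int   format;
    int   m, n, ld;
    void *mat;
} sketch_tile_t;

/* Pointer-level kernel in the style of the classic CORE layer:
 * scales an m-by-n column-major matrix in place. */
static void sketch_core_dscal( int m, int n, double alpha, double *A, int lda )
{
    for ( int j = 0; j < n; j++ ) {
        for ( int i = 0; i < m; i++ ) {
            A[(size_t)j * (size_t)lda + (size_t)i] *= alpha;
        }
    }
}

/* Tile-level wrapper: checks the format, then unpacks (mat, ld)
 * from the tile and forwards to the pointer-level kernel. */
static void sketch_tcore_dscal( double alpha, sketch_tile_t *A )
{
    assert( A->format == SKETCH_TILE_FULLRANK );
    sketch_core_dscal( A->m, A->n, alpha, (double *)A->mat, A->ld );
}
```

Codelets then call the tile-level function only, so supporting another format later would mean extending the wrapper rather than every call site.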
All codelets for Quark, StarPU and OpenMP have thus been modified to use this layer instead of the classic CORE_z layer. This change has not been made in PaRSEC, so advanced matrix formats will not be available with PaRSEC as long as it stays like this.
Regarding StarPU, a cham_tile_interface has been added to handle this new format in distributed-memory runs: it takes care of packing/unpacking the data and supports this new type of information. This is the major change in the runtime directory.
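As a rough illustration of what packing and unpacking a tile involves, here is a self-contained sketch. It does not use StarPU's data-interface API and does not reproduce the wire format of cham_tile_interface; the `sketch_` names and the layout (a small {m, n} header followed by the columns stored contiguously) are assumptions made for this example, again with double-precision, column-major, full-rank tiles.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CHAM_tile_t. */
typedef struct sketch_tile_s {
    int   m, n, ld;
    void *mat;
} sketch_tile_t;

/* Pack a tile into a newly allocated contiguous buffer:
 * {m, n} header, then n columns of m doubles (padding from ld is dropped).
 * Returns NULL on allocation failure; the caller frees the buffer. */
static void *sketch_tile_pack( const sketch_tile_t *tile, size_t *size )
{
    const size_t header = 2 * sizeof(int);
    const size_t colsz  = (size_t)tile->m * sizeof(double);
    int   dims[2] = { tile->m, tile->n };
    char *buf;

    *size = header + (size_t)tile->n * colsz;
    buf   = malloc( *size );
    if ( buf == NULL ) {
        return NULL;
    }
    memcpy( buf, dims, header );
    for ( int j = 0; j < tile->n; j++ ) {
        memcpy( buf + header + (size_t)j * colsz,
                (const double *)tile->mat + (size_t)j * (size_t)tile->ld,
                colsz );
    }
    return buf;
}

/* Unpack a buffer into an already allocated tile, which may use a
 * different leading dimension. Returns 0 on success, -1 on size mismatch. */
static int sketch_tile_unpack( sketch_tile_t *tile, const void *buffer )
{
    const size_t header = 2 * sizeof(int);
    const size_t colsz  = (size_t)tile->m * sizeof(double);
    const char  *buf    = buffer;
    int dims[2];

    memcpy( dims, buf, header );
    if ( dims[0] != tile->m || dims[1] != tile->n ) {
        return -1;
    }
    for ( int j = 0; j < tile->n; j++ ) {
        memcpy( (double *)tile->mat + (size_t)j * (size_t)tile->ld,
                buf + header + (size_t)j * colsz,
                colsz );
    }
    return 0;
}
```

Sending the dimensions with the data lets the receiver validate the tile before copying, and dropping the ld padding keeps the message compact.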
@all I do not recommend reading the full set of changes, but please at least have a look at the changes in all files except compute/* and runtime/*/codelets/. All changes outside these directories are minimal and should be proofread. If you can also test this PR, especially with GPUs and OOC, that would be great. Thanks.
Edit: I have also directly added the format field to the CHAM_tile_t structure, to check the format and limit conflicts between parallel developments. All TCORE interfaces include an assert that the tiles are in the full-rank format, in order to avoid problems when developing new functionality.