MIJIEUX Thomas committed
#+TITLE: fabulous todo-list
#+STARTUP: showeverything

* DONE Install Chameleon
** DONE Install lapacke/cblas
** DONE Install Starpu
* DONE Implement Classic version with TOP LEVEL Chameleon Interface
* DONE Implement QR version with LOW LEVEL (_Tile_Async) Chameleon Interface
* DONE Write a document about the block layouts of Deflated Restarting and Inexact Breakdown
** DONE Deflated Restarting with Incremental QR factorization: Hessenberg block layout
** DONE Inexact breakdown block layout (Hessenberg and Base layout)
** DONE Inexact breakdown on R0 (Hessenberg and RHS layout)
* DONE Implement DR+QR
* DONE Reproducible installation
** DONE Instruction in INSTALL.org
** DONE Update spack package
** DONE merge request to make Chameleon headers compatible with C++ complex types
* TODO reproducible results and visualization
** TODO Improve Logging
*** DONE Add timer for iterations (global)
*** DONE Add timer for orthogonalization, least squares and matrix-vector product
*** DONE Add global timer
*** TODO Improve timer semantics and logs
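    One possible shape for clearer timer semantics is a per-phase accumulator
    with a scope guard, so each phase (orthogonalization, least squares,
    matrix-vector product) is measured the same way. This is only a sketch:
    the class PhaseTimers and the phase names are hypothetical and are not
    taken from the fabulous sources.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not part of fabulous): accumulates wall-clock time
// and call counts per named phase.
class PhaseTimers {
    std::map<std::string, double> seconds_;
    std::map<std::string, long> calls_;

public:
    using clock = std::chrono::steady_clock;

    // Scope guard: adds the elapsed time to its phase when it is destroyed.
    class Scope {
        PhaseTimers &owner_;
        std::string phase_;
        clock::time_point start_;

    public:
        Scope(PhaseTimers &owner, std::string phase)
            : owner_(owner), phase_(std::move(phase)), start_(clock::now()) {}
        Scope(const Scope &) = delete;
        Scope &operator=(const Scope &) = delete;
        ~Scope() {
            const std::chrono::duration<double> dt = clock::now() - start_;
            owner_.seconds_[phase_] += dt.count();
            owner_.calls_[phase_] += 1;
        }
    };

    // Total seconds accumulated for a phase (0 if the phase was never timed).
    double seconds(const std::string &phase) const {
        const auto it = seconds_.find(phase);
        return it == seconds_.end() ? 0.0 : it->second;
    }

    // Number of completed measurements for a phase.
    long calls(const std::string &phase) const {
        const auto it = calls_.find(phase);
        return it == calls_.end() ? 0 : it->second;
    }
};
```

    Usage would look like =PhaseTimers::Scope s(timers, "ortho");= at the top
    of the block to be measured; the time is recorded when =s= leaves scope.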
*** DONE Fix least squares time measurement in QR+DR version; IBQRDR
    CLOSED: [2017-05-14 Sun 04:28]
    (the actual factorization is performed in notify_factorization_end())
*** TODO Add flops/s counter
**** DONE orthogonalization flops counter
     CLOSED: [2017-05-14 Sun 04:28]
*** TODO print flops
** DONE Improve RESULTS.org
   Eventually, anyone should be able to gather all interesting results into RESULTS.org
   just by evaluating its code blocks and/or tangling the file
* DONE Implement IB+DR
* DONE Implement IB+DR+QR
** tpqrt and tpmqrt kernels
*** DONE fix LAPACKE_?tpmqrt workspace allocation bug
* TODO Implement GCR Algorithm
** DONE basic implementation of GCR algorithm
   CLOSED: [2017-05-30 Tue 16:20]
* TODO iterated orthogonalization stop criterion
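  One common candidate for this stop criterion is the norm-drop test: repeat
  the projection only while the norm of the vector shrinks by more than a
  factor eta (often taken as 1/sqrt(2)) during a pass. The sketch below shows
  it for a single vector with classical Gram-Schmidt; it is not the fabulous
  implementation, which works on blocks, and the function names, eta default
  and max_passes cap are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

inline double dot(const Vec &a, const Vec &b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

inline double norm2(const Vec &a) { return std::sqrt(dot(a, a)); }

// Orthogonalizes w in place against the orthonormal vectors in basis.
// A pass is repeated while the norm dropped by more than the factor eta,
// up to max_passes. Returns the number of passes performed.
inline int iterated_gram_schmidt(const std::vector<Vec> &basis, Vec &w,
                                 double eta = 0.7071067811865476,
                                 int max_passes = 3) {
    int passes = 0;
    double before = norm2(w);
    while (passes < max_passes) {
        // Classical Gram-Schmidt: all coefficients come from the same w.
        std::vector<double> coeff(basis.size());
        for (std::size_t j = 0; j < basis.size(); ++j)
            coeff[j] = dot(basis[j], w);
        for (std::size_t j = 0; j < basis.size(); ++j)
            for (std::size_t i = 0; i < w.size(); ++i)
                w[i] -= coeff[j] * basis[j][i];
        ++passes;
        const double after = norm2(w);
        if (after >= eta * before) break;  // no severe cancellation: stop
        before = after;
    }
    return passes;
}
```

  A vector nearly parallel to the basis triggers a second pass, while a
  vector with a large orthogonal component stops after one.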
* DONE find out how to link fabulous with parallel mkl with spack
  CLOSED: [2017-05-22 Mon 16:23]
  see [[file:NOTES.org::*fabulous%20linking%20with%20lapacke/cblas%20kernels][fabulous linking with lapacke/cblas kernels]]
* TODO parallel (distributed) test case
** test with maphys
   see [[file:LABBOOK.org::*integrate%20and%20test%20latest%20fabulous%20api%20in%20a%20maphys%20fork][integrate and test latest fabulous api in a maphys fork]]
* DONE error handling system
  CLOSED: [2017-05-30 Tue 15:53]
* DONE report convergence in logger
  CLOSED: [2017-05-30 Tue 15:53]
* DONE review chameleon sub descriptor solution
  CLOSED: [2017-05-30 Tue 15:53]
* TODO distributed hessenberg (with chameleon)
* TODO rework C++ api for multiple algorithms
* DONE Add parameter for setting maximum base extension in IB
  CLOSED: [2017-06-06 Tue 14:02]
* TODO Remove Tile_to_Lapack and Lapack_to_Tile in ChamQR_submat version
  The reason is that Tile_to_Lapack and Lapack_to_Tile each create a descriptor and
  use Chameleon MPI tags. Eventually Chameleon runs out of MPI tags.

  The solution is to create a descriptor ourselves with the MorseCM memory
  layout (LAPACK style) by using the morse_getaddr_cm() and morse_getblkld_cm() callbacks
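
  To make the idea concrete, here is what column-major (LAPACK style) tile
  addressing amounts to. This is a standalone illustration, not Chameleon
  code: the struct ColMajorDesc and the two functions are hypothetical
  stand-ins that only mirror the role of the callbacks named above, and the
  element type is assumed to be double.

```cpp
#include <cstddef>

// Hypothetical stand-in for a tile descriptor over a LAPACK-style buffer.
struct ColMajorDesc {
    double *mat;     // user buffer, column-major, no copy made
    std::size_t lm;  // leading dimension of the whole matrix
    std::size_t mb;  // tile height
    std::size_t nb;  // tile width
};

// Address of the first element of tile (m, n) inside the user buffer:
// skip m*mb rows and n*nb columns of lm elements each.
inline double *getaddr_cm(const ColMajorDesc &A, std::size_t m, std::size_t n) {
    return A.mat + m * A.mb + n * A.nb * A.lm;
}

// Leading dimension of any tile: since tiles are views into the user
// buffer, it is the leading dimension of the enclosing matrix.
inline std::size_t getblkld_cm(const ColMajorDesc &A, std::size_t /*m*/) {
    return A.lm;
}
```

  With such a layout the tiles alias the original LAPACK buffer, so no
  conversion between tile and LAPACK storage is needed.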
* DONE check api logs bug
  CLOSED: [2017-06-06 Tue 14:03]

* TODO improve ib+dr incremental qr algorithm
** Last Block line factorization
*** dynamic intelligent TT/TS switch
    use TT with diagonal blocks when no inexact breakdown happened yet
*** minimal column span for ts
    only compute from the column when inexact breakdown started to occur
    (the columns before it are already zeroed)
*** tiled ts for task parallelism
    1. TS with the diagonal tile
    2. apply all TSMQR on the right part of H_j (all of these can run in parallel)
    3. TS on the next diagonal tile (can start as soon as the TSMQR from the left part has finished)
    4. ... and so on
** pre-solve factorization
*** minimum recopy at each step
    only recopy the diagonal block each time before the pre-solve factorization
*** all points from Last Block line factorization apply here too
    of course
** Double update mechanism
*** only make a double update when strictly needed
    not needed when:
    - No IB on R0
    - No restart
*** intelligent switch to disable Incremental QR when double update?
    check whether it is worth it
*** intelligent switch to disable Incremental QR when IB on R0?
    check whether it is worth it
    -> matrix is fully dense, so the costly TSMQR may be problematic
** Task based (chameleon-like) implementation of previous point
   over starpu