chameleon-1.3.0
------------------------------------------------------------------------
 - mixed-precision: introduce descriptor with precision adapted to local norms
        - Add CHAMELEON_[dz]gered... functions to reduce the precision of the tiles based on a requested accuracy
        - Add CHAMELEON_[dz]gerst... functions to restore the original numerical precision of the tiles in a descriptor
 - types: add support for half precision arithmetic into the data descriptors
 - cuda: add half precision conversion kernels, and variants of the gemm kernels (hgemm, and gemmex)
 - descriptors: Add the possibility to pass arguments to the rankof
   function. This is used to provide custom distribuitions through a
   given file. *WARNING*: It changes the interface of
   CHAMELEON_Desc_Create_User that requires aan additional `, NULL`
   parameters in the general case.
 - compute/poinv: Add the possibility to use an intermediate distribution for the TRTRI operation
 - compute/getrf_nopiv: Add lookahead through temporary buffers to better regulate the communication allocations
 - runtime/starpu: Whenever possible replace the lacpy codelet by a direct memory copy from the input handler to the output one

chameleon-1.2.0
------------------------------------------------------------------------
 - NEW: Add support for AMD GPUs throug hipcublas or hip-rocm kernels
 - NEW: Add the parallel worker support through StarPU (only in potrf for now)
 - NEW: Add support for H-Matrices computration through the HMat-OSS library
 - api/lapack: add cblas/lapack interface routines for BLAS 3, and Cholesky family functions
 - benchmarks: Fix NewMad weekly benchmarks
 - ci: Allows for MR without tests though the notest- prefix
 - compute/gemm: Add the commute flag when beta is 1.0 (StarPU only)
 - compute/gemm: allows for simpler switch from generic gemm to summa, or A stationnary variants
 - compute/map: Add the acces mode to the data used in map to enable read or write only modes
 - compute/xxmm: Make sure the calls are always asynchronous even wehn workspaces are required by introducing new fuctions to create and destroy these workspaces outside the Async interface.
 - compute/zhemm: Fix ChamConjTrans instead of ChamTrans
 - compute: Add a plgtr funcion to initialize trapezoidal matrix with the std api
 - compute: Add a xprint function to help with numerical issue debugging
 - compute: Enable the use of descriptors of integer matrices.
 - compute: Factorize QR/LQ steps calls to ease the propagation of fix to all QR/LQ/SVD algorithms.
 - context: simplify the definition of the environment variables
 - control: Restore MPI_THREAD_MULTIPLE removed by !282 as SERIALIZED seems to be insufficient to manage the datatypes
 - control: make the descriptor helpers functions (rankof, dataof, ...) public
 - control; Rename CHAMELEON_{KERNELPROFILE,PROFILING}_MODE to CHAMELEON_GENERATE_{STATS,TRACE}
 - cuda: Fix CUDA_zparfb. The setting to 0 of the lower part of V when L>0 was incorrect
 - cuda: Remove support of cublas v1 to fix compilation with cuda >=12
 - documentation: various updates on the compilation and installation steps as well as on the new algorithms
 - gitignore: add python and vscode lists
 - pkgconfig: Update the .pc file generation
 - runtime/starpu: Change the tag management system to reduce the number of tags used (Remove the CHAMELEON_user_tag_size() and RUNTIME_set_tag_sizes() functions)
 - runtime/starpu: Do not attempt to shutdown StarPU if initialization failed
 - runtime/starpu: Update SimGrid performance model to match 1.4 footprints calculation
 - runtime/starpu: Upgrade StarPU support to 1.4
 - testings: Fix flops computations of several testing_*
 - testings: Large refactorization of the tests to check both standard and Asynchronous API. The standard API adds the possibility to evaluate MKL (or other BLAS library) perforamnce though the same set of tests.
 - testings: allows for on demand to validate the checks.
 - update submodules
 - Fix many issues reported by coverity
 - Fix many issues reported by sonarqube

chameleon-1.1.0
------------------------------------------------------------------------
- ci: Update the docker image to a local storage (!248)
- feature: add extended asynchronous gemm (possibility to create the workspace outside the function to make it really asynchronous) (!244)
- starpu/codelets: Add additional const where its needed and missing callbacks (!246)
- starpu: add support to peek data (!242)
- starpu: add datatype registration (!241)
- ci/guix: update the guix recipe (!245)
- cmake: Move to modern cmake support (!236)
- feature: Add QDWH Polar decomposition algorithm (!125)
- feature: Add a two-norm estimator algorithm (!227)
- feature: Add a latms (LAPACK Test Matrices) algorithm (!226)
- feature: flops.h is now installed with the library to provide flops computation to the user
- profiling: Fix profile initialization/generation with StarPU
- bugfix: Fix issue in ungqr/unglq family functions for corner cases (!233)
- ci: Integrate NMad into the benchmark suite
- testing: (Re-) introduce the forcegpu option to force kernels on the CPU/GPU when possible
- cmake: Allow to disable the TMG requirement when not necessary
- feature: Add a rank-k matrix generator: plrnk (!220)
- simgrid: Update the kernels sampling and the associated documentation
- bugfix: RUNTIME_Desc_flush can now be called on submatrices
- starpu,quark/codelets: Reduce as much as possible the data accesses when possible (!215,!216)
- starpu/codelets: add the possibility to execute on a specific worker (!211)
- feature: `lapack_to_tile` and `tile_to_lapack` functions are now *deprecated* and replaced by `Desc2Lap/Lap2Desc` family (!199,#94,#96).

chameleon-1.0.0
------------------------------------------------------------------------
- Testings: Restructuration of the testing/timing drivers
   - Remove former testings and timings directories
   - Integrate a new testing structure that both time and check the numerical accuracy of the functions
   - Integrate kibana testings to follow performance evolution
- Testings: Use the number of core available as default
- Switch to a new tile interface to be more flexible about the data structure behind the tiles
- Functions: Add graam function
- Functions: Add SUMMA-like gemm/hemm/symm operations
- Functions: Fix many corner cases discovered with the new testing interface
- Functions: Fix invalid data allocation of temporary data for QR/LQ functions
- Add the possibility to synchronize the task submission for debugging purpose if provided by the runtime (CHAMELEON_RUNTIME_SYNC)
- StarPU: upgrade requirement to 1.3
- StarPU: Fix conflict between environment variable and chameleon_init.
- StarPU/simGrid: Update performance models
- Traces: Upgrade of the EZtrace module for the new interface
- Traces: Removal of the CHAMELEON_ENABLE_TRACING option
- Upgrade cmake_module
  - Integrate new precision generation scripts (Support for python 3)
  - Integrate CMAKE_BUILD_TYPE list with sanitizers detection
- C++: Replace occurrences of the specific operator keyword
- Guix: Add guix support
- Fix issues reported by coverity

chameleon-0.9.2
------------------------------------------------------------------------
- chameleon is now hosted on gitlab: https://gitlab.inria.fr/solverstack/chameleon
- MAGMA kernels are no longer supported in Chameleon
- Add SVD/EVD drivers based on parallel first stage, and sequential LAPACK second stage and solve
- Add First stage algorithm fo r the SVD/EVD solvers
- add timing drivers time_zpotrs_tile and time_zgeqrs_tile
- deactivate warm-up by default
- add an org-mode user guide documentation, see in doc/orgmode/

chameleon-0.9.1
------------------------------------------------------------------------
r2253 | fpruvost | 2015-06-19 16:57:27 +0200 (ven. 19 juin 2015)

- update users_guide
- activate MAGMA kernels for simulation mode
- magma_ztsqrt_gpu is now defined in magma-1.6.0: change the name of our magma routine in magma_ztsqrt2_gpu
- avoid problem of compatibility with MAGMA with lapack_const
- add cublas interface v2 (not used for now because starpu does not manage cublas handles)
- use WORKC workspace for better performances in tsmqr
- add lapacke headers
- enable to use lapacke in lapack (MKL)
- correct restrict zsytrf_nopiv
- save config of build in config.log
- improve potrf+potrs on distributed systems
- add a CHAMELEON_VERBOSE mode to activate or not hints during the detection
- add the codelet name information in starpu_codelet for eztrace starpu module interceptions
- change the name of the installed chameleon .pc file: no more starpu or quark suffix
- change installation directories for headers, executables and docs. Make it relative to chameleon to avoid a bloody mess in system dirs
- add an example to link with chameleon in cmake using FindCHAMELEON.cmake
- update intern cmake Finds
  - recursive cmake finds,
  - add link tests
  - MKLROOT env var. is considered
  - env var. for libraries dir. are considered (ex: export HWLOC_DIR=/home/toto/install/hwloc)
  - improve Find BLAS for gnu compilo and threaded mkl
  - if hints are given by user to find libs (CMake option or env. var) --> do not use pkg-config
- avoid to call MPI_Finalize if MPI has been initialized by user
- add CHAMELEON_Pause/Resume function to avoid CPU consumption when no tasks have to be executed
- update the fortran90 interface

chameleon-0.9.0
------------------------------------------------------------------------
r2019 | pruvost | 2014-11-16 18:43:03 +0100 (Sun, 16 Nov 2014)

- First release of chameleon