Chameleon: A dense linear algebra software for heterogeneous architectures
Chameleon is a C library providing parallel algorithms to perform BLAS/LAPACK operations that fully exploit modern architectures.
Chameleon relies on sequential task-based algorithms, in which the sub-tasks of the overall algorithm are submitted to a runtime system. Such a system is a layer between the application and the hardware that handles the scheduling and the effective execution of tasks on the processing units. A runtime system such as StarPU can automatically manage data transfers between memory areas that are not shared (CPUs-GPUs, distributed nodes).
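To illustrate the sequential task-based model, here is a minimal Python sketch of a tile Cholesky factorization written as a plain sequential loop that only *submits* tasks. It is not Chameleon's or StarPU's API: the function names `tile_cholesky_tasks` and `infer_dependencies` are hypothetical, the kernel names follow the usual LAPACK/BLAS conventions (potrf, trsm, syrk, gemm), and the dependency tracking is simplified to "depend on the last writer of every tile touched" (a real runtime also handles write-after-read hazards and actually executes the kernels).

```python
def tile_cholesky_tasks(nt):
    """Yield (kernel, tiles_read, tiles_written) in sequential submission
    order for the lower triangle of an nt x nt tile matrix."""
    for k in range(nt):
        # Factor the diagonal tile (updated in place).
        yield ("potrf", [], [(k, k)])
        for m in range(k + 1, nt):
            # Triangular solve of each tile below the diagonal tile.
            yield ("trsm", [(k, k)], [(m, k)])
        for n in range(k + 1, nt):
            # Symmetric rank-k update of a trailing diagonal tile.
            yield ("syrk", [(n, k)], [(n, n)])
            for m in range(n + 1, nt):
                # General update of a trailing off-diagonal tile.
                yield ("gemm", [(m, k), (n, k)], [(m, n)])


def infer_dependencies(tasks):
    """For each task, list the ids of earlier tasks it must wait for.
    Simplification (assumption): only the last writer of each tile is tracked."""
    last_writer = {}
    deps = []
    for task_id, (_kernel, reads, writes) in enumerate(tasks):
        preds = {last_writer[t] for t in reads + writes if t in last_writer}
        deps.append(sorted(preds))
        for t in writes:
            last_writer[t] = task_id
    return deps


tasks = list(tile_cholesky_tasks(3))
deps = infer_dependencies(tasks)
for task_id, ((kernel, _reads, writes), preds) in enumerate(zip(tasks, deps)):
    print(task_id, kernel, writes[0], "waits for", preds)
```

The submission loop is ordinary sequential code. The parallelism comes from the inferred dependency graph: tasks whose predecessors have all completed can run concurrently on whatever processing units are available.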
This implementation paradigm makes it possible to design high-performance linear algebra algorithms for very different types of architecture: laptops, many-core nodes, CPUs-GPUs, and multiple nodes. For example, Chameleon can perform a double-precision Cholesky factorization at 80 TFlop/s on a dense matrix of order 400 000 (i.e., in about 4 min). Chameleon is a sub-project of MORSE specifically dedicated to dense linear algebra.
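The quoted run time can be sanity-checked from the standard leading-order operation count of a Cholesky factorization, n^3/3 floating-point operations. The following is a minimal sketch; the function names are illustrative, and lower-order terms are ignored.

```python
def cholesky_flops(n):
    """Leading-order floating-point operation count of a Cholesky
    factorization of a dense matrix of order n (lower-order terms ignored)."""
    return n**3 / 3.0


def run_time_seconds(n, flops_per_second):
    """Time needed at a sustained rate of flops_per_second."""
    return cholesky_flops(n) / flops_per_second


n = 400_000   # matrix order quoted in the text
rate = 80e12  # 80 TFlop/s, as quoted in the text
seconds = run_time_seconds(n, rate)
print(f"{seconds:.0f} s, i.e. about {seconds / 60:.1f} min")
```

This gives roughly 267 s, consistent with the "4 min" order of magnitude stated above.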
1 Get Chameleon
To use the latest development version of Chameleon, please clone the master branch. Note that Chameleon contains a git submodule, morse_cmake.
To get the sources, please use these commands:
    # if git version >= 1.9
    git clone --recursive firstname.lastname@example.org:solverstack/chameleon.git
    cd chameleon
    # else
    git clone email@example.com:solverstack/chameleon.git
    cd chameleon
    git submodule init
    git submodule update
The latest releases of Chameleon are hosted on gforge.inria.fr for now. Future releases will be available on this gitlab project.
There is no up-to-date documentation of Chameleon. We would like to provide Doxygen documentation hosted on gitlab in the future. Please refer to section 2.1 of READMEDEV for information about generating the documentation.
The documentation of Chameleon’s last release is available here: chameleon-0.9.1 documentation.
2.1 For developers
3.1 Distribution of Chameleon
To install a full distribution (Chameleon and its dependencies), we encourage users to use the morse branch of Spack.
Please read the following documentation:
3.1.1 Usage example for a simple distribution of Chameleon
    git clone https://github.com/solverstack/spack.git
    . ./spack/share/spack/setup-env.sh
    spack install -v chameleon
    # chameleon is installed here: `spack location -i chameleon`
3.2 Build and install with CMake
Chameleon can be built using CMake. This installation requires some library dependencies to be already installed on the system.
Please refer to the chameleon-0.9.1 documentation for configuration information.
4 Get involved!
4.1 Mailing list
To contact the developers, send an email to firstname.lastname@example.org
First, since the Chameleon library started as an extension of the PLASMA library to support multiple runtime systems, all developers of the PLASMA library are developers of the Chameleon library.
The following people contributed to the development of Chameleon:
- Emmanuel Agullo, PI
- Olivier Aumage
- Cedric Castagnede
- Terry Cojean
- Mathieu Faverge, PI
- Nathalie Furmento
- Reazul Hoque
- Hatem Ltaief
- Gregoire Pichon
- Florent Pruvost, PI
- Marc Sergent
- Guillaume Sylvand
- Samuel Thibault
- Stanimire Tomov
- Omar Zenati
If we forgot your name, please let us know so that we can fix that mistake.
6 Citing Chameleon
Feel free to use the following publications to reference Chameleon:
- Original paper that initiated Chameleon and the principles:
- Agullo, Emmanuel and Augonnet, Cédric and Dongarra, Jack and Ltaief, Hatem and Namyst, Raymond and Thibault, Samuel and Tomov, Stanimire, Faster, Cheaper, Better – a Hybridization Methodology to Develop Linear Algebra Software for GPUs, GPU Computing Gems, First Online: 17 December 2010.
- Design of the QR algorithms:
- Agullo, Emmanuel and Augonnet, Cédric and Dongarra, Jack and Faverge, Mathieu and Ltaief, Hatem and Thibault, Samuel and Tomov, Stanimire, QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 25th IEEE International Parallel & Distributed Processing Symposium, First Online: 16 December 2010.
- Design of the LU algorithms:
- Agullo, Emmanuel and Augonnet, Cédric and Dongarra, Jack and Faverge, Mathieu and Langou, Julien and Ltaief, Hatem and Tomov, Stanimire, LU Factorization for Accelerator-based Systems, 9th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 11), First Online: 21 December 2011.
- Regarding distributed memory:
- Agullo, Emmanuel and Aumage, Olivier and Faverge, Mathieu and Furmento, Nathalie and Pruvost, Florent and Sergent, Marc and Thibault, Samuel, Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model, Research Report, First Online: 16 June 2016.