Chameleon is written in C and depends on a couple of external libraries that must be installed on the system.
Chameleon can be built and installed on UNIX systems (Linux) by the standard means of CMake. General information about CMake, as well as installation binaries and CMake source code, is available from the CMake website (https://cmake.org).
To install a full distribution (Chameleon plus its dependencies), we encourage users to use Spack.
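As a rough illustration of the standard CMake workflow mentioned above, the following sketch configures, builds and installs from an out-of-source build directory. The helper names run and build_chameleon, the DRY_RUN switch and the example paths are hypothetical and not part of Chameleon; no Chameleon-specific CMake options are shown.

```shell
# Print each command, and execute it unless DRY_RUN=1 (hypothetical helper).
run() {
    printf '+ %s\n' "$*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Configure, build and install the source tree $1 into the prefix $2.
# The body runs in a subshell so that "cd" does not affect the caller.
build_chameleon() (
    src=$1
    prefix=$2
    run mkdir -p "$src/build" &&
    run cd "$src/build" &&
    run cmake .. -DCMAKE_INSTALL_PREFIX="$prefix" &&
    run make -j5 &&
    run make install
)

# Example (paths are assumptions):
#   DRY_RUN=1 build_chameleon "$HOME/chameleon" "$HOME/install"   # only print
#   build_chameleon "$HOME/chameleon" "$HOME/install"             # really build
```

Running with DRY_RUN=1 first makes it easy to review the commands before anything is written to disk.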
Getting Chameleon
The latest official release tarballs of Chameleon sources are available for download from the gitlab tags page.
The latest development state is available on gitlab. Git is required to clone it:
git clone --recursive https://gitlab.inria.fr/solverstack/chameleon.git
Prerequisites for installing Chameleon
To install Chameleon’s libraries, header files, and executables, one needs:
- CMake (version 2.8 minimum): the build system
- C and Fortran compilers: GNU compiler suite, Clang, Intel or IBM can be used
- python: to generate files in the different precisions
- external libraries: this depends on the configuration, by default the required libraries are
Optional libraries:
These packages must be installed on the system before trying to configure/build chameleon. Please look at the distrib/ directory which gives some hints for the installation of dependencies for Unix systems.
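Before configuring, it can help to verify that the build tools listed above are reachable from the PATH. The following is a minimal sketch; the function name check_tools is hypothetical and the tool names in the example (gcc, gfortran, python3) are assumptions to adapt to the compilers actually used.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (tool names are assumptions):
#   check_tools cmake gcc gfortran python3
```

This only checks that the commands exist; it does not check versions or the presence of the external libraries.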
We give here some examples for a Debian system:
# Update Debian packages list
sudo apt-get update
# Install OpenBLAS
sudo apt-get install -y libopenblas-dev liblapacke-dev
# Install OpenMPI
sudo apt-get install -y libopenmpi-dev
# Install StarPU
sudo apt-get install libstarpu-dev

# Optionally, to make some specific developments, the following may be installed
# Install hwloc (used by StarPU or QUARK, already a dependency of OpenMPI)
sudo apt-get install -y libhwloc-dev
# Install EZTrace, useful to export some nice execution traces with all runtimes
sudo apt-get install -y libeztrace-dev
# Install FxT, useful to export some nice execution traces with StarPU
sudo apt-get install -y libfxt-dev
# Install CUDA and cuBLAS: only if you have a CUDA-compatible GPU
sudo apt-get install -y nvidia-cuda-toolkit nvidia-cuda-dev

# If you prefer a specific version of StarPU, install it yourself, e.g.
# Install StarPU (with MPI and FxT enabled)
mkdir -p $HOME/install
cd $HOME/install
wget https://files.inria.fr/starpu/starpu-1.3.7/starpu-1.3.7.tar.gz
tar xvzf starpu-1.3.7.tar.gz
cd starpu-1.3.7/
./configure --prefix=/usr/local --with-fxt=/usr/lib/x86_64-linux-gnu/
make -j5
sudo make install

# Install PaRSEC: to be used in place of StarPU
mkdir -p $HOME/install
cd $HOME/install
git clone https://bitbucket.org/mfaverge/parsec.git
cd parsec
git checkout mymaster
git submodule update
mkdir -p build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local -DBUILD_SHARED_LIBS=ON
make -j5
sudo make install

# Install QUARK: to be used in place of StarPU
mkdir -p $HOME/install
cd $HOME/install
git clone https://github.com/ecrc/quark
cd quark/
sed -i -e "s#prefix=.*#prefix=/usr/local#g" make.inc
sed -i -e "s#CFLAGS=.*#CFLAGS= -O2 -DADD_ -fPIC#g" make.inc
make
sudo make install
Known issues
- The LAPACKE interface to the TMG routines is required: symbols like LAPACKE_dlatms_work should be defined in the LAPACKE library. The Debian packages libopenblas-dev and liblapacke-dev (version 1.0.0) do not provide the TMG interface. Please update your distribution or install the LAPACKE interface library in another way: by yourself from source, with Spack, or with Guix-HPC.
- Sometimes a parallel make with -j can fail due to undefined dependencies between some targets. If so, try invoking the make command several times.
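The advice to re-run make after a parallel build failure can be automated with a small retry loop. This is only a sketch: the function name retry and the attempt count are hypothetical, and repeating the build works around the dependency issue rather than fixing it.

```shell
# Run a command up to N times, stopping at the first success.
# Usage: retry N command [args...]
retry() {
    attempts=$1
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $i/$attempts failed: $*" >&2
        i=$((i + 1))
    done
    return 1
}

# Example: retry a parallel build up to three times.
#   retry 3 make -j5
```

If the build still fails after several attempts, a sequential make (without -j) is a simple way to rule out the parallel-dependency problem.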
Some details about dependencies
BLAS implementation
BLAS (Basic Linear Algebra Subprograms) is a de facto standard for basic linear algebra operations such as vector and matrix multiplication. A FORTRAN implementation of BLAS is available from Netlib, and a C implementation of BLAS is included in GSL (GNU Scientific Library). Both are reference implementations of BLAS: they are not optimized for modern processor architectures and provide an order of magnitude lower performance than optimized implementations. Highly optimized implementations of BLAS are available from many hardware vendors, such as Intel MKL, IBM ESSL and AMD ACML. Fast implementations are also available as academic packages, such as ATLAS and OpenBLAS. The standard interface to BLAS is the FORTRAN interface.
Caution about the compatibility: Chameleon has been mainly tested with the reference BLAS from NETLIB, OpenBLAS and Intel MKL.
CBLAS
CBLAS is a C language interface to BLAS. Most commercial and academic implementations of BLAS also provide CBLAS. Netlib provides a reference implementation of CBLAS on top of FORTRAN BLAS (Netlib CBLAS). Since GSL is implemented in C, it naturally provides CBLAS.
Caution about the compatibility: Chameleon has been mainly tested with the reference CBLAS from NETLIB, OpenBLAS and Intel MKL.
LAPACK implementation
LAPACK (Linear Algebra PACKage) is a software library for numerical linear algebra, a successor of LINPACK and EISPACK and a predecessor of Chameleon. LAPACK provides routines for solving linear systems of equations, linear least squares problems, eigenvalue problems and singular value problems. Most commercial and academic BLAS packages also provide some LAPACK routines.
Caution about the compatibility: Chameleon has been mainly tested with the reference LAPACK from NETLIB, OpenBLAS and Intel MKL.
LAPACKE
LAPACKE is a C language interface to LAPACK (or CLAPACK). It is produced by Intel in coordination with the LAPACK team and is available in source code from Netlib in its original version (Netlib LAPACKE) and from Chameleon website in an extended version (LAPACKE for Chameleon). In addition to implementing the C interface, LAPACKE also provides routines which automatically handle workspace allocation, making the use of LAPACK much more convenient.
Caution about the compatibility: Chameleon has been mainly tested with the reference LAPACKE from NETLIB, OpenBLAS and Intel MKL. In addition, the LAPACKE library must be configured to provide the interface with the TMG routines, and symbols like LAPACKE_dlatms_work should be defined.
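One way to check whether an installed LAPACKE library exports the TMG interface is to look for the symbol in its dynamic symbol table. The sketch below assumes nm from GNU binutils is available; the function names symbol_listed and has_symbol are hypothetical, and the library path in the example is an assumption that depends on the distribution.

```shell
# Read "nm" output on stdin; succeed if symbol $1 appears as a whole word.
symbol_listed() {
    grep -qw "$1"
}

# Succeed if the shared library $1 defines the dynamic symbol $2.
has_symbol() {
    nm -D --defined-only "$1" 2>/dev/null | symbol_listed "$2"
}

# Example (the library path is an assumption; locate yours first):
#   if has_symbol /usr/lib/x86_64-linux-gnu/liblapacke.so LAPACKE_dlatms_work; then
#       echo "TMG interface available"
#   else
#       echo "TMG interface missing"
#   fi
```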
libtmg
libtmg is a component of the LAPACK library, containing routines for generation of input matrices for testing and timing of LAPACK. The testing and timing suites of LAPACK require libtmg, but not the library itself. Note that the LAPACK library can be built and used without libtmg.
Caution about the compatibility: Chameleon has been mainly tested with the reference TMGLIB from NETLIB, OpenBLAS and Intel MKL.
StarPU
StarPU is a task programming library for hybrid architectures. StarPU handles run-time concerns such as:
- Task dependencies
- Optimized heterogeneous scheduling
- Optimized data transfers and replication between main memory and discrete memories
- Optimized cluster communications
StarPU can be used to benefit from GPUs and distributed-memory environments. Note that StarPU is enabled by default.
Caution about the compatibility: Chameleon has been mainly tested with StarPU-1.1 and 1.2 releases.
PaRSEC
PaRSEC is a generic framework for architecture aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. It can be used with MPI and Cuda.
Caution about the compatibility: Chameleon is compatible with this version https://bitbucket.org/mfaverge/parsec/branch/mymaster.
QUARK
QUARK (QUeuing And Runtime for Kernels) provides a library that enables the dynamic execution of tasks with data dependencies in a multi-core, multi-socket, shared-memory environment. When Chameleon is linked with QUARK, it is not possible to exploit either CUDA (for GPUs) or MPI (distributed-memory environment). You can use PaRSEC or StarPU to do so.
Caution about the compatibility: Chameleon has been mainly tested with the QUARK library coming from https://github.com/ecrc/quark.
EZTrace
This library provides efficient modules for recording traces. Chameleon can trace kernel execution on CPU workers thanks to EZTrace and produce .paje files. EZTrace also provides integrated modules to trace MPI calls and/or memory usage. See how to use this feature in Execution trace using EZTrace. To trace kernel execution on all kinds of workers, such as CUDA, we recommend using the internal tracing support of the underlying runtime system. See how to use this feature in Execution trace using StarPU/FxT.
hwloc
hwloc (Portable Hardware Locality) is a software package for accessing the topology of a multicore system, including components like cores, sockets, caches and NUMA nodes. Using the topology discovery library hwloc through the runtime system is strongly recommended: it can increase performance and enables topology-aware scheduling. hwloc is available in major distributions and for most OSes, and can be downloaded from http://www.open-mpi.org/software/hwloc.
Caution about the compatibility: hwloc should be compatible with the runtime system used.
OpenMPI
OpenMPI is an open source Message Passing Interface implementation for execution on multiple nodes with distributed-memory environment. MPI can be enabled only if the runtime system chosen is StarPU (default). To use MPI through StarPU, it is necessary to compile StarPU with MPI enabled.
Caution about the compatibility: OpenMPI should be built with the --enable-mpi-thread-multiple option.
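Whether an existing OpenMPI installation supports MPI_THREAD_MULTIPLE can usually be read from the output of the ompi_info command. The sketch below assumes the "MPI_THREAD_MULTIPLE: yes" wording in the "Thread support" line of ompi_info, which may differ between OpenMPI versions; the function name thread_multiple_enabled is hypothetical.

```shell
# Read "ompi_info" output on stdin; succeed if MPI_THREAD_MULTIPLE is reported
# as enabled. The matched wording is an assumption about the output format.
thread_multiple_enabled() {
    grep -q 'MPI_THREAD_MULTIPLE: yes'
}

# Example:
#   if ompi_info | thread_multiple_enabled; then
#       echo "MPI_THREAD_MULTIPLE is supported"
#   else
#       echo "MPI_THREAD_MULTIPLE is not reported as supported"
#   fi
```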
Nvidia CUDA Toolkit
Nvidia CUDA Toolkit provides a comprehensive development environment for C and C++ developers building GPU-accelerated applications. Chameleon can use a set of low-level optimized kernels coming from cuBLAS to accelerate computations on GPUs. The cuBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the Nvidia CUDA runtime. cuBLAS is normally distributed with the Nvidia CUDA Toolkit. CUDA/cuBLAS can be enabled in Chameleon only if the runtime system chosen is StarPU (default). To use CUDA through StarPU, it is necessary to compile StarPU with CUDA enabled.
Caution about the compatibility: Chameleon has been mainly tested with CUDA releases from versions 4 to 7.5. Your compiler must be compatible with CUDA.
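To find out which CUDA release is installed, the version banner printed by nvcc --version can be parsed. This is a minimal sketch: the function name cuda_release is hypothetical, and it assumes the banner contains a "release X.Y" fragment, as in "Cuda compilation tools, release 7.5, V7.5.17".

```shell
# Read "nvcc --version" output on stdin and print the release number (e.g. 7.5).
# Prints nothing when no "release X.Y" fragment is found.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Example:
#   nvcc --version | cuda_release
```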
Distribution of Chameleon using GNU Guix
We provide Guix packages to install Chameleon with its dependencies in a reproducible way on GNU/Linux systems. For MacOSX please refer to the next section about Spack packaging.
If you are “root” on the system you can install Guix and directly use it to install the libraries. On supercomputers where you are not root, you may still be able to use it if Docker or Singularity is available on the machine, because Chameleon can be packaged as Docker/Singularity images with Guix.
Installing Guix
Guix requires a running GNU/Linux system, GNU tar and Xz.
gpg --keyserver pgp.mit.edu --recv-keys 3CE464558A84FDC69DB40CFB090B11993D9AEBB5
wget https://git.savannah.gnu.org/cgit/guix.git/plain/etc/guix-install.sh
chmod +x guix-install.sh
sudo ./guix-install.sh
The Chameleon packages are not official Guix packages. It is therefore necessary to add a channel to get additional packages. Create a ~/.config/guix/channels.scm file with the following snippet:
(cons (channel
        (name 'guix-hpc-non-free)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git"))
      %default-channels)
Update the Guix package definitions:
guix pull
Make the updated guix command available in the PATH:
PATH="$HOME/.config/guix/current/bin${PATH:+:}$PATH"
hash guix
For further shell sessions, add this to the ~/.bash_profile file:
export PATH="$HOME/.config/guix/current/bin${PATH:+:}$PATH"
export GUIX_LOCPATH="$HOME/.guix-profile/lib/locale"
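The ${PATH:+:} expansion used above inserts a colon only when PATH is already set and non-empty, which avoids leaving a stray empty entry in the search path. The sketch below shows the same idiom in a small helper that also skips directories already listed; the function name prepend_path is hypothetical.

```shell
# Prepend directory $1 to PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already present, nothing to do
        *) PATH="$1${PATH:+:}$PATH" ;;    # add ":" only when PATH is non-empty
    esac
}

# Example:
#   prepend_path "$HOME/.config/guix/current/bin"
#   export PATH
```

Calling the helper repeatedly, for instance from a profile file sourced several times, leaves PATH unchanged after the first call.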
Chameleon packages are now available
guix search ^chameleon
Refer to the official documentation of Guix to learn the basic commands.
Installing Chameleon with Guix
Standard Chameleon, latest release:
guix install chameleon
Note that several build variants exist:
- chameleon (default) : with starpu - with mpi
- chameleon-mkl-mt : default version but with multithreaded Intel MKL replacing OpenBLAS
- chameleon-cuda : with starpu - with mpi - with cuda
- chameleon-simgrid : with starpu - with mpi - with simgrid
- chameleon-openmp : with openmp - without mpi
- chameleon-parsec : with parsec - without mpi
- chameleon-quark : with quark - without mpi
Change the version
guix install chameleon --with-branch=chameleon=master
guix install chameleon --with-commit=chameleon=b31d7575fb7d9c0e1ba2d8ec633e16cb83778e8b
guix install chameleon --with-git-url=chameleon=https://gitlab.inria.fr/fpruvost/chameleon.git
guix install chameleon --with-git-url=chameleon=$HOME/git/chameleon