Chameleon is written in C and depends on a couple of external libraries that must be installed on the system.
Chameleon can be built and installed on UNIX systems (Linux) by the standard means of CMake. General information about CMake, as well as installation binaries and CMake source code, is available from http://www.cmake.org/.
To get help installing a full distribution (Chameleon plus its dependencies), we encourage users to use Spack.
Getting Chameleon
The latest official release tarballs of Chameleon sources are available for download from the gitlab tags page.
The latest development state is available on gitlab. You need Git to clone the repository:
git clone --recursive https://gitlab.inria.fr/solverstack/chameleon.git
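If the repository was cloned without the --recursive option, the submodules can still be fetched afterwards. The following is a minimal sketch for a POSIX shell; the helper name ensure_submodules and the default directory name chameleon are illustrative assumptions, not part of Chameleon.

```shell
# Hypothetical helper: fetch any submodules missing from an existing clone.
# $1 is the clone directory (defaults to ./chameleon, an assumption).
ensure_submodules() {
    repo="${1:-chameleon}"
    # Returns a non-zero status if $repo is not a Git working tree.
    git -C "$repo" submodule update --init --recursive
}

# Usage, from the directory that contains the clone:
#   ensure_submodules chameleon
```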
Prerequisites for installing Chameleon
To install Chameleon’s libraries, header files, and executables, one needs:
- CMake (version 2.8 minimum): the build system
- C and Fortran compilers: GNU compiler suite, Clang, Intel or IBM can be used
- python: to generate files in the different precisions
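- external libraries: these depend on the configuration; by default the required libraries are a runtime system (StarPU) and the CBLAS and LAPACKE kernel interfaces (LAPACKE with the TMG routines)
Optional libraries, such as OpenMPI and the Nvidia CUDA Toolkit (cuBLAS), are described in "Some details about dependencies" below.
These packages must be installed on the system before trying to configure or build Chameleon. The distrib/ directory gives some hints for installing the dependencies on Unix systems.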
We give here some examples for a Debian system:
# Update Debian packages list
sudo apt-get update
# Install BLAS/LAPACK, can be OpenBLAS, Intel MKL, Netlib LAPACK
sudo apt-get install -y libopenblas-dev liblapacke-dev
# or sudo apt-get install -y libmkl-dev
# or sudo apt-get install -y liblapack-dev liblapacke-dev
# Install OpenMPI
sudo apt-get install -y libopenmpi-dev
# Install StarPU
sudo apt-get install libstarpu-dev
# Optionally, for some specific developments, the following may be installed
# Install hwloc (used by StarPU or QUARK, already a dependency of OpenMPI)
sudo apt-get install -y libhwloc-dev
# Install EZTrace, useful to export some nice execution traces with all runtimes
sudo apt-get install -y libeztrace-dev
# Install FxT, useful to export some nice execution traces with StarPU
sudo apt-get install -y libfxt-dev
# Install CUDA and cuBLAS: only if you have a CUDA-compatible GPU
sudo apt-get install -y nvidia-cuda-toolkit nvidia-cuda-dev
# If you prefer a specific version of StarPU, install it yourself, e.g.
# Install StarPU (with MPI and FxT enabled)
mkdir -p $HOME/install
cd $HOME/install
wget https://files.inria.fr/starpu/starpu-1.3.7/starpu-1.3.7.tar.gz
tar xvzf starpu-1.3.7.tar.gz
cd starpu-1.3.7/
./configure --prefix=/usr/local --with-fxt=/usr/lib/x86_64-linux-gnu/
make -j5
sudo make install
# Install PaRSEC: to be used in place of StarPU
mkdir -p $HOME/install
cd $HOME/install
git clone https://bitbucket.org/mfaverge/parsec.git
cd parsec
git checkout mymaster
git submodule update
mkdir -p build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local -DBUILD_SHARED_LIBS=ON
make -j5
sudo make install
# Install QUARK: to be used in place of StarPU
mkdir -p $HOME/install
cd $HOME/install
git clone https://github.com/ecrc/quark
cd quark/
sed -i -e "s#prefix=.*#prefix=/usr/local#g" make.inc
sed -i -e "s#CFLAGS=.*#CFLAGS= -O2 -DADD_ -fPIC#g" make.inc
make
sudo make install
See also our script example in the distrib/debian sub-directory.
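Before configuring, it can help to confirm that the build tools from the prerequisites list are on the PATH. The following is a minimal sketch; the helper name missing_tools is hypothetical, and the tool names (gcc, gfortran, python3, pkg-config) are assumptions for a GNU toolchain that should be adapted to the compilers actually used.

```shell
# Print the name of every tool from the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool names are assumptions for a GNU toolchain; adapt them to your compilers.
missing="$(missing_tools cmake gcc gfortran python3 pkg-config)"
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing >&2
fi
```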
Known issues
- Chameleon needs the LAPACKE interface to the TMG routines, so symbols such as LAPACKE_dlatms_work must be defined in the LAPACKE library. Make sure the Debian packages libopenblas-dev and liblapacke-dev provide the TMG interface (there is no problem with Intel MKL). If they do not, you can update your distribution or install the LAPACKE interface library another way: from source, with Spack, or with Guix-HPC.
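One way to check for the TMG interface is to look for the symbol in the dynamic symbol table of the installed library. The following is a sketch that assumes GNU nm is available; the library path is an assumption for Debian on x86_64 and the helper name symbol_defined is hypothetical.

```shell
# Succeeds when the nm listing read from stdin defines the given symbol
# (type T = code in the text section, W = weak definition).
symbol_defined() {
    grep -Eq "[[:space:]][TW][[:space:]]+$1(@|\$)"
}

# The library path is an assumption for Debian on x86_64; adjust as needed.
lib=/usr/lib/x86_64-linux-gnu/liblapacke.so
if [ -e "$lib" ]; then
    if nm -D --defined-only "$lib" | symbol_defined LAPACKE_dlatms_work; then
        echo "TMG interface found in $lib"
    else
        echo "TMG interface missing from $lib" >&2
    fi
fi
```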
Some details about dependencies
BLAS implementation
BLAS (Basic Linear Algebra Subprograms) is a de facto standard for basic linear algebra operations such as vector and matrix multiplication. A FORTRAN implementation of BLAS is available from Netlib, and a C implementation is included in GSL (GNU Scientific Library). Both are reference implementations: they are not optimized for modern processor architectures and provide an order of magnitude lower performance than optimized implementations. Highly optimized implementations of BLAS are available from many hardware vendors, such as Intel MKL, IBM ESSL and AMD ACML. Fast implementations are also available as academic packages, such as ATLAS and OpenBLAS. The standard interface to BLAS is the FORTRAN interface.
Caution about the compatibility: Chameleon has been mainly tested with the reference BLAS from NETLIB, OpenBLAS and Intel MKL.
CBLAS
CBLAS is a C language interface to BLAS. Most commercial and academic implementations of BLAS also provide CBLAS. Netlib provides a reference implementation of CBLAS on top of FORTRAN BLAS (Netlib CBLAS). Since GSL is implemented in C, it naturally provides CBLAS.
Caution about the compatibility: Chameleon has been mainly tested with the reference CBLAS from NETLIB, OpenBLAS and Intel MKL.
LAPACK implementation
LAPACK (Linear Algebra PACKage) is a software library for numerical linear algebra, a successor of LINPACK and EISPACK and a predecessor of Chameleon. LAPACK provides routines for solving linear systems of equations, linear least squares problems, eigenvalue problems and singular value problems. Most commercial and academic BLAS packages also provide some LAPACK routines.
Caution about the compatibility: Chameleon has been mainly tested with the reference LAPACK from NETLIB, OpenBLAS and Intel MKL.
LAPACKE
LAPACKE is a C language interface to LAPACK (or CLAPACK). It is produced by Intel in coordination with the LAPACK team and is available in source code from Netlib in its original version (Netlib LAPACKE) and from the Chameleon website in an extended version (LAPACKE for Chameleon). In addition to implementing the C interface, LAPACKE also provides routines which automatically handle workspace allocation, making the use of LAPACK much more convenient.
Caution about the compatibility: Chameleon has been mainly tested with the reference LAPACKE from NETLIB, OpenBLAS and Intel MKL. In addition, the LAPACKE library must be configured to provide the interface with the TMG routines, and symbols like LAPACKE_dlatms_work should be defined.
libtmg
libtmg is a component of the LAPACK library, containing routines for generation of input matrices for testing and timing of LAPACK. The testing and timing suites of LAPACK require libtmg, but not the library itself. Note that the LAPACK library can be built and used without libtmg.
Caution about the compatibility: Chameleon has been mainly tested with the reference TMGLIB from NETLIB, OpenBLAS and Intel MKL.
StarPU
StarPU is a task programming library for hybrid architectures. StarPU handles run-time concerns such as:
- Task dependencies
- Optimized heterogeneous scheduling
- Optimized data transfers and replication between main memory and discrete memories
- Optimized cluster communications
StarPU can be used to benefit from GPUs and distributed-memory environments. Note that StarPU is enabled by default.
Caution about the compatibility: Chameleon has been mainly tested with StarPU-1.1, 1.2 and 1.3 releases.
PaRSEC
PaRSEC is a generic framework for architecture aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. It can be used with MPI and Cuda.
Caution about the compatibility: Chameleon is compatible with this version https://bitbucket.org/mfaverge/parsec/branch/mymaster.
QUARK
QUARK (QUeuing And Runtime for Kernels) provides a library that enables the dynamic execution of tasks with data dependencies in a multi-core, multi-socket, shared-memory environment. When Chameleon is linked with QUARK, it is not possible to exploit either CUDA (for GPUs) or MPI (distributed-memory environment). You can use PaRSEC or StarPU to do so.
Caution about the compatibility: Chameleon has been mainly tested with the QUARK library coming from https://github.com/ecrc/quark.
EZTrace
This library provides efficient modules for recording traces. Chameleon can trace kernel execution on CPU workers thanks to EZTrace and produce .paje files. EZTrace also provides integrated modules to trace MPI calls and/or memory usage. See how to use this feature in "Execution trace using EZTrace". To trace kernel execution on all kinds of workers, such as CUDA, we recommend using the internal tracing support of the underlying runtime system. See how to use this feature in "Execution trace using StarPU/FxT".
hwloc
hwloc (Portable Hardware Locality) is a software package for accessing the topology of a multicore system, including components such as cores, sockets, caches and NUMA nodes. Using the topology discovery library hwloc through the runtime system is strongly recommended: it can increase performance and allows some topology-aware scheduling. hwloc is available in major distributions and for most OSes, and can be downloaded from http://www.open-mpi.org/software/hwloc.
Caution about the compatibility: hwloc should be compatible with the runtime system used.
OpenMPI
OpenMPI is an open source Message Passing Interface implementation for execution on multiple nodes with distributed-memory environment. MPI can be enabled only if the runtime system chosen is StarPU (default). To use MPI through StarPU, it is necessary to compile StarPU with MPI enabled.
Caution about the compatibility: OpenMPI should be built with the --enable-mpi-thread-multiple option.
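An installed OpenMPI can be inspected with its ompi_info tool to see whether this support was compiled in. The sketch below assumes that ompi_info reports a "Thread support" line containing "MPI_THREAD_MULTIPLE: yes"; the exact wording may differ between OpenMPI versions, and the helper name thread_multiple_enabled is hypothetical.

```shell
# Succeeds when the ompi_info report read from stdin states
# that MPI_THREAD_MULTIPLE is supported (assumed output wording).
thread_multiple_enabled() {
    grep -q 'MPI_THREAD_MULTIPLE: yes'
}

if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | thread_multiple_enabled; then
        echo "OpenMPI supports MPI_THREAD_MULTIPLE"
    else
        echo "OpenMPI does not report MPI_THREAD_MULTIPLE support" >&2
    fi
fi
```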
Nvidia CUDA Toolkit
Nvidia CUDA Toolkit provides a comprehensive development environment for C and C++ developers building GPU-accelerated applications. Chameleon can use a set of low-level optimized kernels coming from cuBLAS to accelerate computations on GPUs. The cuBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the Nvidia CUDA runtime. cuBLAS is normally distributed with the Nvidia CUDA Toolkit. CUDA/cuBLAS can be enabled in Chameleon only if the runtime system chosen is StarPU (default). To use CUDA through StarPU, it is necessary to compile StarPU with CUDA enabled.
Caution about the compatibility: your compiler must be compatible with CUDA.
Distribution Debian
Download one of the available packages for your distribution from https://gitlab.inria.fr/solverstack/chameleon/-/packages, then install it as follows
sudo apt-get install ./chameleon_1.1.0-1_amd64.deb -y
Chameleon will then be installed on your system, meaning you can use the drivers for performance tests
mpiexec -n 2 chameleon_stesting -t 2 -o gemm -n 1000
and use the Chameleon library in your own project
# example usage: use the Chameleon library in your own CMake project (we provide a CHAMELEONConfig.cmake)
git clone https://gitlab.inria.fr/solverstack/distrib.git
cd distrib/cmake/test/chameleon && mkdir build && cd build && cmake .. && make && ./test_chameleon
# example usage: use the Chameleon library in your own non-CMake project
# use pkg-config to get compiler flags and linking
pkg-config --cflags chameleon
pkg-config --libs chameleon
# if there are static libraries use the --static option of pkg-config
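The two pkg-config calls above can be combined into a single compile-and-link command. The following is a minimal sketch; the helper name pc_compile_cmd, the source file main.c and the output name my_app are placeholders, and the helper only prints the command rather than running it.

```shell
# Hypothetical helper: print a compile-and-link command for one C source file,
# using the flags pkg-config reports for the given package.
pc_compile_cmd() {
    pkg="$1"; src="$2"; out="$3"
    cflags="$(pkg-config --cflags "$pkg")" || return 1
    libs="$(pkg-config --libs "$pkg")" || return 1
    printf '%s\n' "cc $cflags $src -o $out $libs"
}

# main.c and my_app are placeholder names.
pc_compile_cmd chameleon main.c my_app ||
    echo "chameleon.pc not found; check PKG_CONFIG_PATH" >&2
```

Printing the command first makes it easy to check which include and library paths pkg-config resolves before compiling.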
Do not hesitate to send an email if you need a package for your Debian distribution.
Distribution of Chameleon using GNU Guix
We provide Guix packages to install Chameleon with its dependencies in a reproducible way on GNU/Linux systems. For MacOSX please refer to the next sections about Brew or Spack packaging.
If you are “root” on the system you can install Guix and directly use it to install the libraries. On supercomputers where you are not root, you may still be able to use it if Docker or Singularity is available on the machine, because Chameleon can be packaged as Docker/Singularity images with Guix.
Installing Guix
Guix requires a running GNU/Linux system, GNU tar and Xz.
gpg --keyserver pgp.mit.edu --recv-keys 3CE464558A84FDC69DB40CFB090B11993D9AEBB5
wget https://git.savannah.gnu.org/cgit/guix.git/plain/etc/guix-install.sh
chmod +x guix-install.sh
sudo ./guix-install.sh
The Chameleon packages are not official Guix packages. It is therefore necessary to add a channel to get the additional packages. Create a ~/.config/guix/channels.scm file with the following snippet:
(cons (channel
        (name 'guix-hpc-non-free)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git"))
      %default-channels)
Update the Guix package definitions
guix pull
Put the updated guix first in the PATH
PATH="$HOME/.config/guix/current/bin${PATH:+:}$PATH"
hash guix
For further shell sessions, add this to the ~/.bash_profile file
export PATH="$HOME/.config/guix/current/bin${PATH:+:}$PATH"
export GUIX_LOCPATH="$HOME/.guix-profile/lib/locale"
Chameleon packages are now available
guix search ^chameleon
Refer to the official documentation of Guix to learn the basic commands.
Installing Chameleon with Guix
Standard Chameleon, last release
guix install chameleon
Notice that there exist several build variants
- chameleon (default) : with starpu - with mpi - with OpenBlas
- chameleon-mkl-mt : default version but with Intel MKL multithreaded to replace OpenBlas
- chameleon-cuda : with starpu - with mpi - with cuda
- chameleon-cuda-mkl-mt : with starpu - with mpi - with cuda - with Intel MKL multithreaded to replace OpenBlas
- chameleon-simgrid : with starpu - with mpi - with simgrid
- chameleon-openmp : with openmp - without mpi
- chameleon-parsec : with parsec - without mpi
- chameleon-quark : with quark - without mpi
Change the version
guix install chameleon --with-branch=chameleon=master
guix install chameleon --with-commit=chameleon=b31d7575fb7d9c0e1ba2d8ec633e16cb83778e8b
guix install chameleon --with-git-url=chameleon=https://gitlab.inria.fr/fpruvost/chameleon.git
guix install chameleon --with-git-url=chameleon=$HOME/git/chameleon
Notice also that the default MPI is OpenMPI and the default BLAS/LAPACK is OpenBLAS. This can be changed with a transformation option.
Change some dependencies
# install chameleon with intel mkl to replace openblas, nmad to replace openmpi and starpu with fxt
guix install chameleon --with-input=openblas=mkl --with-input=openmpi=nmad --with-input=starpu=starpu-fxt
Generate a Chameleon Docker image with Guix
To install Chameleon and its dependencies within a docker image (OpenMPI stack)
docker_chameleon=`guix pack -f docker chameleon chameleon --with-branch=chameleon=master --with-input=openblas=mkl mkl starpu hwloc openmpi openssh slurm bash coreutils inetutils util-linux procps git grep tar sed gzip which gawk perl emacs-minimal vim gcc-toolchain make cmake pkg-config -S /bin=bin --entry-point=/bin/bash`
# Load the generated tarball as a docker image
docker_chameleon_tag=`docker load --input $docker_chameleon | grep "Loaded image: " | cut -d " " -f 3-`
# Change tag name, see the existing image name with "docker images" command, then change to a more simple name
docker tag $docker_chameleon_tag guix/chameleon-tmp
Create a Dockerfile inheriting from the image (renamed guix/chameleon-tmp here):
FROM guix/chameleon-tmp
# Create a directory for user 1000
RUN mkdir -p /builds
RUN chown -R 1000 /builds
ENTRYPOINT ["/bin/bash", "-l"]
# Enter the image as user 1000 in /builds
USER 1000
WORKDIR /builds
ENV HOME /builds
Then create the final Docker image from this Dockerfile.
docker build -t guix/chameleon .
Test the image
docker run -it guix/chameleon
# test starpu
STARPU=`pkg-config --variable=prefix libstarpu`
mpiexec -np 4 $STARPU/lib/starpu/mpi/comm
# test chameleon
CHAMELEON=`pkg-config --variable=prefix chameleon`
mpiexec -np 2 $CHAMELEON/bin/chameleon_stesting -H -o gemm -P 2 -t 2 -m 2000 -n 2000 -k 2000
Generate a Chameleon Singularity image with Guix
To package Chameleon and its dependencies within a singularity image (OpenMPI stack)
singularity_chameleon=`guix pack -f squashfs chameleon --with-branch=chameleon=master --with-input=openblas=mkl mkl starpu hwloc openmpi openssh slurm hdf5 zlib bash coreutils inetutils util-linux procps git grep tar sed gzip which gawk perl emacs-minimal vim gcc-toolchain make cmake pkg-config -S /bin=bin --entry-point=/bin/bash`
cp $singularity_chameleon chameleon-pack.gz.squashfs
# copy the singularity image on the supercomputer, e.g. 'supercomputer'
scp chameleon-pack.gz.squashfs supercomputer:
On a machine where Singularity is installed, Chameleon can then be called as follows
# at least openmpi and singularity are required here, e.g. module add openmpi singularity
mpiexec -np 2 singularity exec chameleon-pack.gz.squashfs /bin/chameleon_stesting -H -o gemm -P 2 -t 2 -m 2000 -n 2000 -k 2000
Distribution of Chameleon using Spack
Installing Spack
To get help installing a full distribution (Chameleon plus its dependencies) on Linux or MacOS X, we encourage users to use Spack. Please refer to our Spack Repository.
git clone https://github.com/llnl/spack.git
export SPACK_ROOT=$PWD/spack
cd spack/
git checkout v0.16.0
. $SPACK_ROOT/share/spack/setup-env.sh
git clone https://gitlab.inria.fr/solverstack/spack-repo.git ./var/spack/repos/solverstack
spack repo add ./var/spack/repos/solverstack
Chameleon is then available
spack info chameleon
spack spec chameleon
Refer to the official documentation of Spack to learn the basic commands.
Installing Chameleon with Spack
Standard Chameleon, last state on the ‘master’ branch
spack install -v chameleon
# chameleon is installed here:
spack location -i chameleon
Notice that there exist several build variants
- chameleon (default) : with starpu - with mpi
- tune the build type (CMake) with build_type=RelWithDebInfo|Debug|Release
- enable/disable shared libraries with +/- shared
- enable/disable mpi with +/- mpi
- enable/disable cuda with +/- cuda
- enable/disable fxt with +/- fxt
- enable/disable simgrid with +/- simgrid
- +openmp~starpu : with openmp - without starpu
- +quark~starpu : with quark - without starpu
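Variants are combined on the command line as part of the spec. The following is a sketch that previews a spec with spack spec before anything is installed; the particular combination (a Debug build without MPI) is only an illustration built from the variants listed above, not a recommended configuration.

```shell
# Illustrative spec built only from the variants listed above (an assumption,
# not a recommended configuration): Debug build type, MPI disabled.
spec="chameleon build_type=Debug ~mpi"

# Preview how Spack would concretize the spec before installing anything.
if command -v spack >/dev/null 2>&1; then
    spack spec $spec   # unquoted on purpose: each token is a separate argument
else
    echo "spack not found; source \$SPACK_ROOT/share/spack/setup-env.sh first" >&2
fi
```

Once the concretized spec looks right, the same string can be passed to spack install -v.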
Change the version
spack install -v chameleon@1.0.0
Notice also that the default MPI is OpenMPI and the default BLAS/LAPACK is OpenBLAS. This can be changed by adding some constraints on virtual packages.
Change some dependencies
# see lapack providers
spack providers lapack
# see mpi providers
spack providers mpi
# install chameleon with intel mkl to replace openblas
spack install -v chameleon ^intel-mkl
# install chameleon with nmad to replace openmpi
spack install -v chameleon ^nmad
Distribution Brew for Mac OS X
We provide some brew packages here https://gitlab.inria.fr/solverstack/brew-repo (under construction).
Build and install Chameleon with CMake
Compilation of the Chameleon libraries and executables is done with CMake (http://www.cmake.org/). This version has been tested with CMake 3.10.2, but any version later than 2.8 should be fine.
Here are the steps to configure, build, test and install:
- configure:
cmake path/to/chameleon -DOPTION1= -DOPTION2= ...
# see the "Configuration options" section to get list of options
# see the "Dependencies detection" for details about libraries detection
- build:
make # do not hesitate to use -j[ncores] option to speedup the compilation
- test (optional, requires CHAMELEON_ENABLE_TESTING=ON):
make test # or ctest
- install (optional):
make install
Do not forget to specify the install directory with -DCMAKE_INSTALL_PREFIX at configure.
cmake /home/jdoe/chameleon -DCMAKE_INSTALL_PREFIX=/home/jdoe/install/chameleon
Note that the install process is optional. You are free to use Chameleon binaries compiled in the build directory.
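The steps above can be chained in one out-of-source build. The following is a minimal sketch; the helper name build_chameleon and the build/ sub-directory are assumptions, and only the CMAKE_INSTALL_PREFIX and CHAMELEON_ENABLE_TESTING options mentioned in this section are set.

```shell
# Hypothetical helper: out-of-source configure, build, test and install.
# $1 = Chameleon source directory, $2 = install prefix.
# The body runs in a subshell so the caller's working directory is unchanged.
build_chameleon() (
    src="$1"; prefix="$2"
    [ -f "$src/CMakeLists.txt" ] || { echo "no CMakeLists.txt in $src" >&2; exit 1; }
    # Number of parallel jobs; fall back to 1 if it cannot be detected.
    jobs="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
    mkdir -p "$src/build" && cd "$src/build" || exit 1
    cmake .. -DCMAKE_INSTALL_PREFIX="$prefix" -DCHAMELEON_ENABLE_TESTING=ON &&
        make -j"$jobs" &&
        ctest &&
        make install
)

# Usage, with the paths from the example above:
#   build_chameleon /home/jdoe/chameleon /home/jdoe/install/chameleon
```

Each step runs only if the previous one succeeded, so a failing test prevents the install.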