installing.org



Chameleon is written in C and depends on several external
  libraries that must be installed on the system.
Chameleon can be built and installed on UNIX systems (Linux) by the standard
  means of CMake.  General information about CMake, as well as
  installation binaries and CMake source code, is available from the
  CMake website.
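As a rough sketch of what the standard CMake workflow looks like, the helper below performs an out-of-source configure, build, and install. The function name build_with_cmake, the build subdirectory, the JOBS variable, and the example paths are illustrative assumptions and not part of Chameleon; only the generic CMAKE_INSTALL_PREFIX variable is used.

```shell
#!/bin/sh
# Hypothetical helper: out-of-source CMake build of a source tree.
#   $1: path to the source tree (e.g. the cloned chameleon directory)
#   $2: installation prefix
# The body runs in a subshell, so the caller's working directory is kept.
build_with_cmake() (
    src_dir="$1"
    prefix="$2"
    mkdir -p "$src_dir/build" || exit 1
    cd "$src_dir/build" || exit 1
    cmake .. -DCMAKE_INSTALL_PREFIX="$prefix" &&
    make -j"${JOBS:-2}" &&
    make install
)

# Example call (both paths are assumptions):
# build_with_cmake "$HOME/chameleon" "$HOME/install/chameleon"
```

Installing into a prefix under $HOME avoids the need for root privileges; a system-wide prefix would require running the install step with sudo.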
To install a full distribution (Chameleon plus its dependencies),
  we encourage users to use Spack.
Getting Chameleon
The latest official release tarballs of Chameleon sources are
  available for download from the GitLab tags page.
The latest development state is available on GitLab. You need Git
  to clone the repository:
git clone --recursive https://gitlab.inria.fr/solverstack/chameleon.git

Prerequisites for installing Chameleon
To install Chameleon’s libraries, header files, and executables, one
  needs:

  CMake (version 2.8 minimum): the build system
  C and Fortran compilers: GNU compiler suite, Clang, Intel or IBM
    can be used
  python: to generate files in the different precisions
  external libraries: this depends on the configuration, by default
    the required libraries are
    
      runtimes: StarPU or PaRSEC or QUARK or OpenMP

      kernels : CBLAS, LAPACKE (with TMG). These are C interfaces to
        Fortran kernels BLAS and LAPACK. There exist several providers
        that can be used with Chameleon (Intel MKL, Netlib, OpenBLAS,
        BLIS/FLAME)
    
  
Optional libraries:

  cuda: cuda, cublas (comes with cuda)
  mpi: openmpi, mpich, intelmpi
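The tool prerequisites listed above can be checked before configuring. The following is a minimal sketch: the helper names missing_tools and version_ge are hypothetical, the command names (cmake, cc, gfortran, python3) are assumptions that may differ on your system, and version_ge relies on the -V option of GNU sort.

```shell
#!/bin/sh
# Print the names of the given commands that are not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeed if dotted version $1 is greater than or equal to version $2.
# Relies on "sort -V" (GNU coreutils), which is an assumption about the host.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Command names below are assumptions; adapt them to your compilers.
missing=$(missing_tools cmake cc gfortran python3)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
fi

if command -v cmake >/dev/null 2>&1; then
    # "cmake --version" prints "cmake version X.Y.Z" on its first line.
    cmake_version=$(cmake --version | head -n 1 | awk '{ print $3 }')
    version_ge "$cmake_version" 2.8 || echo "CMake $cmake_version is older than 2.8"
fi
```

The script only reports problems and does not install anything, so it is safe to run on a shared machine.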


These packages must be installed on the system before trying to
  configure/build Chameleon.  The distrib/ directory
  gives some hints for installing the dependencies on
  Unix systems.
We give here some examples for a Debian system:
# Update Debian packages list
sudo apt-get update
# Install OpenBLAS
sudo apt-get install -y libopenblas-dev liblapacke-dev
# Install OpenMPI
sudo apt-get install -y libopenmpi-dev
# Install StarPU
sudo apt-get install libstarpu-dev

# Optionally, for specific developments, the following may be installed
# Install hwloc (used by StarPU or QUARK, already a dependency of OpenMPI)
sudo apt-get install -y libhwloc-dev
# Install EZTrace, useful to export execution traces with all runtimes
sudo apt-get install -y libeztrace-dev
# Install FxT, useful to export execution traces with StarPU
sudo apt-get install -y libfxt-dev
# Install CUDA and cuBLAS: only if you have a CUDA-compatible GPU
sudo apt-get install -y nvidia-cuda-toolkit nvidia-cuda-dev

# If you prefer a specific version of StarPU, install it yourself, e.g.
# Install StarPU (with MPI and FxT enabled)
mkdir -p $HOME/install
cd $HOME/install
wget https://files.inria.fr/starpu/starpu-1.3.7/starpu-1.3.7.tar.gz
tar xvzf starpu-1.3.7.tar.gz
cd starpu-1.3.7/
./configure --prefix=/usr/local --with-fxt=/usr/lib/x86_64-linux-gnu/
make -j5
sudo make install

# Install PaRSEC: to be used in place of StarPU
mkdir -p $HOME/install
cd $HOME/install
git clone https://bitbucket.org/mfaverge/parsec.git
cd parsec
git checkout mymaster
git submodule update
mkdir -p build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local -DBUILD_SHARED_LIBS=ON
make -j5
sudo make install

# Install QUARK: to be used in place of StarPU
mkdir -p $HOME/install
cd $HOME/install
git clone https://github.com/ecrc/quark
cd quark/
sed -i -e "s#prefix=.*#prefix=/usr/local#g" make.inc
sed -i -e "s#CFLAGS=.*#CFLAGS= -O2 -DADD_ -fPIC#g" make.inc
make
sudo make install


Known issues

  We need the LAPACKE interface to the TMG routines, and symbols like
    LAPACKE_dlatms_work should be defined in the LAPACKE
    library. The Debian packages libopenblas-dev and liblapacke-dev
    (version 1.0.0) do not provide the TMG interface. Please update
    your distribution or install the LAPACKE interface library in
    another way: by yourself from source, with Spack, or with
    Guix-HPC.
  Sometimes a parallel make with -j can fail due to undefined
    dependencies between some targets. If so, try invoking the make
    command several times.
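The workaround for the parallel make failure can be automated with a small retry loop. This is only a sketch: the retry function is a hypothetical helper, not something shipped with Chameleon, and the attempt count in the example is arbitrary.

```shell
#!/bin/sh
# Run a command up to $1 times, stopping at the first success.
#   $1: maximum number of attempts; remaining arguments: the command to run
retry() {
    attempts="$1"
    shift
    n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "Command failed after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        echo "Attempt $n of $attempts: $*" >&2
    done
}

# Example: try a parallel build up to three times.
# retry 3 make -j5
```

If the build still fails after the last attempt, the error is probably not a target ordering problem and the make output should be inspected directly.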

Some details about dependencies
BLAS implementation
BLAS (Basic Linear Algebra Subprograms) is a de facto standard
  for basic linear algebra operations such as vector and matrix
  multiplication.  A FORTRAN implementation of BLAS is available from
  Netlib.  A C implementation of BLAS is also included in GSL (GNU
  Scientific Library).  Both are reference implementations of BLAS:
  they are not optimized for modern processor
  architectures and provide an order of magnitude lower performance
  than optimized implementations.  Highly optimized implementations
  of BLAS are available from many hardware vendors, such as Intel
  MKL, IBM ESSL and AMD ACML.  Fast implementations are also
  available as academic packages, such as ATLAS and OpenBLAS.  The
  standard interface to BLAS is the FORTRAN interface.
Caution about the compatibility: Chameleon has been mainly tested
  with the reference BLAS from NETLIB, OpenBLAS and Intel MKL.
CBLAS
CBLAS is a C language interface to BLAS.  Most commercial and
  academic implementations of BLAS also provide CBLAS.  Netlib
  provides a reference implementation of CBLAS on top of FORTRAN
  BLAS (Netlib CBLAS).  Since GSL is implemented in C, it naturally
  provides CBLAS.
Caution about the compatibility: Chameleon has been mainly tested with
  the reference CBLAS from NETLIB, OpenBLAS and Intel MKL.
LAPACK implementation
LAPACK (Linear Algebra PACKage) is a software library for
  numerical linear algebra, a successor of LINPACK and EISPACK and
  a predecessor of Chameleon.  LAPACK provides routines for solving
  linear systems of equations, linear least squares problems,
  eigenvalue problems and singular value problems.  Most commercial
  and academic BLAS packages also provide some LAPACK routines.
Caution about the compatibility: Chameleon has been mainly tested
  with the reference LAPACK from NETLIB, OpenBLAS and Intel MKL.
LAPACKE
LAPACKE is a C language interface to LAPACK (or CLAPACK).  It is
  produced by Intel in coordination with the LAPACK team and is
  available in source code from Netlib in its original version
  (Netlib LAPACKE) and from Chameleon website in an extended
  version (LAPACKE for Chameleon).  In addition to implementing the
  C interface, LAPACKE also provides routines which automatically
  handle workspace allocation, making the use of LAPACK much more
  convenient.
Caution about the compatibility: Chameleon has been mainly tested
  with the reference LAPACKE from NETLIB, OpenBLAS and Intel
  MKL. In addition the LAPACKE library must be configured to
  provide the interface with the TMG routines and symbols like
  LAPACKE_dlatms_work should be defined.
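One way to check whether an installed LAPACKE library exposes the TMG interface is to look for the symbol in the output of nm. The sketch below makes several assumptions: the helper name listing_defines_symbol is hypothetical, the library path is a typical Debian location that may not match your system (override it with the LAPACKE_LIB variable, also an illustrative name), and nm from binutils must be installed.

```shell
#!/bin/sh
# Succeed if symbol $1 is listed as defined in the nm-style listing on stdin.
# Lines look like "<address> <type> <name>"; type "U" marks an undefined symbol.
listing_defines_symbol() {
    awk -v sym="$1" '
        NF >= 2 && $NF == sym && $(NF - 1) != "U" { found = 1 }
        END { exit !found }
    '
}

# Assumed library path (typical Debian location); override with LAPACKE_LIB.
lib="${LAPACKE_LIB:-/usr/lib/x86_64-linux-gnu/liblapacke.so}"
if [ -r "$lib" ] && command -v nm >/dev/null 2>&1; then
    if nm -D "$lib" | listing_defines_symbol LAPACKE_dlatms_work; then
        echo "$lib provides the TMG interface"
    else
        echo "$lib lacks LAPACKE_dlatms_work"
    fi
fi
```

For a static library, nm would be called without the -D option; the filtering function stays the same.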
libtmg
libtmg is a component of the LAPACK library, containing routines
  for generation of input matrices for testing and timing of
  LAPACK.  The testing and timing suites of LAPACK require libtmg,
  but not the library itself. Note that the LAPACK library can be
  built and used without libtmg.
Caution about the compatibility: Chameleon has been mainly tested
  with the reference TMGLIB from NETLIB, OpenBLAS and Intel MKL.
StarPU
StarPU is a task programming library for hybrid architectures.
  StarPU handles run-time concerns such as:

  Task dependencies
  Optimized heterogeneous scheduling
  Optimized data transfers and replication between main memory
    and discrete memories
  Optimized cluster communications

StarPU can be used to benefit from GPUs and distributed-memory
  environments. Note that StarPU is enabled by default.
Caution about the compatibility: Chameleon has been mainly tested
  with StarPU-1.1 and 1.2 releases.
PaRSEC
PaRSEC is a generic framework for architecture aware scheduling
  and management of micro-tasks on distributed many-core
  heterogeneous architectures. It can be used with MPI and CUDA.
Caution about the compatibility: Chameleon is compatible with
  this version
  https://bitbucket.org/mfaverge/parsec/branch/mymaster.
QUARK
QUARK (QUeuing And Runtime for Kernels) provides a library that
  enables the dynamic execution of tasks with data dependencies in
  a multi-core, multi-socket, shared-memory environment. When
  Chameleon is linked with QUARK, it is not possible to exploit
  either CUDA (for GPUs) or MPI (distributed-memory environment).
  You can use PaRSEC or StarPU to do so.
Caution about the compatibility: Chameleon has been mainly tested
  with the QUARK library coming from https://github.com/ecrc/quark.
EZTrace
This library provides efficient modules for recording
  traces. Chameleon can trace kernel execution on CPU workers
  thanks to EZTrace and produce .paje files. EZTrace also provides
  integrated modules to trace MPI calls and/or memory usage. See
  the section "Execution trace using EZTrace" for how to use this
  feature. To trace kernel execution on all kinds of
  workers, such as CUDA, we recommend using the internal tracing
  support of the underlying runtime system.  See the section
  "Execution trace using StarPU/FxT" for how to use this feature.
hwloc
hwloc (Portable Hardware Locality) is a software package for
  accessing the topology of a multicore system including components
  like: cores, sockets, caches and NUMA nodes. Using the topology
  discovery library, hwloc, through the runtime system is strongly
  recommended. It helps increase performance
  and enables topology-aware scheduling. hwloc is available
  in major distributions and for most OSes and can be downloaded
  from http://www.open-mpi.org/software/hwloc.
Caution about the compatibility: hwloc should be compatible with
  the runtime system used.
OpenMPI
OpenMPI is an open source Message Passing Interface
  implementation for execution on multiple nodes in a
  distributed-memory environment.  MPI can be enabled only if the
  runtime system chosen is StarPU (default).  To use MPI through
  StarPU, it is necessary to compile StarPU with MPI enabled.
Caution about the compatibility: OpenMPI should be built with the
  --enable-mpi-thread-multiple option.
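Whether an existing OpenMPI installation was built with thread-multiple support can usually be read from the output of ompi_info. The sketch below assumes that ompi_info prints a "Thread support" line containing "MPI_THREAD_MULTIPLE: yes" when the feature is available; the exact wording may vary between OpenMPI releases, and the helper name reports_thread_multiple is hypothetical.

```shell
#!/bin/sh
# Succeed if the ompi_info-style text on stdin reports MPI_THREAD_MULTIPLE support.
# The matched wording is an assumption about the ompi_info output format.
reports_thread_multiple() {
    grep -q 'MPI_THREAD_MULTIPLE: yes'
}

if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | reports_thread_multiple; then
        echo "OpenMPI reports MPI_THREAD_MULTIPLE support"
    else
        echo "OpenMPI does not report MPI_THREAD_MULTIPLE support"
    fi
fi
```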
Nvidia CUDA Toolkit
Nvidia CUDA Toolkit provides a comprehensive development
  environment for C and C++ developers building GPU-accelerated
  applications.  Chameleon can use a set of low level optimized
  kernels coming from cuBLAS to accelerate computations on GPUs.
  The cuBLAS library is an implementation of BLAS (Basic Linear
  Algebra Subprograms) on top of the Nvidia CUDA runtime.  cuBLAS
  is normally distributed with the Nvidia CUDA Toolkit.  CUDA/cuBLAS can
  be enabled in Chameleon only if the runtime system chosen is
  StarPU (default).  To use CUDA through StarPU, it is necessary to
  compile StarPU with CUDA enabled.
Caution about the compatibility: Chameleon has been mainly tested
  with CUDA releases from versions 4 to 7.5.  Your compiler must be
  compatible with CUDA.
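To see which CUDA release is installed, the release number can be extracted from the output of nvcc --version. This is a minimal sketch: the helper name nvcc_release is hypothetical, and it assumes the usual "Cuda compilation tools, release X.Y, ..." line printed by nvcc.

```shell
#!/bin/sh
# Extract the release number (e.g. "7.5") from "nvcc --version" output on stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA release: $(nvcc --version | nvcc_release)"
fi
```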
Distribution of Chameleon using GNU Guix
We provide Guix packages to install Chameleon with its dependencies
  in a reproducible way on GNU/Linux systems. For MacOSX please refer
  to the next section about Spack packaging.
If you are “root” on the system you can install Guix and directly
  use it to install the libraries. On supercomputers where you are not
  root, you may still be able to use it if Docker or Singularity
  is available on the machine, because Chameleon can be packaged as
  Docker/Singularity images with Guix.
Installing Guix
Guix requires a running GNU/Linux system, GNU tar and Xz.
gpg --keyserver pgp.mit.edu --recv-keys 3CE464558A84FDC69DB40CFB090B11993D9AEBB5
wget https://git.savannah.gnu.org/cgit/guix.git/plain/etc/guix-install.sh
chmod +x guix-install.sh
sudo ./guix-install.sh

The Chameleon packages are not official Guix packages. It is then
  necessary to add a channel to get additional packages.  Create a
  ~/.config/guix/channels.scm file with the following snippet:
(cons (channel
    (name 'guix-hpc-non-free)
    (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git"))
  %default-channels)
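The channel file can also be created from the shell. The following sketch writes the snippet above to channels.scm only if no such file exists yet, so an existing channel configuration is not overwritten. The function name write_guix_channels and its optional directory argument are illustrative assumptions.

```shell
#!/bin/sh
# Write the channel declaration to channels.scm unless the file already exists.
#   $1: configuration directory (default: ~/.config/guix)
write_guix_channels() {
    config_dir="${1:-$HOME/.config/guix}"
    channels_file="$config_dir/channels.scm"
    if [ -e "$channels_file" ]; then
        echo "$channels_file already exists, leaving it untouched" >&2
        return 0
    fi
    mkdir -p "$config_dir" || return 1
    cat > "$channels_file" <<'EOF'
(cons (channel
    (name 'guix-hpc-non-free)
    (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git"))
  %default-channels)
EOF
}

# Example: write_guix_channels
```

If a channels.scm file is already present, the new channel has to be merged into it by hand.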

Update the Guix package definitions
guix pull

Make the updated guix command available in the PATH
PATH="$HOME/.config/guix/current/bin${PATH:+:}$PATH"
hash guix

For further shell sessions, add this to the ~/.bash_profile file
export PATH="$HOME/.config/guix/current/bin${PATH:+:}$PATH"
export GUIX_LOCPATH="$HOME/.guix-profile/lib/locale"
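Appending these lines by hand more than once leaves duplicates in the profile. A possible way to make the step repeatable is sketched below; append_once is a hypothetical helper name, and the profile path in the example is the one mentioned above.

```shell
#!/bin/sh
# Append line $2 to file $1 unless an identical line is already present.
append_once() {
    file="$1"
    line="$2"
    touch "$file" || return 1
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep the variables unexpanded in the profile):
# append_once "$HOME/.bash_profile" 'export PATH="$HOME/.config/guix/current/bin${PATH:+:}$PATH"'
# append_once "$HOME/.bash_profile" 'export GUIX_LOCPATH="$HOME/.guix-profile/lib/locale"'
```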

Chameleon packages are now available
guix search ^chameleon

Refer to the official documentation of Guix to learn the basic
  commands.
Installing Chameleon with Guix
Standard Chameleon, latest release
guix install chameleon

Notice that there exist several build variants

  chameleon (default) : with starpu - with mpi
  chameleon-mkl-mt : default version but with multithreaded Intel MKL in place of OpenBLAS
  chameleon-cuda : with starpu - with mpi - with cuda
  chameleon-simgrid : with starpu - with mpi - with simgrid
  chameleon-openmp : with openmp - without mpi
  chameleon-parsec : with parsec - without mpi
  chameleon-quark : with quark - without mpi
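In scripts, the variant can be selected from a short keyword. The mapping below is a sketch built from the list above: the package names come from that list, while the function name chameleon_variant and the keywords are illustrative assumptions.

```shell
#!/bin/sh
# Map a feature keyword to the Guix package name from the variant list.
chameleon_variant() {
    case "$1" in
        default|starpu) echo "chameleon" ;;
        mkl)            echo "chameleon-mkl-mt" ;;
        cuda)           echo "chameleon-cuda" ;;
        simgrid)        echo "chameleon-simgrid" ;;
        openmp)         echo "chameleon-openmp" ;;
        parsec)         echo "chameleon-parsec" ;;
        quark)          echo "chameleon-quark" ;;
        *) echo "unknown variant: $1" >&2; return 1 ;;
    esac
}

# Example: guix install "$(chameleon_variant cuda)"
```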

Change the version
guix install chameleon --with-branch=chameleon=master
guix install chameleon --with-commit=chameleon=b31d7575fb7d9c0e1ba2d8ec633e16cb83778e8b
guix install chameleon --with-git-url=chameleon=https://gitlab.inria.fr/fpruvost/chameleon.git
guix install chameleon --with-git-url=chameleon=$HOME/git/chameleon