# This file is part of the Chameleon User's Guide.
# Copyright (C) 2017 Inria
# See the file ../users_guide.org for copying conditions.

Chameleon is written in C and depends on a couple of external
libraries that must be installed on the system.
# , it provides an interface to be called from Fortran
Chameleon can be built and installed by the standard means of [[http://www.cmake.org/][CMake]].
General information about CMake, as well as installation binaries and
CMake source code are available from [[http://www.cmake.org/cmake/resources/software.html][here]].
To install a full distribution (Chameleon and its dependencies), we
encourage users to use the /morse/ branch of [[sec:spack][Spack]].
** Getting Chameleon
The latest official release tarballs of Chameleon sources are
available for download from the [[https://gitlab.inria.fr/solverstack/chameleon/tags][gitlab tags page]].
The latest development state is available on [[https://gitlab.inria.fr/solverstack/chameleon][gitlab]]. You need [[https://git-scm.com/downloads][Git]]
to clone the repository:
#+begin_src
git clone --recursive https://gitlab.inria.fr/solverstack/chameleon.git
#+end_src
** Prerequisites for installing Chameleon
To install Chameleon's libraries, header files, and executables, one
needs:
  - CMake (version 2.8 minimum): the build system
  - C and Fortran compilers: GNU compiler suite, Clang, Intel or IBM
    can be used
  - python: to generate files in the different precisions
  - external libraries: this depends on the configuration, by default
    the required libraries are
    - [[http://runtime.bordeaux.inria.fr/StarPU/][StarPU]]
    - CBLAS, LAPACKE: these are interfaces and there exist several
      providers that can be used with Chameleon
      - Intel MKL, Netlib, OpenBLAS
    - BLAS, LAPACK, TMGLIB: there exist several providers that can be
      used with Chameleon
      - Eigen, Intel MKL, Netlib, OpenBLAS
    - pthread (libpthread)
    - math (libm)

Optional libraries:
  - [[http://icl.cs.utk.edu/quark/][quark]]
  - [[https://developer.nvidia.com/cuda-downloads][cuda]]
  - [[http://docs.nvidia.com/cuda/cublas/][cublas]]: comes with cuda
  - mpi: [[http://www.open-mpi.org/][openmpi]]

These packages must be installed on the system before trying to
configure and build Chameleon. The distrib/ directory gives some
hints for installing the dependencies on Unix systems.
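
Before configuring, it can help to check which build tools are
already on the ~PATH~. The following is a minimal sketch, not part of
Chameleon: the helper names ~check_tool~ and ~check_prerequisites~ are
hypothetical, and the tool names passed to them are common defaults
that may differ on your system.

#+begin_src sh
# Report whether a single command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
        return 0
    else
        echo "missing: $1"
        return 1
    fi
}

# Check every tool given as argument; fail if at least one is missing.
check_prerequisites() {
    missing=0
    for tool in "$@"; do
        check_tool "$tool" || missing=1
    done
    return $missing
}

# Example use (tool names are assumptions, adapt them to your compilers):
#   check_prerequisites cmake gcc gfortran python git
#+end_src

This only tests for executables. Libraries such as BLAS or StarPU are
detected later, at cmake configure.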
We give here some examples for a Debian system:
#+begin_src
# Update Debian packages list
sudo apt-get update
# Install Netlib blas, lapack, tmglib, cblas and lapacke suite
sudo apt-get install -y liblapack-dev liblapacke-dev
# Alternatively to Netlib, OpenBLAS could be used (faster kernels)
sudo apt-get install -y libopenblas-dev liblapacke-dev
# Install OpenMPI
sudo apt-get install -y libopenmpi-dev
# Install hwloc (used by StarPU or QUARK, already a dependency of OpenMPI)
sudo apt-get install -y libhwloc-dev
# Install FxT, useful to export execution traces with StarPU
sudo apt-get install -y libfxt-dev
# Install CUDA and cuBLAS: only if you have a CUDA-compatible GPU
sudo apt-get install -y nvidia-cuda-toolkit nvidia-cuda-dev
# Install StarPU (with MPI and FxT enabled)
mkdir -p $HOME/install
cd $HOME/install
wget http://starpu.gforge.inria.fr/files/starpu-1.2.2/starpu-1.2.2.tar.gz
tar xvzf starpu-1.2.2.tar.gz
cd starpu-1.2.2/
./configure --prefix=$HOME/install/starpu --disable-opencl --disable-cuda --with-fxt=/usr/lib/x86_64-linux-gnu/
make
make install
cd $HOME/install
rm -rf starpu-1.2.2/ starpu-1.2.2.tar.gz
# Install QUARK (alternative runtime system to StarPU)
mkdir -p $HOME/install
cd $HOME/install
wget http://icl.cs.utk.edu/projectsfiles/quark/pubs/quark-0.9.0.tgz
tar xvzf quark-0.9.0.tgz
cd quark-0.9.0/
sed -i -e "s#prefix=\.\/install#prefix=$HOME/install/quark#g" make.inc
sed -i -e "s#CFLAGS=-O2#CFLAGS=-O2 -fPIC#g" make.inc
make
make install
cd $HOME/install
rm -rf quark-0.9.0/ quark-0.9.0.tgz
#+end_src
*** Some details about dependencies
**** BLAS implementation
[[http://www.netlib.org/blas/][BLAS]] (Basic Linear Algebra Subprograms) is a de facto standard
for basic linear algebra operations such as vector and matrix
multiplication. A FORTRAN implementation of BLAS is available from
Netlib, and a C implementation is included in GSL (GNU Scientific
Library). Both are reference implementations: they are not
optimized for modern processor architectures and provide an order
of magnitude lower performance than optimized implementations.
Highly optimized implementations of BLAS are available from many
hardware vendors, such as Intel MKL, IBM ESSL and AMD ACML. Fast
implementations are also available as academic packages, such as
ATLAS and OpenBLAS. The standard interface to BLAS is the FORTRAN
interface.
*Caution about the compatibility:* Chameleon has been mainly tested
with the reference BLAS from NETLIB, OpenBLAS and Intel MKL.
**** CBLAS
[[http://www.netlib.org/blas/#_cblas][CBLAS]] is a C language interface to BLAS. Most commercial and
academic implementations of BLAS also provide CBLAS. Netlib
provides a reference implementation of CBLAS on top of FORTRAN
BLAS (Netlib CBLAS). Since GSL is implemented in C, it naturally
provides CBLAS.
*Caution about the compatibility:* Chameleon has been mainly tested with
the reference CBLAS from NETLIB, OpenBLAS and Intel MKL.
**** LAPACK implementation
[[http://www.netlib.org/lapack/][LAPACK]] (Linear Algebra PACKage) is a software library for
numerical linear algebra, a successor of LINPACK and EISPACK and
a predecessor of Chameleon. LAPACK provides routines for solving
linear systems of equations, linear least squares problems,
eigenvalue problems and singular value problems. Most commercial
and academic BLAS packages also provide some LAPACK routines.
*Caution about the compatibility:* Chameleon has been mainly tested
with the reference LAPACK from NETLIB, OpenBLAS and Intel MKL.
**** LAPACKE
[[http://www.netlib.org/lapack/][LAPACKE]] is a C language interface to LAPACK (or CLAPACK). It is
produced by Intel in coordination with the LAPACK team and is
available in source code from Netlib in its original version
(Netlib LAPACKE) and from Chameleon website in an extended
version (LAPACKE for Chameleon). In addition to implementing the
C interface, LAPACKE also provides routines which automatically
handle workspace allocation, making the use of LAPACK much more
convenient.
*Caution about the compatibility:* Chameleon has been mainly tested
with the reference LAPACKE from NETLIB, OpenBLAS and Intel MKL.
**** libtmg
[[http://www.netlib.org/lapack/][libtmg]] is a component of the LAPACK library, containing routines
for generation of input matrices for testing and timing of
LAPACK. The testing and timing suites of LAPACK require libtmg,
but not the library itself. Note that the LAPACK library can be
built and used without libtmg.
*Caution about the compatibility:* Chameleon has been mainly tested
with the reference TMGLIB from NETLIB, OpenBLAS and Intel MKL.
**** QUARK
[[http://icl.cs.utk.edu/quark/][QUARK]] (QUeuing And Runtime for Kernels) provides a library that
enables the dynamic execution of tasks with data dependencies in
a multi-core, multi-socket, shared-memory environment. One of the
QUARK or StarPU runtime systems has to be enabled in order to
schedule tasks on the architecture. If QUARK is enabled then
StarPU is disabled and conversely. Note that StarPU is enabled by
default. When Chameleon is linked with QUARK, it is not possible
to exploit either CUDA (for GPUs) or MPI (distributed-memory
environment). You can use StarPU to do so.
*Caution about the compatibility:* Chameleon has been mainly tested
with the QUARK library 0.9.
**** StarPU
[[http://runtime.bordeaux.inria.fr/StarPU/][StarPU]] is a task programming library for hybrid architectures.
StarPU handles run-time concerns such as:
  * Task dependencies
  * Optimized heterogeneous scheduling
  * Optimized data transfers and replication between main memory
    and discrete memories
  * Optimized cluster communications

StarPU can be used to benefit from GPUs and distributed-memory
environments. One of the QUARK or StarPU runtime systems has to be
enabled in order to schedule tasks on the architecture. If
StarPU is enabled then QUARK is disabled and conversely. Note that
StarPU is enabled by default.
*Caution about the compatibility:* Chameleon has been mainly tested
with StarPU-1.1 and 1.2 releases.
**** FxT
[[http://download.savannah.gnu.org/releases/fkt/][FxT]] stands for both FKT (Fast Kernel Tracing) and FUT (Fast User
Tracing). This library provides efficient support for recording
traces. Chameleon can trace kernel execution on the different
workers and produce .paje files if FxT is enabled. FxT can only
be used through StarPU, and StarPU must be compiled with FxT
enabled. See [[sec:trace][Execution trace using StarPU]] for how
to use this feature.
*Caution about the compatibility:* FxT should be compatible with
the version of StarPU used.
**** hwloc
[[http://www.open-mpi.org/projects/hwloc/][hwloc]] (Portable Hardware Locality) is a software package for
accessing the topology of a multicore system including components
like: cores, sockets, caches and NUMA nodes. The topology
discovery library, ~hwloc~, is not mandatory to use StarPU but is
strongly recommended. It helps to increase performance and makes
topology-aware scheduling possible. ~hwloc~ is available in
major distributions and for most OSes and can be downloaded from
http://www.open-mpi.org/software/hwloc.
**** pthread
The POSIX threads library is required to run Chameleon on Unix-like systems.
It is a standard component of any such system.
**** OpenMPI
[[http://www.open-mpi.org/][OpenMPI]] is an open source Message Passing Interface
implementation for execution on multiple nodes in a
distributed-memory environment. MPI can be enabled only if the
runtime system chosen is StarPU (default). To use MPI through
StarPU, it is necessary to compile StarPU with MPI enabled.
*Caution about the compatibility:* OpenMPI should be built with the
~--enable-mpi-thread-multiple~ option.
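
One way to check an existing OpenMPI installation for this option is
to inspect the output of ~ompi_info~. The sketch below assumes that
~ompi_info~ reports thread support with the text
~MPI_THREAD_MULTIPLE: yes~. The exact wording may vary between OpenMPI
versions, and the helper name ~has_thread_multiple~ is hypothetical.

#+begin_src sh
# Succeed if the ompi_info output read on standard input reports
# MPI_THREAD_MULTIPLE support (assumed wording, see above).
has_thread_multiple() {
    grep -q 'MPI_THREAD_MULTIPLE: yes'
}

# Example use:
#   ompi_info | has_thread_multiple \
#       || echo "OpenMPI lacks MPI_THREAD_MULTIPLE support"
#+end_src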
**** Nvidia CUDA Toolkit
[[https://developer.nvidia.com/cuda-toolkit][Nvidia CUDA Toolkit]] provides a comprehensive development
environment for C and C++ developers building GPU-accelerated
applications. Chameleon can use a set of low level optimized
kernels coming from cuBLAS to accelerate computations on GPUs.
The [[http://docs.nvidia.com/cuda/cublas/][cuBLAS]] library is an implementation of BLAS (Basic Linear
Algebra Subprograms) on top of the Nvidia CUDA runtime. cuBLAS
is normally distributed with the Nvidia CUDA Toolkit. CUDA/cuBLAS can
be enabled in Chameleon only if the runtime system chosen is
StarPU (default). To use CUDA through StarPU, it is necessary to
compile StarPU with CUDA enabled.
*Caution about the compatibility:* Chameleon has been mainly tested
with CUDA releases from versions 4 to 7.5. Your compiler must be
compatible with CUDA.
** Distribution of Chameleon using Spack
<<sec:spack>>
To install a full distribution (Chameleon and its dependencies), we
encourage users to use the morse branch of *Spack*.
Please read the following documentation:
  * [[http://morse.gforge.inria.fr/spack/spack.html][Spack Cham]]
  * [[http://morse.gforge.inria.fr/spack/spack.html#orgd5b1afe][Section Chameleon]]
*** Usage example for a simple distribution of Chameleon
#+begin_src sh
git clone https://github.com/solverstack/spack.git
. ./spack/share/spack/setup-env.sh
spack install -v chameleon
# print the directory where chameleon is installed
spack location -i chameleon
#+end_src
** Build and install Chameleon with CMake
Compilation of Chameleon libraries and executables is done with
CMake (http://www.cmake.org/). This version has been tested with
CMake 3.5.1 but any version from 2.8 onwards should be fine.
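
The version requirement can be checked from a script. This is a
minimal sketch under two assumptions: ~sort -V~ (version sort, as
provided by GNU coreutils) is available, and the first line of
~cmake --version~ has the form ~cmake version X.Y.Z~. The helper names
~version_ge~ and ~cmake_version~ are hypothetical.

#+begin_src sh
# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the version number found on the first line of `cmake --version`.
cmake_version() {
    cmake --version | head -n 1 | awk '{print $3}'
}

# Example use:
#   version_ge "$(cmake_version)" 2.8 || echo "CMake is too old"
#+end_src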
Here are the steps to configure, build, test and install:

1. configure:
   #+begin_src
   cmake path/to/chameleon -DOPTION1= -DOPTION2= ...
   # see the "Configuration options" section to get the list of options
   # see the "Dependencies detection" section for details about library detection
   #+end_src
2. build:
   #+begin_src
   make
   # do not hesitate to use the -j[ncores] option to speed up the compilation
   #+end_src
3. test (optional, requires CHAMELEON_ENABLE_TESTING=ON and/or ...):
   #+begin_src
   make test
   # or
   ctest
   #+end_src
4. install (optional):
   #+begin_src
   make install
   #+end_src

Do not forget to specify the install directory with
*-DCMAKE_INSTALL_PREFIX* at configure:
#+begin_example
cmake /home/jdoe/chameleon -DCMAKE_INSTALL_PREFIX=/home/jdoe/install/chameleon
#+end_example
Note that the install process is optional. You are free to use
Chameleon binaries compiled in the build directory.
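
The configure, build, test and install steps can be chained in a
small script. The sketch below is one possible arrangement, not an
official build script: the function names ~ncores~ and
~build_chameleon~ and the ~build~ directory name are hypothetical, and
~getconf _NPROCESSORS_ONLN~ is assumed to be available to count cores
(it falls back to 1 otherwise).

#+begin_src sh
# Print the number of online cores, or 1 if it cannot be determined.
ncores() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

# Configure, build, test and install from a separate build directory.
# $1: absolute path to the Chameleon sources, $2: install prefix.
build_chameleon() {
    src=$1
    prefix=$2
    mkdir -p build && cd build || return 1
    cmake "$src" -DCMAKE_INSTALL_PREFIX="$prefix" || return 1
    make -j"$(ncores)" || return 1
    ctest || return 1
    make install
}

# Example use:
#   build_chameleon /home/jdoe/chameleon /home/jdoe/install/chameleon
#+end_src

Building in a separate directory keeps the source tree clean and
allows several configurations (for instance Debug and Release) to
coexist.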
*** Configuration options
You can optionally activate some options at cmake configure (like CUDA, MPI, ...)
by invoking ~cmake path/to/your/CMakeLists.txt -DOPTION1= -DOPTION2= ...~
#+begin_src
cmake /home/jdoe/chameleon/ -DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_INSTALL_PREFIX=/home/jdoe/install/ \
-DCHAMELEON_USE_CUDA=ON \
-DCHAMELEON_USE_MPI=ON \
-DBLA_VENDOR=Intel10_64lp \
-DSTARPU_DIR=/home/jdoe/install/starpu-1.2/ \
-DCHAMELEON_ENABLE_TRACING=ON
#+end_src
You can get the full list of options with the *-L[A][H]* options of the cmake command:
#+begin_src
cmake -LH /home/jdoe/chameleon/
#+end_src
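
The full listing is long. To keep only the Chameleon-specific entries
together with their help text, the output can be filtered. This sketch
assumes the usual ~cmake -LH~ layout, where each ~NAME:TYPE=VALUE~ line
is preceded by a help line starting with ~//~. The function name
~chameleon_options~ is hypothetical.

#+begin_src sh
# Read `cmake -LH` output on standard input and print each
# CHAMELEON_* cache entry preceded by its help line.
chameleon_options() {
    awk '/^\/\// { help = $0; next }
         /^CHAMELEON_/ { print help; print }'
}

# Example use:
#   cmake -LH /home/jdoe/chameleon/ | chameleon_options
#+end_src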
You can also set the options through the *ccmake* interface.
**** Native CMake options (non-exhaustive list)
  * *CMAKE_BUILD_TYPE=Debug|Release|RelWithDebInfo|MinSizeRel*:
    level of compiler optimization, enable/disable debug
    information
  * *CMAKE_INSTALL_PREFIX=path/to/your/install/dir*: where headers,
    libraries, executables, etc, will be copied when invoking make
    install
  * *BUILD_SHARED_LIBS=ON|OFF*: indicate whether CMake has to
    build Chameleon static (~OFF~) or shared (~ON~) libraries.
  * *CMAKE_C_COMPILER=gcc|icc|...*: to choose the C compiler
    if several exist in the environment
  * *CMAKE_Fortran_COMPILER=gfortran|ifort|...*: to choose the
    Fortran compiler if several exist in the environment
**** Related to specific modules (find_package) to find external libraries
  * *BLA_VENDOR=All|Eigen|Open|Generic|Intel10_64lp|Intel10_64lp_seq*:
    to use intel mkl for example, see the list of BLA_VENDOR in
  * *STARPU_DIR=path/to/root/starpu/install*, see [[sec:depdet][Dependencies
    detection]]
  * *STARPU_INCDIR=path/to/root/starpu/install/headers*, see
    [[sec:depdet][Dependencies detection]]
  * *STARPU_LIBDIR=path/to/root/starpu/install/libs*, see
    [[sec:depdet][Dependencies detection]]
  * List of packages that can be searched just like STARPU (with _DIR,
    _INCDIR and _LIBDIR):
    * *BLAS*, *CBLAS*, *EZTRACE*, *FXT*, *HWLOC*, *LAPACK*, *LAPACKE*, *QUARK*,
      *SIMGRID*, *TMG*

Libraries detected with an official cmake module (see module files
in CMAKE_ROOT/Modules/): CUDA - MPI - Threads.
Libraries detected with our cmake modules (see module files in
cmake_modules/morse_cmake/modules/find/ directory of Chameleon
sources): BLAS - CBLAS - EZTRACE - FXT - HWLOC - LAPACK -
LAPACKE - QUARK - SIMGRID - STARPU - TMG.
**** Chameleon specific options
  * *CHAMELEON_SCHED_STARPU=ON|OFF* (default ON): to link with
    StarPU library (runtime system)
  * *CHAMELEON_SCHED_QUARK=ON|OFF* (default OFF): to link with QUARK
    library (runtime system)
  * *CHAMELEON_USE_MPI=ON|OFF* (default OFF): to link with MPI
    library (message passing implementation for use of multiple
    nodes with distributed memory), can only be used with StarPU
  * *CHAMELEON_USE_CUDA=ON|OFF* (default OFF): to link with CUDA
    runtime (implementation paradigm for accelerated codes on GPUs)
    and cuBLAS library (optimized BLAS kernels on GPUs), can only
    be used with StarPU
  * *CHAMELEON_ENABLE_DOC=ON|OFF* (default OFF): to control build of
    the documentation contained in doc/ sub-directory
  * *CHAMELEON_ENABLE_EXAMPLE=ON|OFF* (default ON): to control build
    of the example executables (API usage) contained in example/
    sub-directory
  * *CHAMELEON_ENABLE_PRUNING_STATS=ON|OFF* (default OFF)
  * *CHAMELEON_ENABLE_TESTING=ON|OFF* (default ON): to control build
    of testing executables (numerical check) contained in testing/
    sub-directory
  * *CHAMELEON_ENABLE_TIMING=ON|OFF* (default ON): to control build
    of timing executables (performance check) contained in timing/
    sub-directory
  * *CHAMELEON_ENABLE_TRACING=ON|OFF* (default OFF): to enable trace
    generation during execution of timing drivers. It requires
    StarPU to be linked with FxT library (trace execution of
    kernels on workers), see also [[sec:trace][Execution tracing
    with StarPU]].
  * *CHAMELEON_SIMULATION=ON|OFF* (default OFF): to enable
    simulation mode, meaning that Chameleon will not really execute
    tasks, see details in section [[sec:simu][Use simulation mode with
    StarPU-SimGrid]]. This option must be used with StarPU compiled
    with [[http://simgrid.gforge.inria.fr/][SimGrid]], which makes it possible to estimate the execution
    time on any architecture. This feature should be used to
    experiment with scheduler behavior and performance, not to
    produce solutions of linear systems.
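
The constraints between these options (one runtime system at a time,
MPI and CUDA only with StarPU) can be encoded in a small helper that
prints a consistent set of flags. This is an illustrative sketch only:
the function name ~chameleon_flags~ and its argument keywords are
hypothetical, while the ~-D~ options are the ones listed above.

#+begin_src sh
# Print CMake flags for a runtime system and optional features.
# Usage: chameleon_flags starpu|quark [mpi] [cuda]
chameleon_flags() {
    runtime=$1
    shift
    case "$runtime" in
        starpu) flags="-DCHAMELEON_SCHED_STARPU=ON -DCHAMELEON_SCHED_QUARK=OFF" ;;
        quark)  flags="-DCHAMELEON_SCHED_STARPU=OFF -DCHAMELEON_SCHED_QUARK=ON" ;;
        *) echo "unknown runtime: $runtime" >&2; return 1 ;;
    esac
    for feature in "$@"; do
        # MPI and CUDA can only be used with StarPU
        if [ "$runtime" = quark ]; then
            echo "$feature requires StarPU" >&2
            return 1
        fi
        case "$feature" in
            mpi)  flags="$flags -DCHAMELEON_USE_MPI=ON" ;;
            cuda) flags="$flags -DCHAMELEON_USE_CUDA=ON" ;;
            *) echo "unknown feature: $feature" >&2; return 1 ;;
        esac
    done
    echo "$flags"
}

# Example use:
#   cmake path/to/chameleon $(chameleon_flags starpu mpi)
#+end_src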
*** Dependencies detection
<<sec:depdet>>
There are several ways to detect dependencies on your system:
either by setting environment variables that contain the paths to
the libraries and headers, or by specifying these paths directly at
cmake configure.
1) detection of dependencies through environment variables:
   - LD_LIBRARY_PATH should contain the list of paths where to find
     the libraries:
     #+begin_src
     export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:install/path/to/your/lib
     #+end_src
   - INCLUDE should contain the list of paths where to find the
     header files of libraries:
     #+begin_src
     export INCLUDE=$INCLUDE:install/path/to/your/headers
     #+end_src
2) detection with user's given paths:
   - you can specify the path at cmake configure by invoking ~cmake
     path/to/your/CMakeLists.txt -DLIB_DIR=path/to/your/lib~ where
     LIB stands for the name of the lib to look for:
     #+begin_src
     cmake path/to/your/CMakeLists.txt -DSTARPU_DIR=path/to/starpudir \
                                       -DCBLAS_DIR= ...
     #+end_src
   - it is also possible to specify headers and library directories
     separately:
     #+begin_src
     cmake path/to/your/CMakeLists.txt -DSTARPU_INCDIR=path/to/libstarpu/include/starpu/1.1 \
                                       -DSTARPU_LIBDIR=path/to/libstarpu/lib
     #+end_src
   - note: BLAS and LAPACK detection can be tedious, so we provide a
     verbose mode. Set *-DBLAS_VERBOSE=ON* or *-DLAPACK_VERBOSE=ON*
     to enable it
3) detection with custom environment variables: all variables like
   _DIR, _INCDIR, _LIBDIR can be set as environment variables
   instead of CMake options, and they will be read
4) using [[https://www.freedesktop.org/wiki/Software/pkg-config/][pkg-config]] for libraries that provide .pc files
   - update your *PKG_CONFIG_PATH* with the paths where to find the .pc
     files of installed external libraries like hwloc, starpu, some
     blas/lapack, etc
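
The environment-based approaches above can be wrapped in a helper that
avoids duplicate entries and a leading colon when the variable is
initially empty. This is a minimal sketch: the function name
~path_append~ is hypothetical, the prefix reuses the StarPU install
directory from the Debian example, and the ~lib/pkgconfig~
sub-directory is an assumption based on the usual install layout.

#+begin_src sh
# Append directory $2 to the colon-separated list held in the
# variable named $1, unless already present, and export the result.
path_append() {
    eval "current=\${$1:-}"
    case ":$current:" in
        *":$2:"*) ;;   # already listed, nothing to do
        *) current="${current:+$current:}$2" ;;
    esac
    eval "$1=\$current"
    export "$1"
}

# Assumed install prefix and layout, adapt to your system
prefix=$HOME/install/starpu
path_append LD_LIBRARY_PATH "$prefix/lib"
path_append PKG_CONFIG_PATH "$prefix/lib/pkgconfig"
#+end_src

The same helper can be used for ~INCLUDE~ or any other
colon-separated variable.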