# This file is part of the Chameleon User's Guide. # Copyright (C) 2017 Inria # See the file ../users_guide.org for copying conditions. Chameleon is written in C and depends on a couple of external libraries that must be installed on the system. # , it provides an interface to be called from Fortran Chameleon can be built and installed by the standard means of [[http://www.cmake.org/][CMake]]. General information about CMake, as well as installation binaries and CMake source code are available from [[http://www.cmake.org/cmake/resources/software.html][here]]. To get support to install a full distribution Chameleon + dependencies we encourage users to use the /morse/ branch of [[sec:spack][Spack]]. ** Getting Chameleon The latest official release tarballs of Chameleon sources are available for download from the [[https://gitlab.inria.fr/solverstack/chameleon/tags][gitlab tags page]]. The latest development state is available on [[https://gitlab.inria.fr/solverstack/chameleon][gitlab]]. You need [[https://git-scm.com/downloads][Git]] #+begin_src git clone --recursive https://gitlab.inria.fr/solverstack/chameleon.git #+end_src ** Prerequisites for installing Chameleon To install Chameleon's libraries, header files, and executables, one needs: - CMake (version 2.8 minimum): the build system - C and Fortran compilers: GNU compiler suite, Clang, Intel or IBM can be used - python: to generate files in the different precisions - external libraries: this depends on the configuration, by default the required libraries are - [[http://runtime.bordeaux.inria.fr/StarPU/][StarPU]] - CBLAS, LAPACKE: these are interfaces and there exist several providers that can be used with Chameleon - Intel MKL, Netlib, OpenBlas - BLAS, LAPACK, TMGLIB: there exist several providers that can be used with Chameleon - Eigen, Intel MKL, Netlib, OpenBlas - pthread (libpthread) - math (libm) Optional libraries: - [[http://icl.cs.utk.edu/quark/][quark]] - [[https://developer.nvidia.com/cuda-downloads][cuda]] - [[http://docs.nvidia.com/cuda/cublas/][cublas]]: comes with cuda - mpi: [[http://www.open-mpi.org/][openmpi]] These packages must be installed on the system before trying to configure/build chameleon. Please look at the distrib/ directory which gives some hints for the installation of dependencies for Unix systems. We give here some examples for a Debian system: #+begin_src # Update Debian packages list sudo apt-get update # Install Netlib blas, lapack, tmglib, cblas and lapacke suite sudo apt-get install -y liblapack-dev liblapacke-dev # Alternatively to Netlib, OpenBLAS could be used (faster kernels) sudo apt-get install -y libopenblas-dev liblapacke-dev # Install OpenMPI sudo apt-get install -y libopenmpi-dev # Install hwloc (used by StarPU or QUARK, already a dependency of OpenMPI) sudo apt-get install -y libhwloc-dev # install FxT, usefull to export some nice execution traces with StarPU sudo apt-get install -y libfxt-dev # Install cuda and cuBLAS: only if you have a GPU cuda compatible sudo apt-get install -y nvidia-cuda-toolkit nvidia-cuda-dev # Install StarPU (with MPI and FxT enabled) mkdir -p $HOME/install cd $HOME/install wget http://starpu.gforge.inria.fr/files/starpu-1.2.2/starpu-1.2.2.tar.gz tar xvzf starpu-1.2.2.tar.gz cd starpu-1.2.2/ ./configure --prefix=$HOME/install/starpu --disable-opencl --disable-cuda --with-fxt=/usr/lib/x86_64-linux-gnu/ make make install cd $HOME/install rm starpu-1.2.2/ starpu-1.2.2.tar.gz -rf # Install QUARK: to be used in place of StarPU mkdir -p $HOME/install cd $HOME/install wget http://icl.cs.utk.edu/projectsfiles/quark/pubs/quark-0.9.0.tgz tar xvzf quark-0.9.0.tgz cd quark-0.9.0/ sed -i -e "s#prefix=\.\/install#prefix=$HOME/install/quark#g" make.inc sed -i -e "s#CFLAGS=-O2#CFLAGS=-O2 -fPIC#g" make.inc make make install cd $HOME/install rm quark-0.9.0/ quark-0.9.0.tgz -rf #+end_src *** Some details about dependencies **** BLAS implementation [[http://www.netlib.org/blas/][BLAS]] (Basic Linear Algebra Subprograms), are a de facto standard for basic linear algebra operations such as vector and matrix multiplication. FORTRAN implementation of BLAS is available from Netlib. Also, C implementation of BLAS is included in GSL (GNU Scientific Library). Both these implementations are reference implementation of BLAS, are not optimized for modern processor architectures and provide an order of magnitude lower performance than optimized implementations. Highly optimized implementations of BLAS are available from many hardware vendors, such as Intel MKL, IBM ESSL and AMD ACML. Fast implementations are also available as academic packages, such as ATLAS and OpenBLAS. The standard interface to BLAS is the FORTRAN interface. *Caution about the compatibility:* Chameleon has been mainly tested with the reference BLAS from NETLIB, OpenBLAS and Intel MKL. **** CBLAS [[http://www.netlib.org/blas/#_cblas][CBLAS]] is a C language interface to BLAS. Most commercial and academic implementations of BLAS also provide CBLAS. Netlib provides a reference implementation of CBLAS on top of FORTRAN BLAS (Netlib CBLAS). Since GSL is implemented in C, it naturally provides CBLAS. *Caution about the compatibility:* Chameleon has been mainly tested with the reference CBLAS from NETLIB, OpenBLAS and Intel MKL. **** LAPACK implementation [[http://www.netlib.org/lapack/][LAPACK]] (Linear Algebra PACKage) is a software library for numerical linear algebra, a successor of LINPACK and EISPACK and a predecessor of Chameleon. LAPACK provides routines for solving linear systems of equations, linear least square problems, eigenvalue problems and singular value problems. Most commercial and academic BLAS packages also provide some LAPACK routines. *Caution about the compatibility:* Chameleon has been mainly tested with the reference LAPACK from NETLIB, OpenBLAS and Intel MKL. **** LAPACKE [[http://www.netlib.org/lapack/][LAPACKE]] is a C language interface to LAPACK (or CLAPACK). It is produced by Intel in coordination with the LAPACK team and is available in source code from Netlib in its original version (Netlib LAPACKE) and from Chameleon website in an extended version (LAPACKE for Chameleon). In addition to implementing the C interface, LAPACKE also provides routines which automatically handle workspace allocation, making the use of LAPACK much more convenient. *Caution about the compatibility:* Chameleon has been mainly tested with the reference LAPACKE from NETLIB, OpenBLAS and Intel MKL. **** libtmg [[http://www.netlib.org/lapack/][libtmg]] is a component of the LAPACK library, containing routines for generation of input matrices for testing and timing of LAPACK. The testing and timing suites of LAPACK require libtmg, but not the library itself. Note that the LAPACK library can be built and used without libtmg. *Caution about the compatibility:* Chameleon has been mainly tested with the reference TMGLIB from NETLIB, OpenBLAS and Intel MKL. **** QUARK [[http://icl.cs.utk.edu/quark/][QUARK]] (QUeuing And Runtime for Kernels) provides a library that enables the dynamic execution of tasks with data dependencies in a multi-core, multi-socket, shared-memory environment. One of QUARK or StarPU Runtime systems has to be enabled in order to schedule tasks on the architecture. If QUARK is enabled then StarPU is disabled and conversely. Note StarPU is enabled by default. When Chameleon is linked with QUARK, it is not possible to exploit neither CUDA (for GPUs) nor MPI (distributed-memory environment). You can use StarPU to do so. *Caution about the compatibility:* Chameleon has been mainly tested with the QUARK library 0.9. **** StarPU [[http://runtime.bordeaux.inria.fr/StarPU/][StarPU]] is a task programming library for hybrid architectures. StarPU handles run-time concerns such as: * Task dependencies * Optimized heterogeneous scheduling * Optimized data transfers and replication between main memory and discrete memories * Optimized cluster communications StarPU can be used to benefit from GPUs and distributed-memory environment. One of QUARK or StarPU runtime system has to be enabled in order to schedule tasks on the architecture. If StarPU is enabled then QUARK is disabled and conversely. Note StarPU is enabled by default. *Caution about the compatibility:* Chameleon has been mainly tested with StarPU-1.1 and 1.2 releases. **** FxT [[http://download.savannah.gnu.org/releases/fkt/][FxT]] stands for both FKT (Fast Kernel Tracing) and FUT (Fast User Tracing). This library provides efficient support for recording traces. Chameleon can trace kernels execution on the different workers and produce .paje files if FxT is enabled. FxT can only be used through StarPU and StarPU must be compiled with FxT enabled, see how to use this feature here [[sec:trace][Execution trace using StarPU]]. *Caution about the compatibility:* FxT should be compatible with the version of StarPU used. **** hwloc [[http://www.open-mpi.org/projects/hwloc/][hwloc]] (Portable Hardware Locality) is a software package for accessing the topology of a multicore system including components like: cores, sockets, caches and NUMA nodes. The topology discovery library, ~hwloc~, is not mandatory to use StarPU but strongly recommended. It allows to increase performance, and to perform some topology aware scheduling. ~hwloc~ is available in major distributions and for most OSes and can be downloaded from http://www.open-mpi.org/software/hwloc. **** pthread POSIX threads library is required to run Chameleon on Unix-like systems. It is a standard component of any such system. **** OpenMPI [[http://www.open-mpi.org/][OpenMPI]] is an open source Message Passing Interface implementation for execution on multiple nodes with distributed-memory environment. MPI can be enabled only if the runtime system chosen is StarPU (default). To use MPI through StarPU, it is necessary to compile StarPU with MPI enabled. *Caution about the compatibility:* OpenMPI should be built with the --enable-mpi-thread-multiple option. **** Nvidia CUDA Toolkit [[https://developer.nvidia.com/cuda-toolkit][Nvidia CUDA Toolkit]] provides a comprehensive development environment for C and C++ developers building GPU-accelerated applications. Chameleon can use a set of low level optimized kernels coming from cuBLAS to accelerate computations on GPUs. The [[http://docs.nvidia.com/cuda/cublas/][cuBLAS]] library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the Nvidia CUDA runtime. cuBLAS is normaly distributed with Nvidia CUDA Toolkit. CUDA/cuBLAS can be enabled in Chameleon only if the runtime system chosen is StarPU (default). To use CUDA through StarPU, it is necessary to compile StarPU with CUDA enabled. *Caution about the compatibility:* Chameleon has been mainly tested with CUDA releases from versions 4 to 7.5. Your compiler must be compatible with CUDA. ** Distribution of Chameleon using Spack <<sec:spack>> To get support to install a full distribution (Chameleon + dependencies) we encourage users to use the morse branch of *Spack*. Please read these documentations: * [[http://morse.gforge.inria.fr/spack/spack.html][Spack Morse]] * [[http://morse.gforge.inria.fr/spack/spack.html#orgd5b1afe][Section Chameleon]] *** Usage example for a simple distribution of Chameleon #+begin_src sh git clone https://github.com/solverstack/spack.git . ./spack/share/spack/setup-env.sh spack install -v chameleon # chameleon is installed here: `spack location -i chameleon` #+end_src ** Build and install Chameleon with CMake Compilation of Chameleon libraries and executables are done with CMake (http://www.cmake.org/). This version has been tested with CMake 3.5.1 but any version superior to 2.8 should be fine. Here the steps to configure, build, test and install 1. configure: #+begin_src cmake path/to/chameleon -DOPTION1= -DOPTION2= ... # see the "Configuration options" section to get list of options # see the "Dependencies detection" for details about libraries detection #+end_src 2. build: #+begin_src make # do not hesitate to use -j[ncores] option to speedup the compilation #+end_src 3. test (optional, required CHAMELEON_ENABLE_TESTING=ON and/or CHAMELEON_ENABLE_TIMING=ON): #+begin_src make test # or ctest #+end_src 4. install (optional): #+begin_src make install #+end_src Do not forget to specify the install directory with *-DCMAKE_INSTALL_PREFIX* at configure. #+begin_example cmake /home/jdoe/chameleon -DCMAKE_INSTALL_PREFIX=/home/jdoe/install/chameleon #+end_example Note that the install process is optional. You are free to use Chameleon binaries compiled in the build directory. *** Configuration options You can optionally activate some options at cmake configure (like CUDA, MPI, ...) invoking ~cmake path/to/your/CMakeLists.txt -DOPTION1= -DOPTION2= ...~ #+begin_src cmake /home/jdoe/chameleon/ -DCMAKE_BUILD_TYPE=Debug \ -DCMAKE_INSTALL_PREFIX=/home/jdoe/install/ \ -DCHAMELEON_USE_CUDA=ON \ -DCHAMELEON_USE_MPI=ON \ -DBLA_VENDOR=Intel10_64lp \ -DSTARPU_DIR=/home/jdoe/install/starpu-1.2/ \ -DCHAMELEON_ENABLE_TRACING=ON #+end_src You can get the full list of options with *-L[A][H]* options of cmake command #+begin_src cmake -LH /home/jdoe/chameleon/ #+end_src You can also set the options thanks to the *ccmake* interface. **** Native CMake options (non-exhaustive list) * *CMAKE_BUILD_TYPE=Debug|Release|RelWithDebInfo|MinSizeRel*: level of compiler optimization, enable/disable debug information * *CMAKE_INSTALL_PREFIX=path/to/your/install/dir*: where headers, libraries, executables, etc, will be copied when invoking make install * *BUILD_SHARED_LIBS=ON|OFF*: indicate wether or not CMake has to build Chameleon static (~OFF~) or shared (~ON~) libraries. * *CMAKE_C_COMPILER=gcc|icc|...*: to choose the C compilers if several exist in the environment * *CMAKE_Fortran_COMPILER=gfortran|ifort|...*: to choose the Fortran compilers if several exist in the environment **** Related to specific modules (find_package) to find external libraries * *BLA_VENDOR=All|Eigen|Open|Generic|Intel10_64lp|Intel10_64lp_seq*: to use intel mkl for example, see the list of BLA_VENDOR in FindBLAS.cmake in cmake_modules/morse_cmake/modules/find * *STARPU_DIR=path/to/root/starpu/install*, see [[sec:depdet][Dependencies detection]] * *STARPU_INCDIR=path/to/root/starpu/install/headers*, see [[sec:depdet][Dependencies detection]] * *STARPU_LIBDIR=path/to/root/starpu/install/libs*, see [[sec:depdet][Dependencies detection]] * List of packages that can searched just like STARPU (with _DIR, _INCDIR and _LIBDIR): * *BLAS*, *CBLAS*, *EZTRACE*, *FXT*, *HWLOC*, *LAPACK*, *LAPACKE*, *QUARK*, *SIMGRID*, *TMG* Libraries detected with an official cmake module (see module files in CMAKE_ROOT/Modules/): CUDA - MPI - Threads. Libraries detected with our cmake modules (see module files in cmake_modules/morse_cmake/modules/find/ directory of Chameleon sources): BLAS - CBLAS - EZTRACE - FXT - HWLOC - LAPACK - LAPACKE - QUARK - SIMGRID - STARPU - TMG. **** Chameleon specific options * *CHAMELEON_SCHED_STARPU=ON|OFF* (default ON): to link with StarPU library (runtime system) * *CHAMELEON_SCHED_QUARK=ON|OFF* (default OFF): to link with QUARK library (runtime system) * *CHAMELEON_USE_MPI=ON|OFF* (default OFF): to link with MPI library (message passing implementation for use of multiple nodes with distributed memory), can only be used with StarPU * *CHAMELEON_USE_CUDA=ON|OFF* (default OFF): to link with CUDA runtime (implementation paradigm for accelerated codes on GPUs) and cuBLAS library (optimized BLAS kernels on GPUs), can only be used with StarPU * *CHAMELEON_ENABLE_DOC=ON|OFF* (default OFF): to control build of the documentation contained in doc/ sub-directory * *CHAMELEON_ENABLE_EXAMPLE=ON|OFF* (default ON): to control build of the examples executables (API usage) contained in example/ sub-directory * *CHAMELEON_ENABLE_PRUNING_STATS=ON|OFF* (default OFF) * *CHAMELEON_ENABLE_TESTING=ON|OFF* (default ON): to control build of testing executables (numerical check) contained in testing/ sub-directory * *CHAMELEON_ENABLE_TIMING=ON|OFF* (default ON): to control build of timing executables (performances check) contained in timing/ sub-directory * *CHAMELEON_ENABLE_TRACING=ON|OFF* (default OFF): to enable trace generation during execution of timing drivers. It requires StarPU to be linked with FxT library (trace execution of kernels on workers), see also [[sec:trace][Execution tracing with StarPU]]. * *CHAMELEON_SIMULATION=ON|OFF* (default OFF): to enable simulation mode, means Chameleon will not really execute tasks, see details in section [[sec:simu][Use simulation mode with StarPU-SimGrid]]. This option must be used with StarPU compiled with [[http://simgrid.gforge.inria.fr/][SimGrid]] allowing to guess the execution time on any architecture. This feature should be used to make experiments on the scheduler behaviors and performances not to produce solutions of linear systems. *** Dependencies detection <<sec:depdet>> You have different choices to detect dependencies on your system, either by setting some environment variables containing paths to the libs and headers or by specifying them directly at cmake configure. Different cases: 1) detection of dependencies through environment variables: - LD_LIBRARY_PATH should contain the list of paths where to find the libraries: #+begin_src export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:install/path/to/your/lib #+end_src - INCLUDE should contain the list of paths where to find the header files of libraries #+begin_src export INCLUDE=$INCLUDE:install/path/to/your/headers #+end_src 2) detection with user's given paths: - you can specify the path at cmake configure by invoking ~cmake path/to/your/CMakeLists.txt -DLIB_DIR=path/to/your/lib~ where LIB stands for the name of the lib to look for #+begin_src cmake path/to/your/CMakeLists.txt -DSTARPU_DIR=path/to/starpudir \ -DCBLAS_DIR= ... #+end_src it is also possible to specify headers and library directories separately #+begin_src cmake path/to/your/CMakeLists.txt -DSTARPU_INCDIR=path/to/libstarpu/include/starpu/1.1 \ -DSTARPU_LIBDIR=path/to/libstarpu/lib #+end_src - note: BLAS and LAPACK detection can be tedious so that we provide a verbose mode you can set *-DBLAS_VERBOSE=ON* or *-DLAPACK_VERBOSE=ON* to enable it 3) detection with custom environment variables: all variables like _DIR, _INCDIR, _LIBDIR can be set as environment variables instead of CMake options, there will be read 4) using [[https://www.freedesktop.org/wiki/Software/pkg-config/][pkg-config]] for libraries that provide .pc files - update your *PKG_CONFIG_PATH* to the paths where to find .pc files of installed external libraries like hwloc, starpu, some blas/lapack, etc