# Getting started with StarPU and Chameleon
* [StarPU](https://starpu.gitlabpages.inria.fr/): task-based runtime system
* [Chameleon](https://solverstack.gitlabpages.inria.fr/chameleon/): dense linear algebra library, built on top of StarPU.
Both are available as Guix (in the [Guix-HPC channel](https://gitlab.inria.fr/guix-hpc/guix-hpc)) or Spack packages.
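For instance, once the Guix-HPC channel is configured (or Spack is installed), the packages can be pulled in as sketched below; the package names are assumptions to check against your channel or repository:

```sh
# Guix, with the Guix-HPC channel configured
# (package names are assumptions; check with `guix search starpu`):
guix install starpu chameleon

# Spack (package names are assumptions; check with `spack list starpu`):
spack install starpu
spack install chameleon
```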
## Building StarPU
```sh
sudo apt install libtool-bin libhwloc-dev libmkl-dev pkg-config
git clone https://gitlab.inria.fr/starpu/starpu.git
cd starpu
./autogen.sh
mkdir build
cd build
../configure --prefix=$HOME/dev/builds/starpu --disable-opencl --disable-cuda --disable-fortran
# adapt to your use case, see https://files.inria.fr/starpu/testing/master/doc/html/CompilationConfiguration.html
make -j && make -j install
```
Adjust environment variables (for example in your `.bash_profile`):
```sh
export PATH=$HOME/dev/builds/starpu/bin:${PATH}
export LD_LIBRARY_PATH=$HOME/dev/builds/starpu/lib/:${LD_LIBRARY_PATH}
export PKG_CONFIG_PATH=$HOME/dev/builds/starpu/lib/pkgconfig:${PKG_CONFIG_PATH}
```
After sourcing `.bash_profile`, you should be able to execute:
```sh
starpu_machine_display
```
This shows which hardware is available on your local machine.
Full information on how to build StarPU is available [here](https://files.inria.fr/starpu/doc/html_web_installation/).
## Building Chameleon
```sh
sudo apt install cmake libmkl-dev
git clone --recurse-submodules https://gitlab.inria.fr/solverstack/chameleon.git
cd chameleon
mkdir build
cd build
```
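The configure and build commands for Chameleon were truncated in the source at this point. Below is a minimal sketch, assuming an install prefix mirroring the StarPU one and using only standard CMake options; the final smoke-test line is taken from the original text:

```sh
# Assumed prefix, mirroring the StarPU layout above:
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/dev/builds/chameleon
make -j && make -j install
# Quick smoke test: a Cholesky benchmark with human-readable output
$HOME/dev/builds/chameleon/bin/chameleon_stesting -o potrf -H
```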
## Using MPI
StarPU should have detected MPI during its build.
For Chameleon, you have to add the options `-DCHAMELEON_USE_MPI=ON -DCHAMELEON_USE_MPI_DATATYPES=ON` to the `cmake` command line and build again.
The common way of using distributed StarPU is to launch one MPI/StarPU process per compute node, and then StarPU takes care of feeding all available cores with tasks. You can run:
```sh
mpirun -np 4 $HOME/dev/builds/chameleon/bin/chameleon_stesting -o potrf -H
```
This will execute a Cholesky decomposition (`potrf`) with 4 MPI processes (`-np 4`) and present results in a human-readable way (`-H`; for a CSV-like output, omit this option).
You can measure the performance of different matrix sizes with the option `-n 3200:32000:3200` (from matrix size 3200 to 32000 with a step of 3200).
You can run several iterations of the same matrix size with `--niter 2`.
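Putting these options together, a sweep over several sizes with two iterations each could look like this (same binary and process count as above):

```sh
# Cholesky over sizes 3200..32000, step 3200, 2 iterations per size
mpirun -np 4 $HOME/dev/builds/chameleon/bin/chameleon_stesting -o potrf \
    -n 3200:32000:3200 --niter 2 -H
```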
## Basic performance tuning
A good matrix distribution is square 2D-block-cyclic; for this, add `-P x` where `x` should be (close to) the square root of the number of MPI processes (i.e., you should use a square number of compute nodes).
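For example, with 4 MPI processes a square 2x2 process grid is obtained with `-P 2`:

```sh
mpirun -np 4 $HOME/dev/builds/chameleon/bin/chameleon_stesting -o potrf -H -P 2
```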
To get better results, you should bind the main thread:
```sh
export STARPU_MAIN_THREAD_BIND=1
```
Set the number of workers (CPU cores executing tasks) to the number of cores available on the compute node minus one:
```sh
export STARPU_NCPU=15
```
You should not use hyperthreads.
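As a sketch, a tuned run on 4 nodes with 16 cores each (adapt `STARPU_NCPU` and `-P` to your machine) would combine the settings above:

```sh
export STARPU_MAIN_THREAD_BIND=1   # bind the main thread
export STARPU_NCPU=15              # 16-core node: leave one core for the main thread
mpirun -np 4 $HOME/dev/builds/chameleon/bin/chameleon_stesting -o potrf -H -P 2
```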
To find a good matrix size range, just run with sizes of, say, `3200:50000:3200`, plot the obtained Gflop/s, and see at which size you reach the plateau.
## Misc