* [StarPU](https://starpu.gitlabpages.inria.fr/): task-based runtime system
* [Chameleon](https://solverstack.gitlabpages.inria.fr/chameleon/): dense linear algebra library, built on top of StarPU. Provides benchmarks of linear algebra kernels, very useful!
Both are available as Guix (in the [Guix-HPC channel](https://gitlab.inria.fr/guix-hpc/guix-hpc)) or Spack packages, and usually available as modules on some clusters (well, maybe only PlaFRIM).
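If you go through a package manager instead, installation might look like this (a sketch; the package names `starpu` and `chameleon` are assumptions to check against the Guix-HPC channel and Spack):

```sh
# with Guix, once the Guix-HPC channel is configured
guix install starpu chameleon
# or with Spack
spack install starpu chameleon
```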
## Building StarPU
```sh
sudo apt install libtool-bin libhwloc-dev libmkl-dev pkg-config # and probably others I already have installed
git clone git@gitlab.inria.fr:starpu/starpu.git # or https://gitlab.inria.fr/starpu/starpu.git if you don't have a gitlab.inria.fr account with a registered SSH key
cd starpu && ./autogen.sh && mkdir build && cd build # generate the configure script, then build out of tree
../configure --prefix=$HOME/dev/builds/starpu --disable-opencl --disable-cuda --disable-fortran # adapt to your use case, see https://files.inria.fr/starpu/testing/master/doc/html/CompilationConfiguration.html
make -j && make install
```
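To check the build, you can run one of the installed tools, e.g. `starpu_machine_display`, which lists the detected workers (assuming the prefix above):

```sh
$HOME/dev/builds/starpu/bin/starpu_machine_display
```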
StarPU should have detected MPI during its build.
For Chameleon, you have to add the options `-DCHAMELEON_USE_MPI=ON -DCHAMELEON_USE_MPI_DATATYPES=ON` to the `cmake` command line and build again.
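A minimal sketch of such a Chameleon build, assuming an out-of-tree build directory and an install prefix alongside the StarPU one (you may also need to point `PKG_CONFIG_PATH` at the StarPU install so that it is found):

```sh
cd chameleon/build
export PKG_CONFIG_PATH=$HOME/dev/builds/starpu/lib/pkgconfig:$PKG_CONFIG_PATH
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/dev/builds/chameleon \
         -DCHAMELEON_USE_MPI=ON -DCHAMELEON_USE_MPI_DATATYPES=ON
make -j && make install
```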
The common way of using distributed StarPU is to launch one MPI/StarPU process per compute node; StarPU then takes care of feeding all the available cores with tasks.
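You can run, for example (a sketch assuming the Chameleon install prefix from the build above; adapt the path to your setup):

```sh
mpirun -np 4 $HOME/dev/builds/chameleon/bin/chameleon_stesting -o potrf -n 32000 -H
```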
This will execute a Cholesky decomposition (`potrf`) with 4 MPI processes (`-np 4`) and present the results in a human-readable way (`-H`; for a CSV-like output, omit this option).
You can measure the performance of different matrix sizes with the option `-n 3200:32000:3200` (from matrix size 3200 to 32000 with a step of 3200).
You can run several iterations for the same matrix size with `--niter 2`.
## Basic performance tuning
A good matrix distribution is a square 2D block-cyclic one; to get it, add `-P x`, where `x` should be (close to) the square root of the number of MPI processes (i.e., you should use a square number of compute nodes).
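For example, with the 4 MPI processes used above, this gives a 2 × 2 process grid (a sketch, same assumed paths as before):

```sh
mpirun -np 4 $HOME/dev/builds/chameleon/bin/chameleon_stesting -o potrf -n 32000 -P 2 -H
```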
To get better results, you should bind the main thread:
```sh
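# bind the main (task submission) thread to its own reserved core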
export STARPU_MAIN_THREAD_BIND=1
```
Set the number of workers (CPU cores executing tasks) to the number of cores available on the compute node minus one:
```sh
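# e.g. 15 workers for a 16-core node; one core stays free for the bound main thread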
export STARPU_NCPU=15
```
You should not use hyperthreads.
To find a good matrix size range, just run with sizes of, say, `3200:50000:3200`, plot the obtained Gflop/s, and see at which size you reach the plateau.
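Putting it all together, a tuned distributed run could look like this (a sketch under the same assumptions as above):

```sh
export STARPU_MAIN_THREAD_BIND=1
export STARPU_NCPU=15
# depending on your MPI launcher, the environment variables may need to be forwarded explicitly (e.g. -x with Open MPI)
mpirun -np 4 $HOME/dev/builds/chameleon/bin/chameleon_stesting -o potrf -P 2 -n 3200:50000:3200 --niter 2
```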