ScalFMM with StarPU+CUDA
========================

In this tutorial, we provide the commands to install ScalFMM and the needed tools in order to compute parallel efficiencies.
We first show how to obtain the homogeneous efficiencies and then the heterogeneous ones (the heterogeneous part is not finished yet).

## Installing the libraries

For some installation steps, we provide a "valid-if" test that shows whether the previous commands completed correctly.
On success, `STEP-OK` is printed.
In addition, if a library is already installed on the system, you can set the output variables directly and run the "valid-if" command to check that they work.

It is possible to follow only the steps needed to compile ScalFMM on top of StarPU, so we marked the installation of the execution-trace tools as __Optional__.
However, we highly recommend installing them and following all the steps, since they are needed to obtain the efficiencies.
To execute without any tracing overhead, you may need to remove the usage of FXT.

### Prerequisites:
To follow this tutorial, the following applications must be installed:

* autoconf (>= 2.69)
* gawk (Awk >= 4.0.1)
* make (>= 3.81) 
* cmake (>= 3.2.2)
* gcc/g++ (>= 4.9) and the gcc/g++ names should point to the correct binaries
* BLAS/LAPACK (the ScalFMM configuration differs depending on whether the MKL is used; with the MKL, it is recommended to set the environment variable `MKLROOT`)
* CUDA (>= 7) and `CUDA_PATH` must be set. In our case, `CUDA_PATH=/usr/local/cuda-7.5/`
* __Optional__ Vite (from `sudo apt-get install vite` or see [http://vite.gforge.inria.fr/download.php](http://vite.gforge.inria.fr/download.php))
*  __Optional__ Qt5 library to be able to change the colors of the execution traces in order to visualize the different FMM operators
* gnuplot to generate the figures
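
The minimum versions listed above can be checked before starting. The following is a minimal sketch and not part of ScalFMM: the helper names `version_ge` and `report_tool` are hypothetical, and it assumes GNU `sort` (for the `-V` version ordering) and that each tool prints its version as the third word of the first line of its `--version` output.

```bash
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# report_tool NAME REQUIRED FOUND: print whether FOUND satisfies REQUIRED.
report_tool() {
    local name=$1 required=$2 found=$3
    if [[ -z $found ]]; then
        echo "$name: not found"
    elif version_ge "$found" "$required"; then
        echo "$name: $found (OK, >= $required)"
    else
        echo "$name: $found (too old, need >= $required)"
    fi
}

# Assumption: the version is the third word of the first output line.
report_tool cmake 3.2.2 "$(cmake --version 2>/dev/null | awk 'NR==1 {print $3}')"
report_tool make  3.81  "$(make --version 2>/dev/null | awk 'NR==1 {print $3}')"
report_tool gcc   4.9   "$(gcc -dumpversion 2>/dev/null)"
```

The same pattern can be extended to the other tools of the list if their version output is parsed accordingly.
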

> [Remark] Some installations of CUDA do not provide a `libcuda` file.
> In this case, one needs to create a link: `sudo ln /usr/local/cuda-7.5/lib64/libcudart.so /usr/local/cuda-7.5/lib64/libcuda.so`

> [Plafrim-Developers] 
>
> For those who use this tutorial on Plafrim (or a similar cluster), we provide extra information.
>
> To allocate a heterogeneous node: `salloc -N 1 --time=03:00:00 --exclusive -p court_sirocco -CHaswell --gres=gpu:4 -x sirocco06`
> 
> Then, find it using `squeue` and access it by `ssh`.
>
> We have run this tutorial with the modules : `module load compiler/gcc/4.9.2 cuda75/toolkit/7.5.18 intel/mkl/64/11.2/2016.0.0 build/cmake/3.2.1`

### Working directory

The variable `SCALFMM_TEST_DIR` is used to specify the working directory where all the tools are going to be installed:
```bash
export SCALFMM_TEST_DIR=~/scalfmm_test
mkdir -p "$SCALFMM_TEST_DIR"   # create the working directory if it does not exist yet
cd "$SCALFMM_TEST_DIR"
```

To be able to stop the tutorial in the middle and resume later, we register the variables in a file that should be sourced when restarting:
```bash
# function scalfmmRegisterVariable() { echo "export $1=${!1}" >> "$SCALFMM_TEST_DIR/environment.source"; }
echo "function scalfmmRegisterVariable() { echo \"export \$1=\${!1}\" >> \"$SCALFMM_TEST_DIR/environment.source\"; }" > "$SCALFMM_TEST_DIR/environment.source"
source "$SCALFMM_TEST_DIR/environment.source"
```

*Output variables:* `scalfmmRegisterVariable SCALFMM_TEST_DIR`
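
To see what the registration function does, here is a stand-alone sketch of the same mechanism. It uses a temporary directory and a hypothetical function name (`demoRegisterVariable`), variable (`DEMO_INSTALL_DIR`) and path, so it does not touch `$SCALFMM_TEST_DIR`. The key point is bash indirect expansion: `$1` holds a variable *name* and `${!1}` expands to that variable's value.

```bash
#!/usr/bin/env bash
# Hypothetical stand-alone demo: use a temporary directory instead of $SCALFMM_TEST_DIR.
demo_dir=$(mktemp -d)

# Same body as scalfmmRegisterVariable: $1 is a variable NAME, ${!1} is its value.
demoRegisterVariable() { echo "export $1=${!1}" >> "$demo_dir/environment.source"; }

export DEMO_INSTALL_DIR=/opt/demo      # hypothetical variable and path
demoRegisterVariable DEMO_INSTALL_DIR
cat "$demo_dir/environment.source"     # prints: export DEMO_INSTALL_DIR=/opt/demo

# Simulate a new session: forget the variable, then restore it from the file.
unset DEMO_INSTALL_DIR
source "$demo_dir/environment.source"
echo "$DEMO_INSTALL_DIR"               # prints: /opt/demo
```

Note that the value is written unquoted, so this approach assumes that the registered values contain no whitespace.
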

Valid-if:
```bash
if [[ -n $SCALFMM_TEST_DIR ]] && [[ -d $SCALFMM_TEST_DIR ]] ; then
   echo "STEP-OK"
fi
```

- Restarting the tutorial

To restart the tutorial, one needs to redefine the working directory and source the saved file before resuming:
```bash
export SCALFMM_TEST_DIR=~/scalfmm_test
if [[ ! -d $SCALFMM_TEST_DIR ]] ; then
	mkdir $SCALFMM_TEST_DIR
else
	source "$SCALFMM_TEST_DIR/environment.source"
fi    
cd $SCALFMM_TEST_DIR
```

### Downloading the Packages (in Advance)

If the compute node does not have Internet access, the following commands download the needed packages in advance (otherwise, the installation steps below download each package on demand):
```bash
cd $SCALFMM_TEST_DIR
wget https://www.open-mpi.org/software/hwloc/v1.11/downloads/hwloc-1.11.2.tar.gz
wget http://download.savannah.gnu.org/releases/fkt/fxt-0.2.11.tar.gz # Optional
wget http://www.fftw.org/fftw-3.3.4.tar.gz
svn export svn://scm.gforge.inria.fr/svnroot/starpu/trunk starpu
git clone --depth=1 https://scm.gforge.inria.fr/anonscm/git/scalfmm-public/scalfmm-public.git
```
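
The installation steps of this tutorial all use the same "download only if the archive is missing" test. As an illustration, that test can be wrapped in a small helper; `fetch_if_missing` is a hypothetical name, and the sketch assumes `wget` is available.

```bash
#!/usr/bin/env bash
# fetch_if_missing URL: download URL into the current directory
# unless an archive with the same file name is already there.
fetch_if_missing() {
    local url=$1
    local archive=${url##*/}           # file name = everything after the last '/'
    if [[ -f $archive ]]; then
        echo "already present: $archive"
    else
        wget "$url"
    fi
}

# Demo in a scratch directory where the archive already exists, so nothing is downloaded.
(
    cd "$(mktemp -d)" || exit 1
    touch fftw-3.3.4.tar.gz
    fetch_if_missing http://www.fftw.org/fftw-3.3.4.tar.gz   # prints: already present: fftw-3.3.4.tar.gz
)
```
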

### HWLOC
```bash
cd $SCALFMM_TEST_DIR
if [[ ! -f hwloc-1.11.2.tar.gz ]] ; then
    wget https://www.open-mpi.org/software/hwloc/v1.11/downloads/hwloc-1.11.2.tar.gz
fi
tar xvf hwloc-1.11.2.tar.gz
cd hwloc-1.11.2/
export SCALFMM_HWLOC_DIR=$SCALFMM_TEST_DIR/hwlocinstall
./configure --prefix=$SCALFMM_HWLOC_DIR
make install
```

*Output variables:* `scalfmmRegisterVariable SCALFMM_HWLOC_DIR`

Valid-if:
```bash
if [[ -n $SCALFMM_HWLOC_DIR ]] && [[ -d $SCALFMM_HWLOC_DIR/lib/ ]] && [[ -f  $SCALFMM_HWLOC_DIR/lib/libhwloc.so ]]; then
   echo "STEP-OK"
fi
```
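
The "valid-if" tests of the installation steps all follow the same shape: the output variable must be set, it must point to a directory, and some files must exist below it. A possible generic form is sketched here; `validate_install` and the `STEP-FAILED` message are hypothetical and not part of ScalFMM, and the demo uses a scratch directory with a made-up library name.

```bash
#!/usr/bin/env bash
# validate_install PREFIX FILE...: print STEP-OK when PREFIX is an existing directory
# and every FILE (relative to PREFIX) exists; otherwise report what is missing.
validate_install() {
    local prefix=$1
    shift
    if [[ -z $prefix || ! -d $prefix ]]; then
        echo "STEP-FAILED: directory '$prefix' not found"
        return 1
    fi
    local file
    for file in "$@"; do
        if [[ ! -f $prefix/$file ]]; then
            echo "STEP-FAILED: missing $prefix/$file"
            return 1
        fi
    done
    echo "STEP-OK"
}

# Demo on a scratch install prefix (hypothetical library name).
demo_prefix=$(mktemp -d)
mkdir -p "$demo_prefix/lib" && touch "$demo_prefix/lib/libdemo.so"
validate_install "$demo_prefix" lib/libdemo.so    # prints: STEP-OK
```

For the HWLOC step, the equivalent call would be `validate_install "$SCALFMM_HWLOC_DIR" lib/libhwloc.so`. Unlike the original test, this variant also says which file is missing.
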

### FXT (__Optional__)
```bash
cd $SCALFMM_TEST_DIR
if [[ ! -f fxt-0.2.11.tar.gz ]] ; then
    wget http://download.savannah.gnu.org/releases/fkt/fxt-0.2.11.tar.gz
fi
tar xvf fxt-0.2.11.tar.gz
cd fxt-0.2.11/
export SCALFMM_FXT_DIR=$SCALFMM_TEST_DIR/fxtinstall
./configure --prefix=$SCALFMM_FXT_DIR
make install
```

*Output variables:* `scalfmmRegisterVariable SCALFMM_FXT_DIR`

Valid-if:
```bash
if [[ -n $SCALFMM_FXT_DIR ]] && [[ -d $SCALFMM_FXT_DIR/lib/ ]] && [[ -f  $SCALFMM_FXT_DIR/lib/libfxt.so ]]; then
   echo "STEP-OK"
fi
```

### FFTW (If No MKL-FFT)
Those who do not use the MKL FFT interface have to install FFTW (in both single and double precision):
```bash
cd $SCALFMM_TEST_DIR
if [[ ! -f fftw-3.3.4.tar.gz ]] ; then
    wget http://www.fftw.org/fftw-3.3.4.tar.gz
fi    
tar xvf fftw-3.3.4.tar.gz
cd fftw-3.3.4/
export SCALFMM_FFTW_DIR=$SCALFMM_TEST_DIR/fftinstall
./configure --prefix=$SCALFMM_FFTW_DIR
make install
./configure --prefix=$SCALFMM_FFTW_DIR --enable-float
make install
```

*Output variables:* `scalfmmRegisterVariable SCALFMM_FFTW_DIR`

Valid-if:
```bash
if [[ -n $SCALFMM_FFTW_DIR ]] && [[ -d $SCALFMM_FFTW_DIR/lib/ ]] && [[ -f  $SCALFMM_FFTW_DIR/lib/libfftw3.a ]] && [[ -f  $SCALFMM_FFTW_DIR/lib/libfftw3f.a ]]; then
   echo "STEP-OK"
fi
```

### StarPU
```bash
cd $SCALFMM_TEST_DIR
if [[ ! -d starpu ]] ; then
	svn export svn://scm.gforge.inria.fr/svnroot/starpu/trunk starpu
fi    
cd starpu/
export SCALFMM_STARPU_DIR=$SCALFMM_TEST_DIR/starpuinstall
./autogen.sh
./configure --prefix=$SCALFMM_STARPU_DIR --with-fxt=$SCALFMM_FXT_DIR --with-hwloc=$SCALFMM_HWLOC_DIR --with-cuda-dir=$CUDA_PATH --disable-opencl
make install
```
> __Optional__ If you do not want to use execution traces (FXT), remove the `--with-fxt=$SCALFMM_FXT_DIR` parameter from the `configure` command.
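
One way to make the FXT parameter optional without editing the command by hand is to build the argument list in a bash array and append `--with-fxt` only when `SCALFMM_FXT_DIR` is set. This is a sketch: `starpu_configure_args` is a hypothetical helper, the options are the ones of the `configure` command above, and the paths in the demo are placeholders.

```bash
#!/usr/bin/env bash
# starpu_configure_args: print one configure argument per line.
# --with-fxt is added only when SCALFMM_FXT_DIR is set and non-empty.
starpu_configure_args() {
    local args=(
        --prefix="$SCALFMM_STARPU_DIR"
        --with-hwloc="$SCALFMM_HWLOC_DIR"
        --with-cuda-dir="$CUDA_PATH"
        --disable-opencl
    )
    if [[ -n ${SCALFMM_FXT_DIR:-} ]]; then
        args+=(--with-fxt="$SCALFMM_FXT_DIR")
    fi
    printf '%s\n' "${args[@]}"
}

# Demo in a subshell with placeholder paths, so the real variables are left untouched.
(
    SCALFMM_STARPU_DIR=/tmp/starpuinstall
    SCALFMM_HWLOC_DIR=/tmp/hwlocinstall
    CUDA_PATH=/usr/local/cuda-7.5
    SCALFMM_FXT_DIR=/tmp/fxtinstall
    starpu_configure_args      # last line: --with-fxt=/tmp/fxtinstall
    unset SCALFMM_FXT_DIR
    starpu_configure_args      # same list without the --with-fxt line
)
```

With such a helper, the configuration step could be written as `./configure $(starpu_configure_args)`, as long as the paths contain no whitespace.
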

*Output variables:* `scalfmmRegisterVariable SCALFMM_STARPU_DIR`

Valid-if:
```bash
if [[ -n $SCALFMM_STARPU_DIR ]] && [[ -d $SCALFMM_STARPU_DIR/lib/ ]] && [[ -f  $SCALFMM_STARPU_DIR/lib/libstarpu.so ]] ; then
   echo "STEP-OK"
fi
```

### ScalFMM

#### Configure
+ Getting the source from the last commit:
```bash
cd $SCALFMM_TEST_DIR
if [[ ! -d scalfmm-public ]] ; then
    git clone --depth=1 https://scm.gforge.inria.fr/anonscm/git/scalfmm-public/scalfmm-public.git
fi    
cd scalfmm-public/
export SCALFMM_SOURCE_DIR=`pwd`
cd Build/
export SCALFMM_BUILD_DIR=`pwd`
```

*Output variables:* `scalfmmRegisterVariable SCALFMM_BUILD_DIR` `scalfmmRegisterVariable SCALFMM_SOURCE_DIR`

+ Configure (No MKL):
```bash
cmake .. -DSCALFMM_BUILD_DEBUG=OFF -DSCALFMM_USE_MPI=OFF \
               -DSCALFMM_BUILD_TESTS=ON -DSCALFMM_BUILD_UTESTS=OFF \
               -DSCALFMM_USE_BLAS=ON -DSCALFMM_USE_MKL_AS_BLAS=OFF \
               -DSCALFMM_USE_LOG=ON -DSCALFMM_USE_STARPU=ON \
               -DSCALFMM_USE_CUDA=ON -DSCALFMM_USE_OPENCL=OFF \
               -DHWLOC_DIR=$SCALFMM_HWLOC_DIR -DSTARPU_DIR=$SCALFMM_STARPU_DIR \
               -DSCALFMM_USE_FFT=ON -DFFT_DIR=$SCALFMM_FFTW_DIR
```
+ Configure (MKL BLAS/LAPACK and FFTW):
```bash
cmake .. -DSCALFMM_BUILD_DEBUG=OFF -DSCALFMM_USE_MPI=OFF \
               -DSCALFMM_BUILD_TESTS=ON -DSCALFMM_BUILD_UTESTS=OFF \
               -DSCALFMM_USE_BLAS=ON -DSCALFMM_USE_MKL_AS_BLAS=ON \
               -DSCALFMM_USE_LOG=ON -DSCALFMM_USE_STARPU=ON \
               -DSCALFMM_USE_CUDA=ON -DSCALFMM_USE_OPENCL=OFF \
               -DHWLOC_DIR=$SCALFMM_HWLOC_DIR -DSTARPU_DIR=$SCALFMM_STARPU_DIR \
               -DSCALFMM_USE_FFT=ON -DFFT_DIR=$SCALFMM_FFTW_DIR
```
+ Configure (MKL BLAS/LAPACK/FFT and No FFTW):

> [Plafrim-Developers] should use this configuration.

```bash
cmake .. -DSCALFMM_BUILD_DEBUG=OFF -DSCALFMM_USE_MPI=OFF \
               -DSCALFMM_BUILD_TESTS=ON -DSCALFMM_BUILD_UTESTS=OFF \
               -DSCALFMM_USE_BLAS=ON -DSCALFMM_USE_MKL_AS_BLAS=ON \
               -DSCALFMM_USE_LOG=ON -DSCALFMM_USE_STARPU=ON \
               -DSCALFMM_USE_CUDA=ON -DSCALFMM_USE_OPENCL=OFF \
               -DHWLOC_DIR=$SCALFMM_HWLOC_DIR -DSTARPU_DIR=$SCALFMM_STARPU_DIR \
               -DSCALFMM_USE_FFT=ON -DSCALFMM_USE_MKL_AS_FFTW=ON
```

Valid-if:
```bash
cmake .. ; if [[ "$?" == "0" ]] ; then echo "STEP-OK" ; fi
```

#### Build

```bash
cd $SCALFMM_BUILD_DIR
make testBlockedUnifCudaBench
```

Valid-if:
```bash
ls ./Tests/Release/testBlockedUnifCudaBench ; if [[ "$?" == "0" ]] ; then echo "STEP-OK" ; fi
```

#### First Execution

In this section we compute a simulation and look at the resulting trace.
ScalFMM binary parameters and descriptions:

* Passing `--help` as a parameter lists the valid parameters
* Simulation properties are chosen by:
  * `-h` : height of the tree
  * `-bs` : granularity/size of the group
  * `-nb` : number of particles generated
* Execution properties are chosen by the StarPU environment variables:
  * `STARPU_NCPUS` : the number of CPU workers
  * `STARPU_NCUDA` : the number of GPU workers (for heterogeneous binary)
* By default, the application does not compare the FMM interactions against the direct method, whose cost is quadratic (N^2) in the number of particles, so validation should be avoided for large test cases. To obtain the accuracy, pass the parameter `-validation`
* `-p2p-m2l-cuda-only` : compute the P2P and the M2L only on the GPU (the rest on the CPU)

Examples:

```bash
export STARPU_NCPUS=12
export STARPU_NCUDA=2
./Tests/Release/testBlockedUnifCudaBench -nb 30000000 -h 7 -bs 800
```

Last part of the output should be:
```bash
	Start FGroupTaskStarPUAlgorithm
		 directPass in 0.0406482s
			 inblock  in 0.000780428s
			 outblock in 0.0398674s
		 bottomPass in 0.00586269s
		 upwardPass in 0.00265723s
		 transferPass in 0.00323571s
			 inblock in  0.000124817s
			 outblock in 0.00298331s
		 downardPass in 0.00257975s
		 transferPass in 0.0652285s
			 inblock in  0.00164774s
			 outblock in 0.0635799s
		 L2P in 0.0115733s
		 Submitting the tasks took 0.139101s
		 Moving data to the host took 0.0578765s
@EXEC TIME = 14.6321s
```
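
The final `@EXEC TIME = ...` line is convenient to post-process. As an illustration only, a value printed as `KEY = value` can be extracted with `sed`. The function `extract_key` below is a hypothetical stand-in and is not claimed to be how the scripts shipped with ScalFMM work; the key is used as a regular expression, so keys containing special characters would need escaping.

```bash
#!/usr/bin/env bash
# extract_key KEY: read text on stdin and print the value of each "KEY = value" line.
extract_key() {
    local key=$1
    sed -n "s/^${key}[[:space:]]*=[[:space:]]*//p"
}

# Two lines copied from the sample output above.
printf '\t\t Moving data to the host took 0.0578765s\n@EXEC TIME = 14.6321s\n' \
    | extract_key "@EXEC TIME"      # prints: 14.6321s
```
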

+ Visualize the execution trace (__Optional__)

Convert the FXT file:
```bash
$SCALFMM_STARPU_DIR/bin/starpu_fxt_tool -i "/tmp/prof_file_"$USER"_0"
```
Then visualize the output with `vite` (you may need to copy the `paje.trace` file to your local machine first):
```bash
vite ./paje.trace
```

The result should look like:
![Trace](trace-example.png)

We can change the colors of the trace with the following commands (requires the Qt5 library):

```bash
$SCALFMM_SOURCE_DIR/Addons/BenchEfficiency/pajecolor paje.trace $SCALFMM_SOURCE_DIR/Addons/BenchEfficiency/paintmodel.fmm.colors
vite ./paje.trace.painted
```

The result should look like:
![Trace](trace-example-colors.png)

+ Get execution times

```bash
python $SCALFMM_STARPU_DIR/bin/starpu_trace_state_stats.py -t trace.rec
```

Should give something like:
```
"Name","Count","Type","Duration"
"Initializing",14,"Runtime",7153.096196
"Overhead",57010,"Runtime",376.473463
"Idle",14355,"Other",12.815899
"Scheduling",28441,"Runtime",238.367394
"Sleeping",610,"Other",13786.513208
"FetchingInput",14341,"Runtime",13918.805814
"execute_on_all_wrapper",30,"Task",21.288802
"Executing",414,"Runtime",26852.864578
"PushingOutput",14341,"Runtime",284.96123
"P2P-out",3846,"Task",60378.266619
"Callback",13559,"Runtime",4.210633
"P2P",328,"Task",15383.426991
"M2L-level-5",41,"Task",2354.702554
"M2L-level-6",328,"Task",18349.915495
"Deinitializing",14,"Runtime",109.87483
"M2L-level-4",6,"Task",275.088295
"P2M",328,"Task",11312.022842
"M2M-level-5",328,"Task",829.9055
"M2M-level-4",41,"Task",93.130498
"M2L-out-level-5",638,"Task",1914.900053
"M2M-level-3",6,"Task",11.053067
"M2M-level-2",1,"Task",1.363157
"M2L-out-level-4",22,"Task",159.580457
"L2L-level-4",41,"Task",84.554065
"L2L-level-5",328,"Task",1087.717767
"M2L-out-level-6",7692,"Task",18322.518045
"L2P",328,"Task",27146.256793
"M2L-level-2",1,"Task",2.661235
"L2L-level-3",6,"Task",11.346978
"M2L-level-3",1,"Task",47.612555
"L2L-level-2",1,"Task",1.471873
```
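
A listing in this CSV format is easy to aggregate. The sketch below sums the `Duration` column per `Type` with `awk`; `sum_by_type` is a hypothetical helper, and it assumes that the listing was saved to a file and that names contain no commas. The demo uses three rows copied from the listing above.

```bash
#!/usr/bin/env bash
# sum_by_type FILE: add up the "Duration" column of FILE for each value of the "Type" column.
sum_by_type() {
    awk -F, 'NR > 1 {                 # skip the header line
                 gsub(/"/, "", $3)    # strip the quotes around the type
                 total[$3] += $4
             }
             END { for (type in total) printf "%s %.6f\n", type, total[type] }' "$1" | sort
}

# Demo on three rows taken from the listing above.
stats_file=$(mktemp)
cat > "$stats_file" <<'EOF'
"Name","Count","Type","Duration"
"Idle",14355,"Other",12.815899
"P2P",328,"Task",15383.426991
"L2P",328,"Task",27146.256793
EOF
sum_by_type "$stats_file"
```

On the full listing, this gives one total each for the `Runtime`, `Task` and `Other` categories.
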

Most of the scripts are in the add-on directory:
```bash
export SCALFMM_AB=$SCALFMM_SOURCE_DIR/Addons/BenchEfficiency/
```

*Output variable:* `scalfmmRegisterVariable SCALFMM_AB`

## Homogeneous Efficiencies

Here we compute the efficiencies for a given test case on CPU only.

Go to the build directory and create the output directory:
```bash
cd $SCALFMM_BUILD_DIR
export SCALFMM_RES_DIR=$SCALFMM_BUILD_DIR/homogeneous
mkdir $SCALFMM_RES_DIR
```
*Output variable:* `scalfmmRegisterVariable SCALFMM_RES_DIR` 

Set up the configuration variables:
```bash
export SCALFMM_NB=10000000
export SCALFMM_H=7
export SCALFMM_MIN_BS=100
export SCALFMM_MAX_BS=10000
export SCALFMM_MAX_NB_CPU=24
```

Find best granularity in sequential and in parallel:
```bash
export STARPU_NCPUS=1
export STARPU_NCUDA=0
export SCALFMM_BS_CPU_SEQ=`$SCALFMM_AB/scalfmmFindBs.sh "./Tests/Release/testBlockedUnifCudaBench -nb $SCALFMM_NB -h $SCALFMM_H -bs" $SCALFMM_MIN_BS $SCALFMM_MAX_BS | $SCALFMM_AB/scalfmmExtractKey.sh "@BEST BS" `
if [[ `which gnuplot | wc -l` == "1" ]] ;  then
    gnuplot -e "filename='seq-bs-search'" $SCALFMM_AB/scalfmmFindBs.gplot
fi

export STARPU_NCPUS=$SCALFMM_MAX_NB_CPU
export STARPU_NCUDA=0
export SCALFMM_BS_CPU_PAR=`$SCALFMM_AB/scalfmmFindBs.sh "./Tests/Release/testBlockedUnifCudaBench -nb $SCALFMM_NB -h $SCALFMM_H -bs" $SCALFMM_MIN_BS $SCALFMM_MAX_BS | $SCALFMM_AB/scalfmmExtractKey.sh "@BEST BS" `
if [[ `which gnuplot | wc -l` == "1" ]] ;  then
    gnuplot -e "filename='par-bs-search'" $SCALFMM_AB/scalfmmFindBs.gplot
fi
```
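
Conceptually, the search above runs the benchmark for several block sizes and keeps the fastest one. The selection step can be sketched as follows. This is not the implementation of `scalfmmFindBs.sh`: the function `pick_best_bs`, the `block-size time` input format and the timings are all hypothetical, and the `@BEST BS = ...` output format is only assumed by analogy with the `@EXEC TIME = ...` line printed by the benchmark.

```bash
#!/usr/bin/env bash
# pick_best_bs: read "block-size time" pairs on stdin and report the fastest block size.
pick_best_bs() {
    awk 'NR == 1 || $2 + 0 < best { best = $2 + 0; bs = $1 }
         END { print "@BEST BS = " bs }'
}

# Hypothetical timings (seconds) for four block sizes.
printf '%s\n' '100 21.4' '1000 15.2' '5000 14.1' '10000 14.9' | pick_best_bs   # prints: @BEST BS = 5000
```
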
In our case we get 9710 (sequential) and 5385 (parallel).

*Output variable:* `scalfmmRegisterVariable SCALFMM_BS_CPU_SEQ`  `scalfmmRegisterVariable SCALFMM_BS_CPU_PAR`

We can look at the runs that were performed to find the best granularity:
![In sequential](seq-bs-search.png)
![In parallel](par-bs-search.png)


Then we compute the efficiency using both granularities and keep the `.rec` files:
```bash
export SCALFMM_MAX_NB_CPU=24
export STARPU_NCUDA=0
source "$SCALFMM_AB/execAllHomogeneous.sh"
```

We should end up with all the `.rec` files and their corresponding time files; `ls "$SCALFMM_RES_DIR"` should return something like:
```bash
trace-nb_10000000-h_7-bs_5385-CPU_10.rec       trace-nb_10000000-h_7-bs_5385-CPU_16.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_22.rec       trace-nb_10000000-h_7-bs_5385-CPU_5.rec.time
trace-nb_10000000-h_7-bs_5385-CPU_10.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_17.rec       trace-nb_10000000-h_7-bs_5385-CPU_22.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_6.rec
trace-nb_10000000-h_7-bs_5385-CPU_11.rec       trace-nb_10000000-h_7-bs_5385-CPU_17.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_23.rec       trace-nb_10000000-h_7-bs_5385-CPU_6.rec.time
trace-nb_10000000-h_7-bs_5385-CPU_11.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_18.rec       trace-nb_10000000-h_7-bs_5385-CPU_23.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_7.rec
trace-nb_10000000-h_7-bs_5385-CPU_12.rec       trace-nb_10000000-h_7-bs_5385-CPU_18.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_24.rec       trace-nb_10000000-h_7-bs_5385-CPU_7.rec.time
trace-nb_10000000-h_7-bs_5385-CPU_12.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_19.rec       trace-nb_10000000-h_7-bs_5385-CPU_24.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_8.rec
trace-nb_10000000-h_7-bs_5385-CPU_13.rec       trace-nb_10000000-h_7-bs_5385-CPU_19.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_2.rec        trace-nb_10000000-h_7-bs_5385-CPU_8.rec.time
trace-nb_10000000-h_7-bs_5385-CPU_13.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_1.rec        trace-nb_10000000-h_7-bs_5385-CPU_2.rec.time   trace-nb_10000000-h_7-bs_5385-CPU_9.rec
trace-nb_10000000-h_7-bs_5385-CPU_14.rec       trace-nb_10000000-h_7-bs_5385-CPU_1.rec.time   trace-nb_10000000-h_7-bs_5385-CPU_3.rec        trace-nb_10000000-h_7-bs_5385-CPU_9.rec.time
trace-nb_10000000-h_7-bs_5385-CPU_14.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_20.rec       trace-nb_10000000-h_7-bs_5385-CPU_3.rec.time   trace-nb_10000000-h_7-bs_9710-CPU_1.rec
trace-nb_10000000-h_7-bs_5385-CPU_15.rec       trace-nb_10000000-h_7-bs_5385-CPU_20.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_4.rec        trace-nb_10000000-h_7-bs_9710-CPU_1.rec.time
trace-nb_10000000-h_7-bs_5385-CPU_15.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_21.rec       trace-nb_10000000-h_7-bs_5385-CPU_4.rec.time
trace-nb_10000000-h_7-bs_5385-CPU_16.rec       trace-nb_10000000-h_7-bs_5385-CPU_21.rec.time  trace-nb_10000000-h_7-bs_5385-CPU_5.rec
```

We then compute the efficiencies from these files:
```bash
g++ -std=c++11 $SCALFMM_AB/mergetimefile.cpp -o $SCALFMM_AB/mergetimefile.exe
$SCALFMM_AB/mergetimefile.exe \
        "$SCALFMM_RES_DIR/trace-nb_$SCALFMM_NB-h_$SCALFMM_H-bs_$SCALFMM_BS_CPU_SEQ-CPU_1.rec.time" \
        "$SCALFMM_RES_DIR/trace-nb_$SCALFMM_NB-h_$SCALFMM_H-bs_$SCALFMM_BS_CPU_PAR-CPU_%d.rec.time"\
         $SCALFMM_MAX_NB_CPU
```
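
For reference, the usual definition of parallel efficiency is the sequential time divided by the number of workers times the parallel time. The sketch below computes it with `awk`. It only assumes this textbook definition and is not claimed to reproduce what `mergetimefile.cpp` computes; `parallel_efficiency` and the timings are hypothetical.

```bash
#!/usr/bin/env bash
# parallel_efficiency T_SEQ T_PAR NB_WORKERS: print T_SEQ / (NB_WORKERS * T_PAR).
parallel_efficiency() {
    awk -v t_seq="$1" -v t_par="$2" -v workers="$3" \
        'BEGIN { printf "%.3f\n", t_seq / (workers * t_par) }'
}

# Hypothetical timings: 100 s sequentially, 5 s on 24 CPU workers.
parallel_efficiency 100 5 24    # prints: 0.833
```

An efficiency of 1 means a perfect speedup, so the value above corresponds to a speedup of 20 on 24 workers.
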

We end up with the global efficiencies (for the whole application) and also the efficiencies of the different operators.
```bash
Create global-eff.data
Create task-eff.data
Create task-gr-eff.dat
```

We can plot each of them
```bash
gnuplot -e "filename='global-eff'" $SCALFMM_AB/scalfmmPlotAll.gplot
gnuplot -e "filename='task-eff'" $SCALFMM_AB/scalfmmPlotAll.gplot
gnuplot -e "filename='task-gr-eff'" $SCALFMM_AB/scalfmmPlotAll.gplot
```

In our case it gives:
![global-eff](global-eff.png)
![task-eff](task-eff.png)
![task-gr-eff](task-gr-eff.png)


## Heterogeneous

__NOT FINISHED!!!!__

For the test case `-nb 10000000` (10 million particles) and `-h 7` (tree height of 7, as set in the variables below),
we first want to know the best granularity `-bs`.

This parameter will certainly not be the same for the sequential, parallel and heterogeneous configurations.

```bash
export SCALFMM_NB=10000000
export SCALFMM_H=7
export SCALFMM_MIN_BS=100
export SCALFMM_MAX_BS=3000
export SCALFMM_MAX_NB_CPU=24
export SCALFMM_MAX_NB_GPU=4
```

```bash
export STARPU_NCPUS=1
export STARPU_NCUDA=0
export SCALFMM_BS_CPU_SEQ=`$SCALFMM_AB/scalfmmFindBs.sh "./Tests/Release/testBlockedUnifCudaBench -nb $SCALFMM_NB -h $SCALFMM_H -bs" $SCALFMM_MIN_BS $SCALFMM_MAX_BS | $SCALFMM_AB/scalfmmExtractKey.sh "@BEST BS" `
if [[ `which gnuplot | wc -l` == "1" ]] ;  then
    gnuplot -e "filename='seq-bs-search'" $SCALFMM_AB/scalfmmFindBs.gplot
fi

export STARPU_NCPUS=$SCALFMM_MAX_NB_CPU
export STARPU_NCUDA=0
export SCALFMM_BS_CPU_PAR=`$SCALFMM_AB/scalfmmFindBs.sh "./Tests/Release/testBlockedUnifCudaBench -nb $SCALFMM_NB -h $SCALFMM_H -bs" $SCALFMM_MIN_BS $SCALFMM_MAX_BS | $SCALFMM_AB/scalfmmExtractKey.sh "@BEST BS" `
if [[ `which gnuplot | wc -l` == "1" ]] ;  then
    gnuplot -e "filename='par-bs-search'" $SCALFMM_AB/scalfmmFindBs.gplot
fi

export STARPU_NCPUS=$SCALFMM_MAX_NB_CPU
export STARPU_NCUDA=$SCALFMM_MAX_NB_GPU
export SCALFMM_BS_CPU_GPU=`$SCALFMM_AB/scalfmmFindBs.sh "./Tests/Release/testBlockedUnifCudaBench -nb $SCALFMM_NB -h $SCALFMM_H -bs" $SCALFMM_MIN_BS $SCALFMM_MAX_BS | $SCALFMM_AB/scalfmmExtractKey.sh "@BEST BS" `
if [[ `which gnuplot | wc -l` == "1" ]] ;  then
    gnuplot -e "filename='cpugpu-bs-search'" $SCALFMM_AB/scalfmmFindBs.gplot
fi
```

Then we can execute the three best configurations and keep the `.rec` file of each of them:
```bash
export STARPU_NCPUS=1
export STARPU_NCUDA=0
./Tests/Release/testBlockedUnifCudaBench -nb $SCALFMM_NB -h $SCALFMM_H -bs $SCALFMM_BS_CPU_SEQ
export SCALFMM_SEQ_REC="trace-nb_$SCALFMM_NB-h_$SCALFMM_H-bs_$SCALFMM_BS_CPU_SEQ-CPU_$STARPU_NCPUS-GPU_$STARPU_NCUDA.rec"
mv trace.rec $SCALFMM_SEQ_REC

export STARPU_NCPUS=$SCALFMM_MAX_NB_CPU
export STARPU_NCUDA=0
./Tests/Release/testBlockedUnifCudaBench -nb $SCALFMM_NB -h $SCALFMM_H -bs $SCALFMM_BS_CPU_PAR
export SCALFMM_PAR_REC="trace-nb_$SCALFMM_NB-h_$SCALFMM_H-bs_$SCALFMM_BS_CPU_PAR-CPU_$STARPU_NCPUS-GPU_$STARPU_NCUDA.rec"
mv trace.rec $SCALFMM_PAR_REC

export STARPU_NCPUS=$SCALFMM_MAX_NB_CPU
export STARPU_NCUDA=$SCALFMM_MAX_NB_GPU
./Tests/Release/testBlockedUnifCudaBench -nb $SCALFMM_NB -h $SCALFMM_H -bs $SCALFMM_BS_CPU_GPU
export SCALFMM_PAR_CPU_GPU_REC="trace-nb_$SCALFMM_NB-h_$SCALFMM_H-bs_$SCALFMM_BS_CPU_GPU-CPU_$STARPU_NCPUS-GPU_$STARPU_NCUDA.rec"
mv trace.rec $SCALFMM_PAR_CPU_GPU_REC
```
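
Each run stores its `trace.rec` under a name built from the run parameters. A small helper can keep these names consistent and make explicit which block size a trace was produced with. This is a sketch: `trace_name` is a hypothetical function, the naming scheme is the one used in this section, and the block size in the demo is only an example value.

```bash
#!/usr/bin/env bash
# trace_name NB H BS NCPUS NCUDA [SUFFIX]: build the name under which a trace.rec file is stored.
trace_name() {
    local nb=$1 h=$2 bs=$3 ncpus=$4 ncuda=$5 suffix=${6:-}
    echo "trace-nb_${nb}-h_${h}-bs_${bs}-CPU_${ncpus}-GPU_${ncuda}${suffix}.rec"
}

trace_name 10000000 7 5385 24 0            # prints: trace-nb_10000000-h_7-bs_5385-CPU_24-GPU_0.rec
trace_name 10000000 7 5385 24 4 -GPUONLY   # prints: trace-nb_10000000-h_7-bs_5385-CPU_24-GPU_4-GPUONLY.rec
```

A run could then be saved with `mv trace.rec "$(trace_name ...)"`, passing the block size that was actually given to `-bs`.
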

We also want a run where the GPU-capable tasks (P2P and M2L) are executed only on the GPUs:
```bash
export STARPU_NCPUS=$SCALFMM_MAX_NB_CPU
export STARPU_NCUDA=$SCALFMM_MAX_NB_GPU
./Tests/Release/testBlockedUnifCudaBench -nb $SCALFMM_NB -h $SCALFMM_H -bs $SCALFMM_BS_CPU_GPU -p2p-m2l-cuda-only
export SCALFMM_PAR_GPU_REC="trace-nb_$SCALFMM_NB-h_$SCALFMM_H-bs_$SCALFMM_BS_CPU_GPU-CPU_$STARPU_NCPUS-GPU_$STARPU_NCUDA-GPUONLY.rec"
mv trace.rec $SCALFMM_PAR_GPU_REC
```

And we want the sequential version with parallel granularity:
```bash
export STARPU_NCPUS=1
export STARPU_NCUDA=0

./Tests/Release/testBlockedUnifCudaBench -nb $SCALFMM_NB -h $SCALFMM_H -bs $SCALFMM_BS_CPU_PAR
SCALFMM_SEQ_CPU_BS_REC="trace-nb_$SCALFMM_NB-h_$SCALFMM_H-bs_$SCALFMM_BS_CPU_PAR-CPU_$STARPU_NCPUS-GPU_$STARPU_NCUDA.rec"
mv trace.rec $SCALFMM_SEQ_CPU_BS_REC

./Tests/Release/testBlockedUnifCudaBench -nb $SCALFMM_NB -h $SCALFMM_H -bs $SCALFMM_BS_CPU_GPU
SCALFMM_SEQ_GPU_BS_REC="trace-nb_$SCALFMM_NB-h_$SCALFMM_H-bs_$SCALFMM_BS_CPU_GPU-CPU_$STARPU_NCPUS-GPU_$STARPU_NCUDA.rec"
mv trace.rec $SCALFMM_SEQ_GPU_BS_REC
```

From these files, we are able to get the different efficiencies.

## Post-processing and Plot

From the files:

+ `$SCALFMM_SEQ_REC` : the resulting file from the sequential execution with best sequential granularity
+ `$SCALFMM_PAR_REC` : the resulting file from a parallel execution (no GPU) with best parallel granularity
+ `$SCALFMM_PAR_CPU_GPU_REC` : the resulting file from a parallel execution (hybrid) with best parallel-hybrid granularity
+ `$SCALFMM_PAR_GPU_REC` : the resulting file with all possible tasks on GPU with best parallel-hybrid granularity
+ `$SCALFMM_SEQ_CPU_BS_REC` : the resulting file from sequential execution with best parallel granularity
+ `$SCALFMM_SEQ_GPU_BS_REC` : the resulting file from sequential execution with best parallel-hybrid granularity

Getting all the efficiencies
Solving the linear programming problem

Plotting the results


## Automatization

```bash
SCALFMM_NB=10000000
SCALFMM_H=7
SCALFMM_MIN_BS=100
SCALFMM_MAX_BS=3000
SCALFMM_MAX_NB_CPU=24
SCALFMM_MAX_NB_GPU=4

scalfmm_generate_efficiency -nb $SCALFMM_NB -h $SCALFMM_H -start $SCALFMM_MIN_BS -end $SCALFMM_MAX_BS
```