V0 - Performance improvements

This post keeps track of the changes made to improve performance.

The results below were obtained on the PlaFRIM cluster.

Config:

  int thedeg = 4;
  int theraf = 5;

The reported durations do not include the compilation of the OpenCL kernels (see #26bc8834).

Node with 4 × K40 GPUs:

Before: Temps total (no memory transfer) =20.000000
Now: Temps total (no memory transfer) =12.500000

Node with 2 × P100 GPUs:

Before: Temps total (no memory transfer) =23.500000
Now: Temps total (no memory transfer) =14.000000

This gives a speedup of about 1.6× on the K40 node (20.0 to 12.5) and about 1.7× on the P100 node (23.5 to 14.0). Here is the list of changes that were applied.
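As a quick check of the ratio between the Before and Now timings above, here is a minimal C sketch. The helper name `speedup` is only illustrative and does not exist in the code base. It assumes both durations are expressed in the same unit.

```c
#include <assert.h>

/* Speedup factor: how many times shorter the new run is compared to
   the old one. Both durations must be positive and in the same unit.
   Illustrative helper, not part of the project. */
static double speedup(double before, double after)
{
    assert(before > 0.0 && after > 0.0);
    return before / after;
}
```

With the numbers above, `speedup(20.0, 12.5)` is 1.6 for the K40 node and `speedup(23.5, 14.0)` is about 1.68 for the P100 node.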

Initial state #8bec277e

There are synchronizations, and the GPUs show many red parts.

Removing the synchronizations #2513cc08

It looks like there is a starpu_task_wait_for_all() inside the while loop of function RK2_SPU. I therefore moved the wait to just after the loop, on the assumption that StarPU manages the dependencies correctly. This needs confirmation that it is correct and has no side effects.
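To illustrate why a wait inside the loop can leave workers idle, here is a toy model in plain C. It is not the schnaps code and it does not call StarPU. It assumes, for illustration only, that each worker's task in one iteration depends only on that same worker's task in the previous iteration. The function names and the durations are hypothetical. With a wait in every iteration, each iteration lasts as long as its slowest worker. With a single wait after the loop, the run lasts as long as the busiest worker.

```c
#include <stddef.h>

#define N_WORKERS 2

/* Wait inside the loop: every iteration ends with a global barrier,
   so it costs as much as the slowest worker of that iteration. */
static double makespan_wait_in_loop(const double d[][N_WORKERS], size_t n_iter)
{
    double total = 0.0;
    for (size_t it = 0; it < n_iter; ++it) {
        double slowest = 0.0;
        for (size_t w = 0; w < N_WORKERS; ++w)
            if (d[it][w] > slowest)
                slowest = d[it][w];
        total += slowest;
    }
    return total;
}

/* Single wait after the loop, under the toy assumption that each
   worker only depends on its own previous task: the run lasts as
   long as the worker with the largest total amount of work. */
static double makespan_wait_after_loop(const double d[][N_WORKERS], size_t n_iter)
{
    double longest = 0.0;
    for (size_t w = 0; w < N_WORKERS; ++w) {
        double busy = 0.0;
        for (size_t it = 0; it < n_iter; ++it)
            busy += d[it][w];
        if (busy > longest)
            longest = busy;
    }
    return longest;
}
```

For example, with durations `{{3,1},{1,3},{3,1},{1,3}}` (iteration by worker), the first function returns 12 and the second returns 8. In the real code the gain depends on the actual data dependencies between tasks, which this model does not capture.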

COMMUTE and degree of parallelism #cfb8f159

The COMMUTE access mode is used in several codelets, which is really great. However, StarPU is not explicit about how commutative dependencies are managed. I know (because I partially implemented it) that an arbiter (a mutex) is needed to get real commutativity, because StarPU needs some kind of global lock to make sure that it can select the right task.

Scheduling

I plugged in my scheduler, Heteroprio, using the version that is already inside StarPU (#6d82d899). Then I plugged in my work-in-progress scheduler, laHeteroprio (#07b93c6d). This seems to give very good results.

Config problem

I cannot use

  int thedeg = 5;
  int theraf = 6;

Otherwise I get:

testlaura_spu: /projets/schnaps/schnaps/src/interpolation.c:277: ref_ipg: Assertion `ic[2] >=0 && ic[2]<nraf[2]' failed

What's next?

Use larger thedeg and theraf
Use multiple runners on the GPUs (though this may only work with CUDA and not OpenCL)

Edited Dec 19, 2018 by BRAMAS Berenger
