Fix task pruning on WAR dependencies between nodes

Closed THIBAULT Samuel requested to merge thibault/chameleon:iscached into master

As we discussed for the TPDS paper, we don't have WAR dependencies between nodes yet, but they could happen, so we need to check for them too.

  • It doesn't show up on the page. The page only tells me about installing a runner and giving it the following key: zxQNF_NhkGrEe23nh4_p. Perhaps you just need to add the key to it?

  • I just wanted to comment that it's the same for me. The page does tell something about sharing runners, but it doesn't show any existing ones.

  • I have enabled the runner, thanks!

  • @agullo @thibault I'm not sure we really want to integrate that in chameleon. We know that the algorithms don't have this problem, so why add extra checks?

  • But will that always be the case in the future? At least we need a check, enabled in some debugging mode, that this doesn't happen.

  • @thibault Can you give an example of possible errors due to the way WAR dependencies are handled, please?

  • Assuming that data A has owner node 0, and data B and C have owner node 1:

    starpu_mpi_insert_task(&foo, STARPU_W, A, 0);
    starpu_mpi_insert_task(&bar, STARPU_R, A, STARPU_W, B, 0);
    starpu_mpi_insert_task(&foo, STARPU_W, A, 0);
    starpu_mpi_insert_task(&bar, STARPU_R, A, STARPU_W, C, 0);

    The 1st and 3rd tasks will thus run on node 0, and the 2nd and 4th will run on node 1.

    If the insert_task wrapper for foo only looks at the owner of A, it will prune the 3rd task insertion on node 1. The 4th task insertion on node 1 will then think that it still has the right value for A in its cache, and will not post a receive for it. In the meantime, node 0 will send the new value of A (since it knows node 1's copy is outdated), and thus we have a mismatch.

    Edited by THIBAULT Samuel
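The mismatch described above can be replayed with a small toy model. This is only a sketch under stated assumptions: `node_view`, `insert_foo`, `insert_bar`, `run_example` and the two policy names are hypothetical and are not StarPU or Chameleon API. The model tracks a single belief per node (whether node 1 holds a valid copy of A) and counts the transfers of A that each node would post.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are StarPU or Chameleon API. */
enum prune_policy { PRUNE_OWNER_ONLY, PRUNE_OWNER_OR_CACHED };

/* What one node believes about data A (owned by node 0, read on node 1). */
struct node_view {
    bool a_cached_on_1; /* does node 1 hold a valid copy of A? */
    int sends;          /* sends of A posted by this node */
    int recvs;          /* receives of A posted by this node */
};

/* foo writes A and executes on node 0, the owner of A. */
static void insert_foo(struct node_view *v, int me, enum prune_policy policy)
{
    assert(me == 0 || me == 1);
    if (me != 0) {
        if (policy == PRUNE_OWNER_ONLY)
            return; /* pruned: A is not owned here */
        if (!v->a_cached_on_1)
            return; /* pruned: neither owned nor cached here */
    }
    v->a_cached_on_1 = false; /* the write invalidates node 1's copy */
}

/* bar reads A and executes on node 1; both nodes must process it. */
static void insert_bar(struct node_view *v, int me)
{
    assert(me == 0 || me == 1);
    if (v->a_cached_on_1)
        return; /* copy believed valid: no transfer posted */
    if (me == 0)
        v->sends++;
    else
        v->recvs++;
    v->a_cached_on_1 = true;
}

/* Replays foo, bar, foo, bar as seen by node `me`. */
static struct node_view run_example(int me, enum prune_policy policy)
{
    struct node_view v = { false, 0, 0 };
    insert_foo(&v, me, policy);
    insert_bar(&v, me);
    insert_foo(&v, me, policy);
    insert_bar(&v, me);
    return v;
}
```

In this model, `run_example(0, PRUNE_OWNER_ONLY).sends` is 2 while `run_example(1, PRUNE_OWNER_ONLY).recvs` is 1, which is the mismatch. With `PRUNE_OWNER_OR_CACHED`, node 1 keeps the 3rd insertion because A is cached there, and both counts are 2.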
  • Now, to make the code less hairy, I was thinking that we could have helpers to write task insertion wrappers this way:

    foo() {
        BEGIN_ACCESS
        ACCESS_R(A)
        ACCESS_W(B)
        END_ACCESS
        task_insert(foo, STARPU_R, A, STARPU_W, B, 0);
    }

    Here ACCESS_R/W would do whatever accounting is needed, and END_ACCESS would return early from the wrapper when it is known that none of the data is owned by the node and none of it is cached.

    Edited by THIBAULT Samuel
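One way the proposed helpers could be written is sketched below. Only the macro names come from the proposal; `sketch_data`, `sketch_my_rank`, `sketch_is_relevant`, `sketch_inserted` and `bar_wrapper` are hypothetical stand-ins, and the real implementation would query StarPU-MPI for ownership and cache state rather than read plain struct fields. The wrapper must return `void`, since `END_ACCESS` expands to a bare `return`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical data descriptor: owner rank and local cache state. */
struct sketch_data {
    int owner;   /* rank that owns the data */
    bool cached; /* is a copy cached on the local node? */
};

static int sketch_my_rank = 0;  /* hypothetical: rank of the local node */
static int sketch_inserted = 0; /* counts insertions that were not pruned */

/* A piece of data matters locally if it is owned or cached here. */
static bool sketch_is_relevant(const struct sketch_data *d)
{
    assert(d != NULL);
    return d->owner == sketch_my_rank || d->cached;
}

/* ACCESS_R and ACCESS_W are kept as distinct hooks so that per-mode
 * accounting can be added later; in this sketch they do the same thing. */
#define BEGIN_ACCESS { bool sketch_needed_ = false;
#define ACCESS_R(d)  sketch_needed_ = sketch_is_relevant(&(d)) || sketch_needed_;
#define ACCESS_W(d)  sketch_needed_ = sketch_is_relevant(&(d)) || sketch_needed_;
#define END_ACCESS   if (!sketch_needed_) return; }

/* Wrapper for a task that reads A and writes B. */
static void bar_wrapper(struct sketch_data *A, struct sketch_data *B)
{
    BEGIN_ACCESS
    ACCESS_R(*A)
    ACCESS_W(*B)
    END_ACCESS
    sketch_inserted++; /* stand-in for the real task insertion call */
}
```

With `sketch_my_rank` set to 1 and both A and B owned by node 0 and not cached, `bar_wrapper` returns before the insertion. Once `A.cached` is true, the insertion is kept, which is the case the owner-only check misses.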
  • By "accounting", I mean that I was thinking about including task pruning statistics, to count the total number of tasks, the number of tasks which were inserted for communication reasons, the number of tasks actually executed, etc.

    1. Regarding your example:

    I am certainly missing the point, but isn't the problem more general than WAR? Said differently, is the problem only for WAR, or is it potentially the case for WAW too?

    In your example, what happens if the 2nd task is in write mode instead: starpu_mpi_insert_task(&bar, STARPU_W, A, STARPU_W, B, 0)? In this case we can even remove the 1st task. Wouldn't it lead to the same problem? Wouldn't the third task be pruned on node 1 as well?

    2. If a helper such as the one you suggest can fit, that would indeed help readability a lot. If it can furthermore optionally do the stats, it would be excellent.

    3. Overall, although this is a trivial conceptual problem, I think it is easy to get lost (I am already lost with a 4-task example :-/ ). My feeling is that if we want to design advanced numerical algorithms (with bulge-chasing for instance, ...) and/or innovative data mappings, these checks will be a precious time saver.
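The optional statistics mentioned in this exchange could be kept in a small counter structure. The sketch below is an assumption about how that might look: `prune_stats`, `prune_stats_record` and `prune_stats_pruned` are hypothetical names, and the thread does not specify any layout. Each wrapper call is classified as pruned, inserted for communication reasons only, or executed locally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters for one node; not part of any existing API. */
struct prune_stats {
    long total;     /* every call to a task insertion wrapper */
    long comm_only; /* inserted only for communication or cache bookkeeping */
    long executed;  /* inserted and executed on this node */
};

/* Records one wrapper call.
 * involved:      the node owns or caches at least one piece of data.
 * executes_here: the task runs on this node. */
static void prune_stats_record(struct prune_stats *s, bool involved,
                               bool executes_here)
{
    assert(s != 0);
    assert(involved || !executes_here); /* executing implies involved */
    s->total++;
    if (!involved)
        return; /* pruned: counted in total only */
    if (executes_here)
        s->executed++;
    else
        s->comm_only++;
}

/* Pruned tasks are the ones that were neither executed nor kept for
 * communication reasons. */
static long prune_stats_pruned(const struct prune_stats *s)
{
    return s->total - s->comm_only - s->executed;
}
```

Replaying the 4-task example from node 1's point of view with a cache-aware check gives one pruned task (the 1st), one task kept for communication reasons (the 3rd), and two executed tasks (the 2nd and 4th).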

  • Yes, it's not just WAR, I just mentioned WAR as a typical example. With starpu_mpi_insert_task(&bar, STARPU_W, A, STARPU_W, B, 0), node 1 would assume it has the right value in cache. But anyway, all scenarios would be caught by the generic macro I'm thinking of.

  • THIBAULT Samuel mentioned in merge request !27 (merged)

  • THIBAULT Samuel deleted source branch iscached
