Continuing with the Adios2 v2.10+

PURANDARE Abhishek requested to merge adios2-newapi into adios-comm
  • Rewriting with new API calls
  • Primitive active sampling with number of rounds > 1
  • Possible server issue when server ranks > 1. No error is shown, but the server times out and is disconnected from the launcher side. This looks like a very low-level issue, and it comes and goes for no apparent reason. I have reproduced the bug by modifying existing Adios2 repo examples. Check this repo: https://gitlab.inria.fr/melissa/adios2-bug.git
  • Sobol results were wrong after fixing the above error (the Sobol cache needs to be stacked in sorted simulation-id order)
  • DL Study requires training to be done from the master thread, and if reception is not also done from the master thread, it blocks. Dilemma! The issue likely comes from UCX or some other low-level library. It seems to work inside a guix profile. The node must have InfiniBand, otherwise UCX mostly fails to run. There should be a mechanism to detect the underlying hardware and then select the DataTransport for SST. This choice must be hidden from the user.
  • Slurm semiglobal run confirmation.
  • Add remaining CI stages: Slurm global large run, TF lorenz PDE, Coverage report
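
The Sobol ordering fix mentioned above can be illustrated with a minimal sketch. It assumes the cache is a mapping from simulation id to a list of values; that layout and the name `stack_by_simulation_id` are hypothetical and not taken from the Melissa code. The point is only that rows are stacked by ascending simulation id rather than by arrival order.

```python
def stack_by_simulation_id(cache):
    """Return (sorted_ids, rows) with rows ordered by ascending simulation id.

    `cache` is a hypothetical mapping {simulation_id: list_of_values};
    the real Sobol cache layout may differ.
    """
    sorted_ids = sorted(cache)
    # Copy each row so the caller's cache is not aliased by the result.
    rows = [list(cache[sim_id]) for sim_id in sorted_ids]
    return sorted_ids, rows


# Entries inserted out of order, as may happen with several server ranks.
cache = {2: [0.3, 0.4], 0: [0.1, 0.2], 1: [0.5, 0.6]}
ids, stacked = stack_by_simulation_id(cache)
print(ids)
print(stacked)
```

Sorting on the id makes the stacked result independent of the order in which simulations report their data.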
Edited by PURANDARE Abhishek
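
The hardware detection proposed for the SST DataTransport could look roughly like the sketch below. It assumes a Linux node, where InfiniBand devices are listed under `/sys/class/infiniband`, and it assumes that falling back to the socket-based `WAN` transport is acceptable when no device is found. The function name `choose_sst_data_transport` is hypothetical; this is not the mechanism implemented in this merge request.

```python
import os


def choose_sst_data_transport(sysfs_dir="/sys/class/infiniband"):
    """Pick an SST DataTransport value from what the node exposes.

    Assumption: a non-empty `sysfs_dir` means an InfiniBand device is
    present and UCX is worth trying; otherwise fall back to sockets.
    """
    try:
        devices = os.listdir(sysfs_dir)
    except OSError:
        # Directory missing or unreadable: treat as "no InfiniBand".
        devices = []
    return "UCX" if devices else "WAN"


transport = choose_sst_data_transport()
print(transport)
```

The returned string would then be passed as the `DataTransport` engine parameter when the SST IO is configured, so the user never has to choose it by hand.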
