Melissa merge requests
https://gitlab.inria.fr/melissa/melissa/-/merge_requests
2024-03-11T11:13:48+01:00

https://gitlab.inria.fr/melissa/melissa/-/merge_requests/139
fix: chmod configuration
2024-03-11T11:13:48+01:00
Fernando Ayats (fernando.ayats-llamas@inria.fr)

When the configuration is installed with guix, it will have 444 perms.

https://gitlab.inria.fr/melissa/melissa/-/merge_requests/128
Use ADIOS2 I/O framework
2024-03-27T11:24:08+01:00
SCHOULER Marc

This MR investigates [ADIOS2](https://adios2.readthedocs.io/en/latest/introduction/introduction.html) as the server/client communication tool with the final objective of replacing the original Melissa-API.
See [the branch README](https://gitlab.inria.fr/melissa/melissa/-/tree/adios-comm?ref_type=heads#melissa) for a detailed discussion of what was done and what is left to do.
To do:
- [x] Make toy example to demonstrate feasibility and better understand adios2 internals
- [x] Isolate/document adios2 installation
- [x] Build/test heatpde with adios2
- [x] Build RoundRobin methodology
- [x] Build AllToAll methodology
- [x] Test RoundRobin in Melissa
- [x] Test AllToAll in Melissa
- [x] Remove collective communications from `get_data()` and engine openings
- [x] Test joblimit, ensure that new engines are opened as they become available
- [x] implement deep learning heat-pde version (with RoundRobin)
- [x] Add adios2 install to gitlab CI
- [x] Rewrite API to be closer to Adios (melissa_define_var, melissa_begin_step, melissa_put etc)
- [x] Ensure termination pattern is precisely how we want it. We have now converted both SA and DL to use the same termination pattern based on the existence of files (as opposed to counting timesteps in the old ZMQ version).
- [x] Integrate into CI and convert tests to conform to the new API methods
- [x] Test using multiple fields on client/server
- [x] Refactor methods to ensure generalized machinery is in base_server.py, and specific machinery is in dl/sa server files.
- [x] ~~Ensure `receive()` is shared between SA and DL. Add a `child_data_handle()` to compute stats vs put to buffer.~~ These two servers require fully separated receive() functions due to finalization signals as well as message types expected for buffer vs compute_stats.
- [x] Ensure fault tolerance/checkpointing is working. The adios2 engine object cannot be pickled, which caused issues; an issue was opened with ADIOS2: https://github.com/ornladios/ADIOS2/issues/3808
- [ ] Ensure client failure will re-open Engine status on server side after relaunch
- [ ] Have launcher clean up the client engine file if it detects a failure?
- [ ] Check with ADIOS2 that calling engine.Close() waits for readers to extract all remaining timesteps
- [x] Check that statistics remain the same before and after ZMQ->Adios2. Results are the same for non-sobol. Sobol still needs to be implemented for the Adios2 version.
- [x] Wrap the adios2 API in an easy Melissa API call: ~~C~~ done, ~~Python~~ done, and ~~Fortran~~ done (for 1d vectors only).
- [x] ~~Convert Lorenz to use new python adios2 api~~ Lorenz is now working with Adios python lib directly.
- [x] Create `setup.sh` optional script that can be `wget` to install adios and melissa
- [x] Update documentation website.
- [x] Install on Jean-Zay and run Heatpde (/gpfswork/rech/igf/commun/uhd97cp/melissa_adios2/melissa/examples/heat-pde/heat-pde-dl)
- [ ] test adios2 on Jean-Zay with infiniband
- [ ] Ensure ZMQ is completely removed from project?
- [ ] Replicate supercomputing
- [ ] Add server side sobol
- [ ] Update code saturne melissa writer
- [x] Allow typing to be set on the client side to ensure best performance
- [x] Compile Adios2 in the virtual cluster so that virtual cluster CI stage works https://gitlab.inria.fr/melissa/virtual-cluster-ci/. We now have a fully green CI with Adios, meaning nearly all functionality (except sobol) is replicated between ZMQ and Adios2 versions.
- [ ] Provide ADIOS2 configuration options
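
To illustrate the file-based termination pattern mentioned in the list above (as opposed to counting timesteps), here is a minimal sketch. It is not the Melissa implementation: the marker naming scheme (`client_<id>.done`), the `run_dir` layout, and both function names are assumptions made up for this example.

```python
from pathlib import Path


def finished_clients(run_dir, client_ids, marker_suffix=".done"):
    """Return the ids of clients whose marker file exists.

    The `client_<id><suffix>` naming is hypothetical, chosen for illustration.
    """
    run_dir = Path(run_dir)
    return {
        cid
        for cid in client_ids
        if (run_dir / f"client_{cid}{marker_suffix}").exists()
    }


def all_finished(run_dir, client_ids, marker_suffix=".done"):
    """True once every expected client has left its marker file."""
    return finished_clients(run_dir, client_ids, marker_suffix) == set(client_ids)
```

A server loop could poll `all_finished(...)` between receives and stop once it returns `True`. The appeal of this style is that the server does not need to know in advance how many timesteps each client will produce.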
[This repo](https://gitlab.inria.fr/mschoule/adios2-melissa-simple-demo) implements a simplified Melissa/Adios2 system motivated by our exchanges with adios2 developers (see [this github discussion](https://github.com/ornladios/ADIOS2/discussions/3675)).
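
Regarding the checkpointing item above (the adios2 engine object cannot be pickled): one common Python workaround for unpicklable handles is to drop the handle in `__getstate__` and reopen it in `__setstate__`. The sketch below shows that general technique only. It does not call adios2, and `FakeEngine` and `ClientConnection` are hypothetical stand-ins; this MR does not state how the problem was actually solved in Melissa.

```python
import pickle
import threading


class FakeEngine:
    """Hypothetical stand-in for an unpicklable native handle."""

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()  # lock objects cannot be pickled


class ClientConnection:
    """Hypothetical server-side record of one client stream."""

    def __init__(self, path):
        self.path = path
        self.last_step = -1  # plain data, safe to checkpoint
        self.engine = FakeEngine(path)

    def __getstate__(self):
        # Copy the attributes and drop the live handle before pickling.
        state = self.__dict__.copy()
        state["engine"] = None
        return state

    def __setstate__(self, state):
        # Restore the plain data, then reopen the handle from the saved path.
        self.__dict__.update(state)
        self.engine = FakeEngine(self.path)
```

With these two hooks, `pickle.loads(pickle.dumps(conn))` yields a copy that keeps `last_step` and holds a freshly opened handle. Whether reopening a real engine at restart is safe depends on the engine type and on what the writer side has done in the meantime.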
**Useful links**:
- [ADIOS2 documentation](https://adios2.readthedocs.io/en/latest/advice/advice.html)
- [ADIOS2 tutorial](https://github.com/omlins/adios2-tutorial)
- [Melissa refactoring snippet](https://gitlab.inria.fr/-/snippets/738)
- [Preliminary server/client tests Issue](https://gitlab.inria.fr/melissa/melissa-sa/-/issues/135)
- [Preliminary server/client tests Repo](https://gitlab.inria.fr/cconrads/adios2-client-server-demo)
- [Reader issue over InfiniBand on Grid5000](https://github.com/ornladios/ADIOS2/issues/4100)