To-do list for new combined melissa repo

The new combined server + launcher + DL + SA is operational, but a number of important components still need to be completed before it can fully replace the original melissa repo:
Melissa-server

- group support (pull from master) - see !6
- fault tolerance (timeout monitor from master) - see !5
- add and test learning == 1 (from develop)
- add checkpoint and restart (from develop)
- add HeatPDE example for SA
- add HeatPDE example for DL (at scale)
- logger cleaning/fixing
- create experimental design class (e.g. draw_param_set()) - see openturns, see https://gitlab.inria.fr/melissa/melissa-combined/-/merge_requests/27
- tensorflow support (from master)
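The experimental design item above could take roughly the following shape. This is a minimal sketch only: the class and method names (`ExperimentalDesign`, `draw_param_set`) come from the to-do item, but the bounds-based constructor and the stdlib uniform sampler are assumptions standing in for an OpenTURNS-backed design (e.g. a Latin hypercube experiment).

```python
import random

class ExperimentalDesign:
    """Hypothetical experimental design class; draws one parameter set per call."""

    def __init__(self, bounds, seed=None):
        # bounds: one (low, high) pair per uncertain parameter (assumed interface)
        self._bounds = bounds
        self._rng = random.Random(seed)

    def draw_param_set(self):
        # Independent uniform draws; a real implementation would likely
        # delegate to an OpenTURNS sampler instead.
        return tuple(self._rng.uniform(lo, hi) for lo, hi in self._bounds)
```

The server would then call `draw_param_set()` once per client group to obtain that group's simulation parameters, keeping the sampling strategy swappable behind a single class.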
Melissa-launcher

- fault tolerance switch (from master) - see !5
- oar-regale branch (pull from OARREGALE) and incorporate as a scheduler class - see !5
- group support (from master) - see !6
- consider changing input to a single config file
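A single config file could look something like the sketch below. All section and key names here are illustrative assumptions, not the actual Melissa schema; the point is only that one file carries both launcher and server options and is validated in one place.

```python
import json

# Hypothetical combined config; key names are placeholders.
CONFIG_TEXT = """
{
    "launcher": {"scheduler": "oar", "fault_tolerance": true},
    "server": {"learning": 1, "checkpoint": false}
}
"""

def load_config(text):
    """Parse the single config file and fail early on missing sections."""
    config = json.loads(text)
    for section in ("launcher", "server"):
        if section not in config:
            raise KeyError(f"missing config section: {section}")
    return config
```

One file means one source of truth for a study, which also simplifies checkpoint/restart (the whole study definition can be archived alongside the results).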
Docs

- add docs (from deepmelissa develop)
- rewrite new install and execution directions (and update README)
- add doc explaining the class inheritance system (user facing)
- get a free readthedocs.io domain name to host the docs on the web? - see Batsim's use of GitLab Pages to host its doc
- connect CI to the domain server?
- explore adding a user interaction forum/Discourse
- new PR ensuring grammar/continuity/consistency
Tests/CI

- build basic CI
- fix and expand on tests from melissa-launcher - see the coverage module
- fix and expand on existing CI
- add coverage report to CI
- force the build to merge the current branch into develop before running the pipeline
- move the scheduler and utility tests to the tests/ folder
- ensure that failed integration tests also fail the pipeline (we need a failed exit signal from the launcher)
- separate integration test stages to avoid the runner trying to execute everything in parallel (it slows down the pipeline too much)
- add coverage report to the doc website
- build a new Docker image on the runner that already includes the full deep learning and developer dependencies (the pipeline is slowed down by downloading torch 4 separate times, 800 MB each time)
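The "failed exit signal" item boils down to the launcher exiting nonzero whenever an integration test fails, since GitLab CI marks a job as failed based on the process exit status. A minimal sketch, assuming a hypothetical `results` mapping of test name to pass/fail:

```python
import sys

def report_and_exit(results):
    """Exit nonzero if any integration test failed, so the CI job fails too.

    results: hypothetical mapping of test name -> bool (True means passed).
    """
    failed = [name for name, ok in results.items() if not ok]
    for name in failed:
        print(f"integration test failed: {name}", file=sys.stderr)
    sys.exit(1 if failed else 0)
```

Calling this at the end of the launcher's integration run replaces any "always exit 0" behavior that currently lets broken tests slip through the pipeline.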
Install methods

- explore changing the install method: pip exposed to the user, cmake used in the background?
- update and publish the updated Spack package for melissa - see melissa spack
General cleaning

- clean unnecessary leftover code from the merge (e.g. .gitignore and cmake need to be cleaned)
- take some time for the entire team to go through Flake8
- take some time for the entire team to go through mypy
- take some time for the team to clean the code together
- use rapidjson so that we can add comments to the config files
- add a new config section that allows users to add as many executable arguments as they please
- remove the need for client/server.sh
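The two config items above (comments in config files, pass-through executable arguments) could combine as in the sketch below. It uses a stdlib stand-in that strips whole-line `//` comments before parsing; the eventual rapidjson-based loader would presumably accept comments natively instead. All key names (`executable`, `client_args`) are hypothetical.

```python
import json
import re

def load_config_with_comments(text):
    """Strip whole-line // comments, then parse as JSON.

    Stopgap for illustration only: it does not handle inline or block
    comments, which a rapidjson-based parser would cover.
    """
    stripped = re.sub(r"^\s*//.*$", "", text, flags=re.MULTILINE)
    return json.loads(stripped)

CONF_TEXT = """
{
    // coarse mesh keeps the smoke test fast
    "executable": "./heatpde",
    "client_args": ["--mesh", "coarse", "-v"]
}
"""

conf = load_config_with_comments(CONF_TEXT)
# Arguments are forwarded verbatim, so users can pass whatever their
# executable accepts without launcher changes.
cmd = [conf["executable"], *conf["client_args"]]
```

Treating `client_args` as an opaque list the launcher never interprets is what removes the need for per-study wrapper scripts like client/server.sh.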
Edited by SCHOULER Marc