SP15-Item01: model testing during training
Milestone ID: 2769
As a researcher, I want to perform model testing during an experiment (training):
Test datasets can be:
- researcher side testing dataset: permanently 100% dedicated to testing
- node side testing+training dataset: dynamically split at each researcher request (random samples selected for testing); the researcher specifies the experiment's testing/training sample ratio for this dataset (see the sketch after this list)
- node side testing dataset: the specific case of a 100% testing ratio for this dataset
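A minimal sketch of how such a node-side ratio split might look, assuming numpy; `split_test_train`, its parameters, and the sample counts below are illustrative, not the actual Fed-BioMed API:

```python
import numpy as np

def split_test_train(n_samples, test_ratio, rng):
    """Randomly split sample indices into testing and training subsets.

    test_ratio=1.0 covers the 'node side testing dataset' case:
    every sample goes to the testing split.
    """
    perm = rng.permutation(n_samples)
    n_test = int(round(test_ratio * n_samples))
    return perm[:n_test], perm[n_test:]  # (testing indices, training indices)

# Example: hold out 30% of the node's 1000 samples for testing this round
test_idx, train_idx = split_test_train(1000, 0.3, np.random.default_rng())
```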
Workflow:
1. The round is started: each node performs a random split between training and testing data according to the established ratio. Performance is first assessed with the aggregated parameters (initialization, or round N-1 parameters) on the testing split, then with the optimized parameters (round N parameters) on the same testing split (see the sketch after the note below).
2. The parameters are aggregated by the researcher, which also receives all the metrics.
3. Optionally, the researcher assesses the performance of the round N parameters on its own testing data.
4. The new aggregated model is sent to the nodes for a new round (go to point 1).
Note: the split of a dataset between training/testing data is redone at each round, so a sample may belong to the training split during one round and to the testing split during the next.
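A hedged sketch of one node-side round under the same assumptions (`run_round`, `train`, and `evaluate` are hypothetical placeholders, not the real API); it shows the same freshly drawn testing split scoring both the round N-1 and the round N parameters:

```python
import numpy as np

def run_round(node_data, aggregated_params, test_ratio, train, evaluate, rng):
    """One node-side round: score old params, train locally, score new params.

    `train` and `evaluate` are caller-supplied callables standing in
    for the local training step and the metric computation.
    """
    # Fresh random split every round: a sample may change sides between rounds
    perm = rng.permutation(len(node_data))
    n_test = int(round(test_ratio * len(node_data)))
    test_set, train_set = node_data[perm[:n_test]], node_data[perm[n_test:]]

    metrics_before = evaluate(aggregated_params, test_set)  # round N-1 params
    optimized_params = train(aggregated_params, train_set)  # local optimization
    metrics_after = evaluate(optimized_params, test_set)    # round N params

    # Parameters and both metric sets go back to the researcher for aggregation
    return optimized_params, metrics_before, metrics_after
```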
Terminology:
- validation: process of giving heuristic information on the accuracy of a model during training. It may use samples that are also used for training at some point. Covered by this user story.
- testing: process of assessing the accuracy of a model after training, on holdout samples distinct from the ones used for training. Not part of this user story; it may be implemented at a later point (e.g. #228).
Note: updated 2022-06-16, we swapped the definitions of testing <=> validation.
[replaces #184 (closed)]