Add a Scheduler API to enable time-based learning rate (and weight decay) adjustments
This MR adds a long-awaited feature: scheduling the learning rate (and/or the weight decay factor) so that it takes different values over time, based on the number of training steps and/or rounds already taken.
This takes the form of a new (and extensible) `Scheduler` API, implemented under the new `declearn.optimizer.schedulers` submodule. Instances of `Scheduler` subclasses (or their JSON-serializable specs) may be passed to `Optimizer.__init__` instead of float values to specify the `lrate` and/or `w_decay` parameters, resulting in time-varying values being computed and used rather than a constant one.
This MR implements the base API, integrates it with the `Optimizer` one, updates and adds dedicated unit tests, and implements a number of standard scheduling rules. The latter include:
- Various kinds of decay (step-, multi-step- or round-based; linear, exponential, polynomial...).
- Cyclic learning rates (based on approaches from the literature).
- Linear warmup (step- or round-based; combinable with another scheduler to use after the warmup period, as sketched below).
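As a sketch of how such rules could be combined (again using hypothetical class names and signatures, which may not match the shipped implementations):

```python
from declearn.optimizer import Optimizer
from declearn.optimizer.schedulers import LinearWarmup, ExponentialDecay  # hypothetical names

# Hypothetical composition: ramp the learning rate up linearly over the first
# 100 steps, then hand over to an exponential decay rule.
schedule = LinearWarmup(  # hypothetical signature
    base=ExponentialDecay(base=0.1, rate=0.99),  # scheduler used after warmup
    warmup=100,
)
optim = Optimizer(lrate=schedule, w_decay=0.001)
```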
Closes #5