Implement learning-rate and weight-decay scheduling
Currently, optimizers only accept static values for their learning-rate and (optional) weight-decay parameters.
Adaptive optimizer modules (such as AdaGrad, Adam, Yogi, etc.) already adapt the effective learning rate over time; however, there can still be value in scheduling the base learning rate itself, e.g. to implement a warm-up period at the beginning of training (during which the effective learning rate will most probably increase, even when an adaptive component is used).
Some code was written as part of the previous declearn version (see this file); however, it is unclear whether it was actually workable, and it would at any rate require some revision to be usable in declearn 2. In addition, it would be nice to provide an extensible API and/or a wrapper for custom algorithms, so that users can set up any formula they may wish for, as is the case in frameworks such as TensorFlow or PyTorch.
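As food for thought, below is a minimal sketch of what such an extensible API might look like. All names here (`Scheduler`, `compute_value`, `ConstantRate`) are purely illustrative proposals and are not part of the existing declearn codebase.

```python
from abc import ABC, abstractmethod


class Scheduler(ABC):
    """Abstract base for learning-rate (or weight-decay) schedules.

    Custom formulas would be plugged in by subclassing this class.
    """

    def __init__(self, base: float) -> None:
        self.base = base  # base value that the schedule modulates

    @abstractmethod
    def compute_value(self, step: int) -> float:
        """Return the scheduled value for a given training step."""


class ConstantRate(Scheduler):
    """Trivial schedule, letting static values share the same interface."""

    def compute_value(self, step: int) -> float:
        return self.base
```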
Tasks:

- Design a simple API for learning-rate (and optionally weight-decay) scheduling.
- Implement the base API as well as (combinable?) examples of popular formulas (warm-up, linear or polynomial decay, ...); see the sketch after this list.
- Deploy the former to the Optimizer and Strategy APIs.
- Write associated documentation.
- Write associated tests (unit tests for the API and for its integrated use).
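As a rough sketch of the first three tasks, the snippet below builds on the hypothetical `Scheduler` base class above to implement two popular, combinable formulas (linear warm-up and polynomial decay) and to show how an optimizer might accept either a static float or a schedule. The `Optimizer` class here is a simplified stand-in, not declearn's actual implementation.

```python
from typing import List, Union

# Assumes the hypothetical Scheduler / ConstantRate classes sketched above are in scope.


class PolynomialDecay(Scheduler):
    """Decay the value from `base` to `final` over `duration` steps."""

    def __init__(
        self, base: float, final: float, duration: int, power: float = 1.0
    ) -> None:
        super().__init__(base)
        self.final = final
        self.duration = duration
        self.power = power

    def compute_value(self, step: int) -> float:
        frac = min(step, self.duration) / self.duration
        return self.final + (self.base - self.final) * (1.0 - frac) ** self.power


class Warmup(Scheduler):
    """Linear warm-up wrapping any other schedule (an example of composability)."""

    def __init__(self, wrapped: Scheduler, warmup: int) -> None:
        super().__init__(wrapped.base)
        self.wrapped = wrapped
        self.warmup = warmup

    def compute_value(self, step: int) -> float:
        if step < self.warmup:
            return self.base * (step + 1) / self.warmup
        return self.wrapped.compute_value(step - self.warmup)


class Optimizer:
    """Stand-in optimizer accepting either a static value or a Scheduler."""

    def __init__(self, lrate: Union[float, Scheduler]) -> None:
        # Wrap plain floats so the update rule only ever deals with Scheduler objects.
        self.lrate = lrate if isinstance(lrate, Scheduler) else ConstantRate(lrate)
        self.step = 0

    def apply_gradients(self, weights: List[float], grads: List[float]) -> List[float]:
        lrate = self.lrate.compute_value(self.step)
        self.step += 1
        return [w - lrate * g for w, g in zip(weights, grads)]


# Usage: 100 warm-up steps, then a linear decay from 0.1 down to 0.01.
schedule = Warmup(PolynomialDecay(base=0.1, final=0.01, duration=1000), warmup=100)
optim = Optimizer(lrate=schedule)
weights = optim.apply_gradients([1.0, 2.0], [0.5, -0.5])
```

Making the warm-up a wrapper around an arbitrary inner schedule is one possible answer to the "combinable?" question above; chaining or summing schedules would be alternative designs worth considering.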