
Implement 'L2GlobalClipping' OptiModule.

ANDREY Paul requested to merge global-clipping into develop

This MR adds L2GlobalClipping, a new OptiModule that implements global-norm-based L2 clipping of batch-averaged gradients. It also revises the documentation of the existing L2Clipping module.

The rationale for introducing this new module is that, when addressing exploding gradients, one may prefer clipping based on a global norm value rather than a per-parameter one: rescaling all gradients by the same factor avoids unbalancing them relative to one another. In TensorFlow, both approaches are easy to implement, using either tf.clip_by_norm or tf.clip_by_global_norm; in Torch however, only global-norm clipping is put forward, via torch.nn.utils.clip_grad_norm_.
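For illustration, here is a minimal sketch of the difference between the two behaviors, written with plain numpy rather than declearn's actual Vector API; the helper names are made up for this example and do not reflect the module's implementation:

```python
import numpy as np

def clip_per_param(grads, max_norm):
    """Clip each gradient array to `max_norm` independently (L2Clipping-like)."""
    out = []
    for g in grads:
        norm = np.linalg.norm(g)
        out.append(g * min(1.0, max_norm / norm) if norm > 0 else g)
    return out

def clip_global(grads, max_norm):
    """Rescale all gradients by a single factor derived from their global
    L2 norm (L2GlobalClipping-like)."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / global_norm) if global_norm > 0 else 1.0
    return [g * scale for g in grads]

grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
print(clip_per_param(grads, 1.0))  # first array is shrunk, second is untouched
print(clip_global(grads, 1.0))     # both arrays are rescaled by the same factor
```

In the per-parameter case, only the large gradient is rescaled, which changes the relative magnitudes of the two arrays; in the global-norm case, the ratio between them is preserved.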

My first approach was to add a new bool parameter to L2Clipping that would select between the two algorithms; however, this might hurt readability and cause incorrect behaviors when different versions of declearn are used together. With the current approach, L2Clipping remains the same across versions, while attempting to use L2GlobalClipping with clients running older versions of declearn will fail, encouraging users to upgrade to a newer version.
