Enhance support for 'tf.IndexedSlices' in 'TensorflowVector'.
This MR is designed to close issue #17, by extending the backend of `TensorflowVector` to preserve `tf.IndexedSlices` structures when conducting operations that are mathematically compatible with their sparsity.
As a consequence, in most cases, `tf.IndexedSlices` gradients will result in `tf.IndexedSlices` updates, avoiding unrequired memory use and runtime costs for zero-valued rows. One exception is when such a structure is combined with a dense tensor matching its dense shape: in that case, it is densified, with a warning. That warning is silenced in the one context where this currently occurs: as part of `NoiseModule.run`, i.e. when adding random noise to the gradients (including their non-zero values), typically as part of the DP-SGD algorithm.
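The preservation-or-densification rule described above can be sketched with a minimal NumPy stand-in for `tf.IndexedSlices` (the class and helper names here are hypothetical; the actual declearn backend operates on real TensorFlow structures):

```python
import warnings
import numpy as np

class MockIndexedSlices:
    """Minimal stand-in for tf.IndexedSlices: non-zero rows plus their indices."""
    def __init__(self, values, indices, dense_shape):
        self.values = np.asarray(values)       # shape (k, dim): stored rows
        self.indices = list(indices)           # k row indices into the dense tensor
        self.dense_shape = tuple(dense_shape)  # full (n, dim) shape

    def to_dense(self):
        """Scatter the stored rows into a zero-filled dense array."""
        dense = np.zeros(self.dense_shape)
        dense[self.indices] = self.values
        return dense

def scale(slices, coef):
    """Scalar ops are sparsity-compatible: operate on stored rows only."""
    return MockIndexedSlices(slices.values * coef, slices.indices, slices.dense_shape)

def add_dense(slices, tensor):
    """Combining with a same-shape dense tensor forces densification, with a warning."""
    warnings.warn("densifying IndexedSlices to combine with a dense tensor")
    return slices.to_dense() + tensor

# Row-sparse gradients over a (5, 2) parameter: only rows 0 and 3 were updated.
grads = MockIndexedSlices([[1., 2.], [3., 4.]], indices=[0, 3], dense_shape=(5, 2))
sparse_update = scale(grads, 0.1)                 # still row-sparse
dense_update = add_dense(grads, np.ones((5, 2)))  # dense (5, 2) array
```

Scaling touches only the two stored rows, while adding a dense tensor has to materialize all five rows, which is the memory and runtime cost the MR avoids in the sparse-compatible cases.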
One question that remains open is how loss-regularization terms should account for the fact that some weights are unused in a given SGD step. At the moment, `Regularizer` algorithms compute gradient corrections based on the full-rank model weights, so that `tf.IndexedSlices` gradients will be densified (with a warning). It might be worth investigating whether this makes sense, or whether some weights should be ignored in these computations; for the moment, this is left as a reflection for end-users, who are responsible for the validity of the maths at work in their models and in their choice of federated optimization algorithms.
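To illustrate why full-rank corrections densify sparse gradients, here is a hedged sketch using plain NumPy arrays rather than the actual `Regularizer` API (the weight-decay formula `alpha * weights` stands in for any full-rank correction term):

```python
import numpy as np

# Row-sparse gradient: only rows 0 and 3 of a (5, 2) embedding received updates.
dense_shape = (5, 2)
grad_rows = np.array([[1., 2.], [3., 4.]])
grad_idx = [0, 3]

# Full-rank weights and an L2 (weight-decay) correction term, alpha * w,
# which is dense: it is defined for every row, used or not.
weights = np.ones(dense_shape)
alpha = 0.1
decay = alpha * weights

# Adding the dense correction forces densification of the sparse gradient.
grads = np.zeros(dense_shape)
grads[grad_idx] = grad_rows
corrected = grads + decay

# Rows 1, 2 and 4 were unused by this step, yet now carry a non-zero
# gradient due to the regularization term - the open question above.
```

Whether the unused rows should receive the correction at all is precisely the point left to end-users' judgment.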
This MR:
- Modifies the backend of `TensorflowVector`, notably using the newly-introduced `add_indexed_slices_support` function exposed under `declearn.model.tensorflow.utils`.
- Modifies the existing unit tests so that random-valued tensorflow gradients include some (non-sparse) `tf.IndexedSlices` structures together with the classical `tf.Tensor` ones.
Closes #17