Explore implementation of `num_updates` in scikit-learn with a one-by-one sample approach
As suggested by @paandrey:
This is indeed a limitation of scikit-learn, which occurs in most cases since most scikit-learn models are designed to handle the entire training procedure in a single `fit` call. It can however be avoided in the specific case of the `SGDClassifier` / `SGDRegressor` classes, because it is possible to set an arbitrary constant learning rate, feed the `partial_fit` method a single data sample at a time, and reset the model's weights after each call. This way, it is possible to compute sample-wise, and therefore batch-wise, gradients. This is what we do in declearn, so that the same optimizer code (and plug-in system) can be used with these models as with torch and tensorflow.
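A minimal sketch of the trick described above (not the actual declearn implementation; the `sample_gradient` helper name is illustrative). With a constant learning rate of 1 and no effective regularization, the weight delta produced by one `partial_fit` call on a single sample equals minus the per-sample gradient, and the weights can then be reset to their previous values:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Constant learning rate of 1.0 => one partial_fit step changes the weights
# by exactly -gradient; alpha=0.0 disables the L2 regularization term.
model = SGDClassifier(learning_rate="constant", eta0=1.0, alpha=0.0)

X = np.array([[0.5, -1.2], [1.0, 0.3], [-0.7, 0.8], [0.2, 0.9]])
y = np.array([0, 1, 0, 1])

# First partial_fit call initializes coef_ / intercept_ (classes required here).
model.partial_fit(X[:1], y[:1], classes=np.array([0, 1]))

def sample_gradient(model, x, y_true):
    """Estimate a per-sample gradient via one partial_fit step, then reset weights."""
    w_before = model.coef_.copy()
    b_before = model.intercept_.copy()
    model.partial_fit(x.reshape(1, -1), np.array([y_true]))
    grad_w = w_before - model.coef_      # eta0=1.0 => delta = -gradient
    grad_b = b_before - model.intercept_
    # Reset the weights so the model state is left unchanged.
    model.coef_ = w_before
    model.intercept_ = b_before
    return grad_w, grad_b

# Sample-wise gradients can then be aggregated into a batch-wise gradient.
grads = [sample_gradient(model, xi, yi) for xi, yi in zip(X, y)]
batch_grad_w = np.mean([g[0] for g in grads], axis=0)
```

Once batch gradients are recovered this way, any external optimizer can apply its own update rule to `coef_` / `intercept_`, which is what lets the same optimizer code serve scikit-learn, torch, and tensorflow models alike.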