TensorBoard: parallelization happens when training a sklearn Perceptron on the MNIST dataset
A very strange behavior happens when reproducing the MNIST example for the sklearn Perceptron from the tutorial website: when training the perceptron, parallelization is triggered, outputting on the node:
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers
[Parallel(n_jobs=1)]: Done 10 out of 10 | elapsed: 0.9s finished
but this generates 10 log messages (so we cannot use TensorBoard to plot the loss).
Parameters set for the perceptron:
model_args = {
    'n_jobs': 1,
    'tol': 1e-4,
    'eta0': 1e-6,
    'random_state': 1234,
    'alpha': 0.1,
}
training_args = {
    'epochs': 1,
}
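For context, here is a minimal sketch of how I pass these arguments to sklearn (the OpenML MNIST loading is my own assumption; the tutorial may load the data differently):

import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.linear_model import Perceptron

# Assumption: MNIST fetched from OpenML; the tutorial may use another loader
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)

# model_args as above; 'epochs' is consumed by the training loop, not by sklearn
model = Perceptron(**model_args)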
A first investigation (using a debugger) points to the perceptron.partial_fit call as the culprit.
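Continuing the sketch above, the call in question looks like this (passing classes on the first call is my assumption about what the tutorial does; partial_fit requires it the first time):

# One epoch = one partial_fit pass; all 10 classes must be declared on the first call
model.partial_fit(X, y, classes=np.unique(y))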
My 2 cents: parallelization happens when the data is too large to fit in memory (so sklearn is doing some sort of mini-batching, resulting in several log messages for only one epoch).
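If that were the cause, the number of joblib messages should grow with the data size. A quick way to probe this, reusing the sketch above (note the [Parallel(...)] lines only appear when the estimator's verbose setting is nonzero, which I assume the tutorial enables somewhere):

# A 1,000-sample slice certainly fits in memory; if 'Done 10 out of 10'
# still appears, the 10 tasks track something other than the data size
probe = Perceptron(verbose=1, **model_args)
probe.partial_fit(X[:1000], y[:1000], classes=np.unique(y))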