Mismatch between parameters saved by job and loaded by node
I keep getting the following error:
'TorchModel.set_weights' received inputs that did not cover allmodel parameters; missing weights: ['unet.model.0.conv.unit0.adn.N.running_mean', ...
A bit of investigation has led me to the following:
- the model.get_weights function calls named_parameters
- but the model.set_weights function uses load_state_dict
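Indeed, the missing weights listed in the error message are present in the state dict but are not named parameters. An entry such as running_mean is a buffer rather than a trainable parameter, so named_parameters does not return it. Furthermore, using state_dict.items() instead of named_parameters in get_weights solves the issue.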
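For reference, here is a minimal sketch that reproduces the mismatch in plain PyTorch, outside of TorchModel. The toy model and the variable names are my own assumptions for illustration and not the actual UNet or library code. It only shows that buffers such as running_mean appear in state_dict but not in named_parameters, and that a strict load_state_dict rejects a dict built from named parameters alone.

```python
import torch

# Hypothetical toy model standing in for the UNet from the error message.
model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 4, kernel_size=3),
    torch.nn.BatchNorm2d(4),  # registers running_mean / running_var buffers
)

param_names = {name for name, _ in model.named_parameters()}
state_names = set(model.state_dict().keys())

# Keys that load_state_dict expects but named_parameters never yields.
missing = sorted(state_names - param_names)
print(missing)

# Round trip restricted to named parameters: strict loading rejects it.
params_only = {name: p.detach().clone() for name, p in model.named_parameters()}
try:
    model.load_state_dict(params_only)
    strict_load_failed = False
except RuntimeError:
    strict_load_failed = True
print("strict load failed:", strict_load_failed)

# Round trip through the full state dict: every expected key is covered.
full_state = {name: t.detach().clone() for name, t in model.state_dict().items()}
result = model.load_state_dict(full_state)
print(result)
```

Passing strict=False to load_state_dict would also silence the error, but the buffers would then not be transferred at all, which is probably not what is wanted for batch-norm statistics.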