Incorrect use of `optimizer_args` method in TorchTrainingPlan
`optimizer_args` is a researcher/user-side API to retrieve the optimizer arguments from any method defined in the `TrainingPlan` (including `init_optimizer`). These optimizer arguments are defined on the researcher side in `experiment.training_args`. However, with commit d2beee48 the method `optimizer_args()` was extended to call `update_optimizer_args`, a method that reads the learning rate from the optimizer. This change introduces a potential bug if the researcher tries to call `optimizer_args()` before the optimizer is defined.
```python
def optimizer_args(self) -> Dict:
    """Retrieves optimizer arguments

    Returns:
        Optimizer arguments
    """
    self.update_optimizer_args()  # update `optimizer_args` (eg after training)
    return self._optimizer_args
```
`optimizer_args` calls `self.update_optimizer_args`, and `update_optimizer_args` calls `get_learning_rate`, which raises an error if the optimizer is not defined:
```python
def get_learning_rate(self) -> List[float]:
    if self._optimizer is None:
        raise FedbiomedTrainingPlanError(f"{ErrorNumbers.FB605.value}: Optimizer not found, please call "
                                         f"`init_optimizer` beforehand")
    learning_rates = []
    params = self._optimizer.param_groups
    for param in params:
        learning_rates.append(param['lr'])
    return learning_rates
```
In this case, if the researcher tries to access `optimizer_args` before instantiating an optimizer:
```python
class MyTrainingPlan(TorchTrainingPlan):

    def init_model(self):
        opt = self.optimizer_args()  # raises an error
        return Baseline()

    def init_optimizer(self):
        optimizer_args = self.optimizer_args()  # raises an error
        return Optimizer(self.model().parameters(), lr=optimizer_args["lr"])
```
`get_learning_rate` is mandatory in order to retrieve the default learning rate when the researcher does not define it in the optimizer arguments. Even though `TrainingArgs` populates `lr` with a default value, if the researcher does not pass it as an argument of the optimizer in the `init_optimizer` method, the correct learning rate is accessible only from the optimizer itself. Since this information is critical for Scaffold, the optimizer arguments should be updated right after the optimizer is created, and again after training, considering that some optimizers can update the learning rate during training.
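For illustration, a minimal plain-PyTorch snippet (not Fed-BioMed code) showing that once the optimizer is constructed, the effective learning rate lives only in its `param_groups`, which is exactly what `get_learning_rate` reads back:

```python
import torch
import torch.nn as nn

# If the researcher relies on a default learning rate and never reads
# `optimizer_args`, the effective value is only recoverable from the optimizer.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Essentially what `get_learning_rate` does internally:
learning_rates = [group['lr'] for group in optimizer.param_groups]
print(learning_rates)  # [0.01]
```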
Here is one flow that could be implemented (see the sketch after this list):

1 - Don't call `update_optimizer_args` in `optimizer_args`, to avoid issues if the optimizer is not defined. `optimizer_args()` should remain a getter that returns only `self._optimizer_args`.

2 - Always populate `lr` in `TrainingArgs` with a default value (optional).

3 - Call `update_optimizer_args` right after the optimizer is created, to make sure `self._optimizer_args["lr"]` holds the correct learning rate in case `self.optimizer_args()` is used in `training_step`.

4 - Update the learning rate after every iteration (if it matters for the `training_step` method) or after the training routine, to make sure the `lr` sent back to the researcher is the correct one.
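A rough sketch of how this flow could look; this is not the actual Fed-BioMed class, and names that do not appear in this issue (e.g. `configure_optimizer`) are assumptions about where the hooks could live:

```python
from typing import Dict, List

import torch


class SketchTrainingPlan:
    """Illustrative sketch of the proposed flow, not Fed-BioMed code."""

    def __init__(self, optimizer_args: Dict):
        # Step 2: assume `TrainingArgs` already provided a default `lr` here.
        self._optimizer_args = dict(optimizer_args)
        self._optimizer = None
        self._model = torch.nn.Linear(4, 1)

    def optimizer_args(self) -> Dict:
        # Step 1: plain getter, never touches the optimizer.
        return self._optimizer_args

    def get_learning_rate(self) -> List[float]:
        if self._optimizer is None:
            raise RuntimeError("Optimizer not found, please call `init_optimizer` beforehand")
        return [group['lr'] for group in self._optimizer.param_groups]

    def update_optimizer_args(self) -> None:
        # Refresh `lr` from the live optimizer.
        self._optimizer_args['lr'] = self.get_learning_rate()

    def init_optimizer(self):
        return torch.optim.SGD(self._model.parameters(),
                               lr=self._optimizer_args.get('lr', 0.1))

    def configure_optimizer(self) -> None:
        # Step 3 (hypothetical hook): update the stored args right after the
        # optimizer exists, so `optimizer_args()` is correct in `training_step`.
        self._optimizer = self.init_optimizer()
        self.update_optimizer_args()

    def training_routine(self) -> None:
        self.configure_optimizer()
        # ... training loop, possibly with a scheduler changing the lr ...
        # Step 4: refresh after training so the `lr` sent back to the
        # researcher is the one that was actually used.
        self.update_optimizer_args()
```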
Another (easy) solution

Do not let the researcher use `self.optimizer_args` or `self.model_arguments`, and instead force the researcher to accept the argument `optimizer_args` in `def init_optimizer(self, optimizer_args)`.
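Under that scheme the framework would inject the arguments itself. A hedged sketch of the researcher-side code, mirroring the earlier example (`Baseline` and the exact signature are assumptions):

```python
import torch


class MyTrainingPlan(TorchTrainingPlan):

    def init_model(self):
        return Baseline()

    def init_optimizer(self, optimizer_args):
        # `optimizer_args` is passed in by the framework, so there is no need
        # to call `self.optimizer_args()` and no risk of calling it too early.
        return torch.optim.Adam(self.model().parameters(), lr=optimizer_args["lr"])
```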