Sync trainer state with evaluators #2733

Open
vfdev-5 opened this issue Oct 6, 2022 · 10 comments

Comments

@vfdev-5
Collaborator

vfdev-5 commented Oct 6, 2022

🚀 Feature

There are use-cases where we would like to get the trainer's epoch/iteration and/or other items from trainer.state inside the evaluator. Let's propose an API that makes the trainer's state easily accessible from the evaluator.

Context: https://discuss.pytorch.org/t/get-current-epoch-inside-process-function-of-evaluator/162926
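
For reference, a minimal sketch of the closure-based workaround discussed in that thread. This is not an existing ignite API; create_evaluator, model, and trainer are illustrative names. The trainer is captured in a closure so the evaluator's process function can read trainer.state directly.

from ignite.engine import Engine

def create_evaluator(model, trainer):
    # the trainer is captured in the closure, so its live state is visible here
    def validation_step(engine, batch):
        current_epoch = trainer.state.epoch  # trainer's epoch at validation time
        x, y = batch
        y_pred = model(x)
        return y_pred, y

    return Engine(validation_step)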

@louis-she
Contributor

Many handlers/metrics accept a global_step_transform argument to get the step value they want.
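
A hedged sketch of that existing pattern, assuming a trainer, an evaluator with an "accuracy" metric, and a TensorBoard logger (names taken from the surrounding discussion): global_step_from_engine(trainer) makes the evaluator's logged metrics use the trainer's epoch as the step.

from ignite.engine import Events
from ignite.handlers import global_step_from_engine
from ignite.contrib.handlers import TensorboardLogger

tb_logger = TensorboardLogger(log_dir="/tmp/tb_logs")

# log validation metrics against the trainer's epoch, not the evaluator's own counter
tb_logger.attach_output_handler(
    evaluator,
    event_name=Events.COMPLETED,
    tag="validation",
    metric_names=["accuracy"],
    global_step_transform=global_step_from_engine(trainer),
)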

@jalajk24

Can I work on this? I am pretty new to this.

@vfdev-5
Collaborator Author

vfdev-5 commented Jan 30, 2023

@jalajk24 right now it is still under discussion whether we need to work on something here. Do you have any ideas or suggestions on the topic?

@guptaaryan16
Contributor

I am proposing a new API function for the Engine class that can fetch the epoch from a trainer instance.
It could work as shown below, storing the trainer's current epoch on the engine's state and also returning it:

def fetch_trainer_epoch(self, trainer: Engine) -> int:
    # copy the trainer's current epoch onto this engine's state and return it
    epoch = trainer.state.epoch
    self.state.trainer_epoch = epoch
    return epoch

@vfdev-5 does this make sense?

It could be called in the same way as optimizer.step().
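
A hypothetical usage sketch, if such a method were added to Engine (fetch_trainer_epoch does not exist in ignite today; trainer, evaluator, and val_data are assumed from context):

from ignite.engine import Events

@trainer.on(Events.EPOCH_COMPLETED)
def run_validation():
    # copy the trainer's epoch onto the evaluator's state before running validation
    evaluator.fetch_trainer_epoch(trainer)
    evaluator.run(val_data)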

@louis-she
Contributor

The core question of this issue is whether to introduce a trainer abstraction in ignite. From what I know of ignite, that is not a good idea, at least not in the core library.

@guptaaryan16
Contributor

Hey @louis-she, I think the API could be helpful for comparing the performance of two or more different training methods, and it could also help with training ensemble models. I have been working in the space of GANs and adversarial training, and I have noticed that sometimes you need to combine two training methods to get better results, so this may be a helpful addition to the Engine class.

@vfdev-5
Collaborator Author

vfdev-5 commented Feb 18, 2023

@guptaaryan16 can you please give a concrete example of what you are talking about?

@guptaaryan16
Contributor

guptaaryan16 commented Feb 18, 2023

Sure @vfdev-5, I think it would be mostly useful for hyperparameter tuning and for comparing variations of results to make training easier, e.g. reducing the number of epochs and testing different training methods.

For instance, I can share a small thing that happened when I was training a model on CIFAR-10 with Gaussian augmentation training (https://arxiv.org/abs/1902.02918) to measure the Average Certified Radius (ACR) of the model using randomized smoothing. I noticed that if I added PGD adversarial training (https://arxiv.org/pdf/1706.06083.pdf) on top of the Gaussian augmentation training, I could get a much higher ACR. But to find the specific hyperparameters, you need to get the current training epoch and see where the evaluators are getting the best results. So it may be helpful to have this API, although you can also get the specific epoch without it.

@vfdev-5
Collaborator Author

vfdev-5 commented Feb 18, 2023

@guptaaryan16 thanks for the details, but I was wondering more about the code. Can you provide some code to highlight your idea? As for HP tuning and multiple experiments, you can check

> get the specific hyper parameters you need to get the current training epoch and see where the evaluators are getting best results.

I think there is nothing impossible here. I imagine that you have a handler to run validation:

best_acr = 0.0

def run_validation():
    global best_acr
    evaluator.run(val_data)
    metrics = evaluator.state.metrics
    if metrics["ACR"] > best_acr:
        best_acr = metrics["ACR"]
        current_epoch = trainer.state.epoch
        # save a checkpoint bundle locally:
        fp = f"/path/to/output/{current_epoch}_best_acr.pt"
        torch.save({
            "best_acr": best_acr,
            "epoch": current_epoch,
            "model": model.state_dict(),
            # ... any other items to include in the bundle
        }, fp)
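
As a follow-up to the sketch above, such a handler would typically be attached to the trainer, e.g. once per epoch (assuming the same trainer and run_validation names as in the snippet):

from ignite.engine import Events

# run the closure-based validation handler at the end of every training epoch
trainer.add_event_handler(Events.EPOCH_COMPLETED, run_validation)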

@guptaaryan16
Contributor

guptaaryan16 commented Feb 18, 2023

Yes @vfdev-5, I do not have the specific code for that, but I can imagine it was written along the same lines (that project did not use ignite).
Also, I was wondering whether we could access the epoch directly as trainer.epoch instead of trainer.state.epoch, since that might make a bit more sense; I don't think we can have different states within the same trainer anyway.
