
The torchvision pretrained VGG-16 requires normalization of inputs and you do not do this #26

Open · crowsonkb opened this issue Mar 31, 2021 · 3 comments

Comments

@crowsonkb

As per the torchvision documentation:

The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. You can use the following transform to normalize:

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

Not doing this will cause VGG-16 to output the wrong feature maps, and you will probably get worse results. If you add this transform, you will have to retrain, though.
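
For concreteness, here is a minimal sketch (illustrative, not this repo's actual pipeline) of where the Normalize transform would go when extracting VGG-16 feature maps; the resize size and the eval/no_grad usage are assumptions:

import torch
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),   # assumed input size, adjust to your pipeline
    transforms.ToTensor(),           # PIL image -> float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

vgg = models.vgg16(pretrained=True).features.eval()

# img = PIL.Image.open('photo.jpg').convert('RGB')
# with torch.no_grad():
#     feature_maps = vgg(preprocess(img).unsqueeze(0))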

@sanchit88

Any update on this issue?

@yunxiaoshi
Owner

I guess it makes sense to use ImageNet statistics here, since AVA and ImageNet don't differ much in terms of domain. I did observe some improvement.

@crowsonkb
Author

crowsonkb commented Jun 9, 2021

You should be using ImageNet statistics for any input because that's what VGG-16 was trained on; only use different statistics if you trained or fine-tuned VGG-16 on a dataset where you normalized with those different statistics during training. If you are training a model on VGG's outputs, the inputs to VGG still need to use the statistics VGG was trained with.
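
One way to make this hard to get wrong (a sketch assuming inputs arrive as NCHW batches in [0, 1]; the class name is hypothetical, not from this repo) is to bake the ImageNet statistics into the feature extractor itself, so every caller gets correctly normalized inputs:

import torch
from torch import nn
from torchvision import models

class NormalizedVGGFeatures(nn.Module):  # hypothetical helper, not this repo's code
    def __init__(self):
        super().__init__()
        # ImageNet statistics VGG-16 was trained with, shaped for NCHW broadcasting
        self.register_buffer('mean', torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer('std', torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))
        self.features = models.vgg16(pretrained=True).features

    def forward(self, x):
        # x: NCHW batch in [0, 1]; normalize before the conv stack
        return self.features((x - self.mean) / self.std)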
