Clip model size is too small #114

Open
dwsmart32 opened this issue Apr 27, 2024 · 3 comments

Comments


dwsmart32 commented Apr 27, 2024

Hello, and thank you for your great work.
https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/MODEL_ZOO.md
In your paper you write: "We also learn a CLIP-style InternVideo2 indicated by InternVideo2clip. It is post-pretrained from InternVideo2s2 by only preserving video and text encoders and contrastive loss."

However, the [InternVideo2-CLIP-1B-224p-f8] model on Hugging Face is very small, only a few MB. From the previous issue, I gather that the .pth file on Hugging Face contains only add-on parameters, not the full set of parameters.

  1. So, as I understand it, the released file contains only the CLIP parameters that were post-trained after stage 2, right?
  2. How can I initialize that CLIP model and use it? I want to get a CLIP score from it. I would be really grateful if you could tell me the exact way to do that. (It is still confusing no matter how many times I refer to your README and the demo.ipynb file.)

Thank you in advance.

Collaborator

Andy1621 commented Apr 28, 2024

To answer your question: for the CLIP model, we only fine-tuned the AttentionPool in the vision encoder. The main parameters were not updated.

Please check the zero-shot evaluation code for CLIP to see how to load the model. Here are the scripts.
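
As a rough illustration of why the CLIP checkpoint is so small, here is a minimal sketch of overlaying a small add-on checkpoint on top of a full base checkpoint. It is not the repository's loading code: the function `overlay_checkpoints` and all parameter names are hypothetical, and plain Python lists stand in for tensors so the idea can be shown without any dependencies.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def overlay_checkpoints(base: Params, addon: Params) -> Tuple[Params, List[str], List[str]]:
    """Return base parameters with add-on parameters laid on top.

    Also reports which keys were overridden and which are new, so that
    unexpected names in the add-on file are easy to spot.
    """
    merged = dict(base)  # shallow copy; the base mapping itself is not modified
    overridden = sorted(k for k in addon if k in base)
    added = sorted(k for k in addon if k not in base)
    merged.update(addon)  # add-on values win wherever names collide
    return merged, overridden, added


# Hypothetical toy checkpoints. Real checkpoints would map names to tensors,
# and the real key names are almost certainly different from these.
base_ckpt = {
    "vision_encoder.blocks.0.weight": [1.0, 2.0],
    "vision_encoder.attn_pool.weight": [0.0, 0.0],
    "text_encoder.weight": [3.0],
}
addon_ckpt = {
    "vision_encoder.attn_pool.weight": [0.5, 0.7],
}

merged, overridden, added = overlay_checkpoints(base_ckpt, addon_ckpt)
print("overridden:", overridden)
print("added:", added)
```

With PyTorch, the same idea would usually mean loading both state dicts and passing the merged result to `load_state_dict`, but the zero-shot evaluation scripts are the authoritative reference for the actual procedure.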

@dwsmart32
Author

Thanks for your reply. So you mean I can use the CLIP model once two components are ready: the InternVideo2-s2 parameters (the main parameters, which have not been updated) and the InternVideo2-CLIP parameters (the small additional parameters), right?

I would be really grateful if you could tell me approximately when you are going to update the main parameters.

I am looking forward to using your model in my work.

Thank you once again for your great work. @Andy1621

@Andy1621
Collaborator

Yes! Currently, we do not plan to update the main parameters. I tried updating more parameters, but it led to poorer performance, which may be caused by the limited post-training datasets.
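
For the CLIP score asked about in the original question, a minimal sketch follows. It assumes the score is taken to be the plain cosine similarity between one video embedding and one text embedding; some definitions additionally rescale or clamp this value. The function names are hypothetical, and the embeddings here are toy vectors, not real model outputs.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def clip_score(video_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """Cosine similarity between a video embedding and a text embedding."""
    if len(video_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    v = l2_normalize(video_emb)
    t = l2_normalize(text_emb)
    return sum(a * b for a, b in zip(v, t))


# Toy embeddings standing in for the outputs of the video and text encoders.
video_embedding = [0.2, 0.9, 0.1]
text_embedding = [0.25, 0.8, 0.0]
print(clip_score(video_embedding, text_embedding))
```

In practice, the two embeddings would come from the video and text encoders after the checkpoints are loaded as in the zero-shot evaluation scripts.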
