Questions about the subtitles. #66

Open
Yxxxb opened this issue Feb 27, 2024 · 1 comment

Comments

Yxxxb commented Feb 27, 2024

Thank you for your great work.

Regarding adding subtitles, I still have the following questions:

  1. If subtitles are not used for training and the model architecture and design are otherwise unchanged, so that video tokens consist only of the <image_i> sequence, can the model still understand long videos? In particular, can it find a needle in a haystack or answer detailed questions about an hour-long video?
  2. If subtitles are not added, the number of input visual tokens and the number of text tokens obviously differ by an order of magnitude. Does such an imbalance hurt the model's performance?
  3. After training with subtitles, can the model run inference on videos that have no subtitles? If so, how should inference be run, and how should the subtitle input be set up?

Thanks.

Yxxxb commented Mar 5, 2024

@yanwei-li
