Questions about Text Decoder and Text Query #80

Open
SeuXiao opened this issue Apr 2, 2024 · 0 comments
SeuXiao commented Apr 2, 2024

Thank you very much for releasing the code so that we can try it out.
I have some questions about the Text Decoder and the Text Query.
The paper says that the Text Decoder is implemented with Q-Former, but as far as I know, Q-Former is used to encode image features and to align images with text.
The paper also states: “In this way, the text query Qt contains highlighted visual cues that are most related to the user instruction.”
My question is: are the features extracted by your proposed Text Query the same as those extracted by the original Q-Former when it is conditioned on the text instruction? Also, could you provide the code needed to reproduce Figure 6 (high-response areas with top scores to the input question in Equation 1)?
Looking forward to your reply! Thanks.
