Questions about Text Decoder and Text Query #80

SeuXiao · 2024-04-02T07:53:31Z

Thank you very much for providing the code to try out.
I have questions about Text Decoder and Text Query.
You mentioned in the paper that the Text Decoder is implemented using Q-Former, but as far as I know, Q-Former is used to encode image features and to align images with text.
You also state in the paper: “In this way, the text query Qt contains highlighted visual cues that are most related to the user instruction.”
My question is: are the features extracted by your proposed Text Query the same as those extracted by the original Q-Former when it is conditioned on text instructions? Also, could you provide the code needed to reproduce the results in Figure 6 (high response areas with top scores to the input question in Equation 1)?
Looking forward to your reply! Thanks.
