[ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
language
deep-learning
wikipedia
dataset
vision
alignment
document
llama
arxiv
clip
multimodality
multimodal-deep-learning
vision-transformer
gpt4
clipmodel
Updated Apr 4, 2024 - Python