LLava Models on PDF #2265

MichaelFomenko · 2024-05-14T15:05:39Z

I have a question about PDFs and documents that contain images. If I use a vision model such as LLaVA or a LLaVA-like model, why does it not understand the pictures contained in PDFs and documents? Is it possible to enable LLaVA models to use the images in PDFs or documents? Alternatively, why do we need LLaVA models at all if we can simply use an AI model that describes the image as text and then use that text like a text document? This would make all LLaVA models irrelevant. We could use, for example, GIT (Generative Image-to-text Transformer) from Microsoft.
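To make the "describe the image as text, then treat it like a text document" idea concrete, here is a minimal sketch of what I mean. Everything in it is an assumption for illustration: `TextBlock`, `ImageBlock`, `flatten_to_text`, and `fake_caption` are hypothetical names, not part of any existing tool, and the captioner is passed in as a plain function so that a real image-to-text model (GIT, for instance) could be swapped in later.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class TextBlock:
    """A run of plain text extracted from the document (hypothetical type)."""
    text: str


@dataclass
class ImageBlock:
    """Raw bytes of an image embedded in the document (hypothetical type)."""
    image_bytes: bytes


Block = Union[TextBlock, ImageBlock]


def flatten_to_text(blocks: List[Block], caption_fn: Callable[[bytes], str]) -> str:
    """Replace every image with a bracketed caption and join all blocks into one text.

    caption_fn is any callable mapping image bytes to a description; in practice
    it would wrap an image captioning model, which is assumed here, not provided.
    """
    parts = []
    for block in blocks:
        if isinstance(block, TextBlock):
            parts.append(block.text)
        else:
            caption = caption_fn(block.image_bytes).strip()
            parts.append(f"[Image: {caption}]")
    # Blank line between blocks keeps reading order and paragraph boundaries.
    return "\n\n".join(parts)


def fake_caption(image_bytes: bytes) -> str:
    """Stand-in captioner so the sketch runs without any model installed."""
    return f"placeholder caption for {len(image_bytes)}-byte image"


if __name__ == "__main__":
    document = [
        TextBlock("Quarterly results are summarised below."),
        ImageBlock(b"\x89PNG-not-a-real-image"),
        TextBlock("Revenue grew in every region."),
    ]
    print(flatten_to_text(document, fake_caption))
```

The resulting string could then be indexed like any other text document. Extracting the blocks from a real PDF and calling a real captioning model are left out on purpose, since those parts depend on the tools being used.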

And what about supporting the Microsoft AI tools such as:

Table Transformer: https://huggingface.co/collections/microsoft/table-transformer-6564528e330b667bb267502e
LayoutLM: https://huggingface.co/collections/microsoft/layoutlm-6564539601de72cb631d0902
TAPEX: https://huggingface.co/collections/microsoft/tapex-65371ec9fb6eb06b92b25c04
UDOP: https://huggingface.co/collections/microsoft/udop-65e625124aee97415b88b513
GIT (Generative Image-to-text Transformer): https://huggingface.co/collections/microsoft/git-6601c19e9a0401ea1f8ab8c1

Local Speech Model:

SpeechT5: https://huggingface.co/collections/microsoft/speecht5-650995fc647a3ea442cc6c7b
