
Fix moondream support #7163

Merged 2 commits into ggerganov:master on May 10, 2024

Conversation

abetlen (Collaborator) commented May 9, 2024

Currently this works correctly for the original LLaVA models, but moondream and other SigLIP-based models exhibit the same issue reported in #7060, where subsequent requests with new images are broken. I suspect the issue is with n_img_pos, because what I observe is that parts of the previous image's patches linger in the context (as seen in the model's responses).

Closes #7060
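To make the suspected failure mode concrete: if n_img_pos under-reports how many context positions the image embedding actually occupied, the caller advances its position counter too little, and the cells beyond it still hold the previous image's patches when the next request arrives. Below is a minimal sketch of the corresponding cleanup, written against the llama-cpp-python low-level bindings; llama_kv_cache_seq_rm is the real llama.cpp API of this period, while the helper and its position bookkeeping are hypothetical, not this PR's code:

```python
import llama_cpp  # low-level ctypes bindings over the llama.cpp C API

def evict_stale_image_cells(ctx, seq_id: int, img_pos_start: int) -> None:
    """Hypothetical cleanup: drop every KV cell from the previous image's
    first position through the end of the sequence, so an under-counted
    n_img_pos cannot leave old patch embeddings in the context.

    llama_kv_cache_seq_rm takes a half-open position range [p0, p1);
    passing p1 = -1 means "to the end of the sequence".
    """
    llama_cpp.llama_kv_cache_seq_rm(ctx, seq_id, img_pos_start, -1)
```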

mofosyne added the labels bugfix (fixes an issue or bug) and review complexity: medium (generally requires more time to grok, but manageable by beginner-to-medium expertise) on May 9, 2024
abetlen (Collaborator, Author) commented May 10, 2024

Update: Tested the current PR with a few different models and it actually fixes the bug in almost all cases.

Test Procedure:

  1. Load first image and ask model to describe the image with temperature 0.
  2. Repeat step 1 and ensure generation is the same.
  3. Load second image and ask model to describe the image with temperature 0.
  4. Repeat step 3 and ensure generation is the same. (A scripted version of this procedure is sketched below.)
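A minimal scripted version of this procedure, assuming llama-cpp-python's multimodal chat API (MoondreamChatHandler and create_chat_completion are the library's real interfaces as of this period; the model paths, image files, and prompt are placeholders):

```python
import base64

from llama_cpp import Llama
from llama_cpp.llama_chat_format import MoondreamChatHandler

def as_data_uri(path: str) -> str:
    """Inline a local image as a base64 data URI for the chat API."""
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()

# Placeholder paths; point these at your text-model and mmproj GGUFs.
llm = Llama(
    model_path="moondream2-text-model.gguf",
    chat_handler=MoondreamChatHandler(clip_model_path="moondream2-mmproj.gguf"),
    n_ctx=2048,
)

def describe(image_path: str) -> str:
    """One temperature-0 description request against the same loaded model."""
    resp = llm.create_chat_completion(
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": as_data_uri(image_path)}},
                {"type": "text", "text": "Describe the image."},
            ],
        }],
        temperature=0.0,
    )
    return resp["choices"][0]["message"]["content"]

# Steps 1-4: each image is described twice; at temperature 0 the repeats
# must match exactly, otherwise state from a previous image leaked.
for image in ("image1.png", "image2.png"):
    assert describe(image) == describe(image), f"generation drifted for {image}"
```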

Tested and Working

Tested and Not Working

  • Moondream2: Fails right away at step 2; the second generation differs from the first.

What's strange is that the nanoLLaVA and llama-3-vision-alpha image projector GGUFs follow the same architecture as moondream2 (in fact, I generated them by adapting @vikhyat's create_gguf.py file), so I'm not sure why this issue only affects that model.

abetlen (Collaborator, Author) commented May 10, 2024

Update: I'm dumb; moondream also works correctly with this PR. There was a separate bug in llama-cpp-python: I was failing to clear the KV cache when the image embedding came before any text.
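For reference, a minimal sketch of the kind of fix being described (not the actual llama-cpp-python patch): when the image embedding opens the prompt, no text tokens have run yet, so nothing has cleared the cache, and the clear has to happen before the embedding is evaluated. llama_kv_cache_clear and llava_eval_image_embed are real APIs of this period exposed by the llama-cpp-python bindings; the wrapper itself is hypothetical:

```python
import ctypes

import llama_cpp
import llama_cpp.llava_cpp as llava_cpp

def eval_leading_image(ctx, image_embed, n_batch: int) -> int:
    """Hypothetical helper: evaluate an image embedding that is the very
    first thing in a new prompt. The bug was skipping the KV-cache clear
    on this path because no text tokens had run before the image."""
    llama_cpp.llama_kv_cache_clear(ctx)  # the missing step in the buggy path
    n_past = ctypes.c_int(0)
    llava_cpp.llava_eval_image_embed(ctx, image_embed, n_batch, ctypes.byref(n_past))
    return n_past.value
```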

Tested with llama-cpp-python and llava-cli and it all works on my end.

abetlen marked this pull request as ready for review on May 10, 2024 at 06:36
ggerganov (Owner) left a comment:
Thanks for taking a look

ggerganov merged commit d11afd6 into ggerganov:master on May 10, 2024 (48 checks passed)
Merging this pull request closed:

llava 1.5 invalid output after first inference (llamacpp server) (#7060)