
Fix moondream support #7163

Merged 2 commits into ggerganov:master on May 10, 2024

Conversation

abetlen (Collaborator) commented May 9, 2024

Currently this works correctly for the original LLaVA models, but moondream and other SigLIP-based models exhibit the same issue reported in #7060, where subsequent requests with new images are broken. I suspect the issue is with n_img_pos, because what I observe is that parts of the previous image's patches linger in the context (as seen in the model's responses).

Closes #7060
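To make the suspected failure mode concrete: if n_img_pos under-reports how many context positions the image embedding actually occupied, the caller advances its position counter too little, and the cells beyond it still hold the previous image's patches when the next request arrives. Below is a minimal sketch of the corresponding cleanup, written against the llama-cpp-python low-level bindings; llama_kv_cache_seq_rm is the real llama.cpp API of this period, while the helper and its position bookkeeping are hypothetical, not this PR's code:

```python
import llama_cpp  # low-level ctypes bindings over the llama.cpp C API

def evict_stale_image_cells(ctx, seq_id: int, img_pos_start: int) -> None:
    """Hypothetical cleanup: drop every KV cell from the previous image's
    first position through the end of the sequence, so an under-counted
    n_img_pos cannot leave old patch embeddings in the context.

    llama_kv_cache_seq_rm takes a half-open position range [p0, p1);
    passing p1 = -1 means "to the end of the sequence".
    """
    llama_cpp.llama_kv_cache_seq_rm(ctx, seq_id, img_pos_start, -1)
```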

mofosyne added the labels bugfix (fixes an issue or bug) and review complexity: medium (generally requires more time to grok, but manageable by beginner-to-medium expertise) on May 9, 2024
abetlen (Collaborator, Author) commented May 10, 2024

Update: Tested the current PR with a few different models and it actually fixes the bug in almost all cases.

Test Procedure:

  1. Load first image and ask model to describe the image with temperature 0.
  2. Repeat step 1 and ensure generation is the same.
  3. Load second image and ask model to describe the image with temperature 0.
  4. Repeat step 3 and ensure generation is the same. (A scripted version of this procedure is sketched below.)
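A minimal scripted version of this procedure, assuming llama-cpp-python's multimodal chat API (MoondreamChatHandler and create_chat_completion are the library's real interfaces as of this period; the model paths, image files, and prompt are placeholders):

```python
import base64

from llama_cpp import Llama
from llama_cpp.llama_chat_format import MoondreamChatHandler

def as_data_uri(path: str) -> str:
    """Inline a local image as a base64 data URI for the chat API."""
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()

# Placeholder paths; point these at your text-model and mmproj GGUFs.
llm = Llama(
    model_path="moondream2-text-model.gguf",
    chat_handler=MoondreamChatHandler(clip_model_path="moondream2-mmproj.gguf"),
    n_ctx=2048,
)

def describe(image_path: str) -> str:
    """One temperature-0 description request against the same loaded model."""
    resp = llm.create_chat_completion(
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": as_data_uri(image_path)}},
                {"type": "text", "text": "Describe the image."},
            ],
        }],
        temperature=0.0,
    )
    return resp["choices"][0]["message"]["content"]

# Steps 1-4: each image is described twice; at temperature 0 the repeats
# must match exactly, otherwise state from a previous image leaked.
for image in ("image1.png", "image2.png"):
    assert describe(image) == describe(image), f"generation drifted for {image}"
```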

Tested and Working

Tested and Not Working

  • Moondream2: Fails right away at step 2; the second generation differs from the first.

What's strange is that the nanoLLaVA and llama-3-vision-alpha image projector GGUFs follow the same architecture as moondream2 (in fact, I generated them by adapting @vikhyat's create_gguf.py file), so I'm not sure why this issue only affects that model.

abetlen (Collaborator, Author) commented May 10, 2024

Update: I'm dumb; moondream also works correctly with this PR. There was a separate bug in llama-cpp-python: I was failing to clear the KV cache when the image embedding came before any text.
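For reference, a minimal sketch of the kind of fix being described (not the actual llama-cpp-python patch): when the image embedding opens the prompt, no text tokens have run yet, so nothing has cleared the cache, and the clear has to happen before the embedding is evaluated. llama_kv_cache_clear and llava_eval_image_embed are real APIs of this period exposed by the llama-cpp-python bindings; the wrapper itself is hypothetical:

```python
import ctypes

import llama_cpp
import llama_cpp.llava_cpp as llava_cpp

def eval_leading_image(ctx, image_embed, n_batch: int) -> int:
    """Hypothetical helper: evaluate an image embedding that is the very
    first thing in a new prompt. The bug was skipping the KV-cache clear
    on this path because no text tokens had run before the image."""
    llama_cpp.llama_kv_cache_clear(ctx)  # the missing step in the buggy path
    n_past = ctypes.c_int(0)
    llava_cpp.llava_eval_image_embed(ctx, image_embed, n_batch, ctypes.byref(n_past))
    return n_past.value
```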

Tested with llama-cpp-python and llava-cli and it all works on my end.

abetlen marked this pull request as ready for review on May 10, 2024 at 06:36
ggerganov (Owner) left a comment:
Thanks for taking a look

ggerganov merged commit d11afd6 into ggerganov:master on May 10, 2024 (48 checks passed)
Merging this pull request closed:

llava 1.5 invalid output after first inference (llamacpp server) (#7060)