
Would it be possible to support different models for images vs. text? #34

Open
RustyReich opened this issue Apr 26, 2024 · 5 comments

@RustyReich

I would like to be able to use a vision model only when the message content includes an image, while using a non-vision model for messages which only include text. Would this be possible?

@jakobdylanc
Owner

This is a good suggestion. Definitely possible. I'd have to think more on how to implement this as cleanly as possible. I'm thinking a separate "VISION_LLM" config option, in addition to "LLM".

Regarding the functionality, I think the vision model should be used whenever images are present ANYWHERE in the current conversation, not just the latest message.
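
A minimal sketch of what that selection logic could look like. The `llm` / `vision_llm` parameters mirror the suggested "LLM" / "VISION_LLM" config options; the message structure (OpenAI-style content parts) and the function name are assumptions for illustration, not taken from the actual codebase:

```python
# Hypothetical sketch: pick the vision model if an image appears
# ANYWHERE in the conversation, not just in the latest message.

def pick_model(conversation: list[dict], llm: str, vision_llm: str) -> str:
    """Return vision_llm if any message contains an image, else llm."""
    for message in conversation:
        # A message's content may be a list of parts, e.g.
        # [{"type": "text", ...}, {"type": "image_url", ...}]
        content = message.get("content", [])
        if isinstance(content, list) and any(
            part.get("type") == "image_url" for part in content
        ):
            return vision_llm
    return llm


conversation = [
    {"role": "user", "content": [{"type": "text", "text": "hi"}]},
    {"role": "user", "content": [
        {"type": "text", "text": "what's in this picture?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ]},
]
print(pick_model(conversation, "text-model", "vision-model"))
```

The check runs over the whole conversation each time, so a follow-up question about an image sent several messages earlier still routes to the vision model.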

@RustyReich
Author

> Regarding the functionality, I think the vision model should be used whenever images are present ANYWHERE in the current conversation, not just the latest message.

I agree, as you may want to continue a discussion about an image that was sent several messages back in the conversation.

@JzJad

JzJad commented May 12, 2024

You may want to check out this: https://github.com/zhaobenny/bz-cogs/tree/main/aiuser (it uses a different model for vision vs. text).
I'm using it with ollama for a one-off questions bot, since it has issues with following a conversation (yours does an excellent job, btw).
So making this a cog would be awesome too.

@jakobdylanc
Owner

I'm hesitating to add this feature because it sacrifices simplicity for a less common use case.

Why do you want this feature? Do you find vision LLMs to be worse than regular LLMs when it comes to text conversation?

@JzJad

JzJad commented May 20, 2024

That's precisely the experience I've run into, at least with the smaller models. Granted, this mostly just helps when using two smaller LLMs versus one model much larger than both combined.
