
Would it be possible to support different models for images vs. text? #34

Open
RustyReich opened this issue Apr 26, 2024 · 5 comments

@RustyReich

I would like to be able to use a vision model only when the message content includes an image, while using a non-vision model for messages which only include text. Would this be possible?

@jakobdylanc
Owner

This is a good suggestion. Definitely possible. I'd have to think more on how to implement this as cleanly as possible. I'm thinking a separate "VISION_LLM" config option, in addition to "LLM".

Regarding the functionality, I think the vision model should be used whenever images are present ANYWHERE in the current conversation, not just the latest message.
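
A minimal sketch of what that selection logic could look like. The `llm` / `vision_llm` parameters mirror the suggested "LLM" / "VISION_LLM" config options; the message structure (OpenAI-style content parts) and the function name are assumptions for illustration, not taken from the actual codebase:

```python
# Hypothetical sketch: pick the vision model if an image appears
# ANYWHERE in the conversation, not just in the latest message.

def pick_model(conversation: list[dict], llm: str, vision_llm: str) -> str:
    """Return vision_llm if any message contains an image, else llm."""
    for message in conversation:
        # A message's content may be a list of parts, e.g.
        # [{"type": "text", ...}, {"type": "image_url", ...}]
        content = message.get("content", [])
        if isinstance(content, list) and any(
            part.get("type") == "image_url" for part in content
        ):
            return vision_llm
    return llm


conversation = [
    {"role": "user", "content": [{"type": "text", "text": "hi"}]},
    {"role": "user", "content": [
        {"type": "text", "text": "what's in this picture?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ]},
]
print(pick_model(conversation, "text-model", "vision-model"))
```

The check runs over the whole conversation each time, so a follow-up question about an image sent several messages earlier still routes to the vision model.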

@RustyReich
Author

> Regarding the functionality, I think the vision model should be used whenever images are present ANYWHERE in the current conversation, not just the latest message.

I agree, as you may want to continue a discussion about an image that was sent several messages back in the conversation.

@JzJad

JzJad commented May 12, 2024

You may want to check out this: https://github.com/zhaobenny/bz-cogs/tree/main/aiuser (it uses a different model for vision vs. text).
I'm using it with ollama for a one-off questions bot, since it has issues with following a conversation (yours does an excellent job, btw).
So making this a cog would be awesome too.

@jakobdylanc
Owner

I'm hesitating to add this feature because it sacrifices simplicity for a less common use case.

Why do you want this feature? Do you find vision LLMs to be worse than regular LLMs when it comes to text conversation?

@JzJad

JzJad commented May 20, 2024

That's precisely the experience I've run into, at least with the smaller models. Granted, this mostly just helps when using two smaller LLMs versus one model much larger than both combined.
