
[FEAT]: Integration of Vllm as model server #1153

Open
flefevre opened this issue Apr 20, 2024 · 10 comments
Labels
enhancement (New feature or request), feature request, Integration Request (Request for support of a new LLM, Embedder, or Vector database)

Comments

@flefevre

What would you like to see?

It would be great to be able to configure AnythingLLM with a vLLM model server:
https://github.com/vllm-project/vllm

@flefevre flefevre added enhancement New feature or request feature request labels Apr 20, 2024
@timothycarambat timothycarambat added the Integration Request Request for support of a new LLM, Embedder, or Vector database label Apr 21, 2024
@mkhludnev

I was able to use vLLM by selecting Local AI in AnythingLLM's LLM settings.
Enjoy.

@flefevre
Author

flefevre commented Apr 26, 2024 via email

@flefevre
Author

If it could help you in your analysis:
AnythingLLM seems to call
https://vllm-mixtral.myserver.fr/v1/chat/completions

but when I open this URL in a browser I get:
{"detail":"Method Not Allowed"}

Is this an API mismatch between vLLM and AnythingLLM?

@timothycarambat
Member

That endpoint is POST only, not GET - which is part of the reason you got method not allowed when going to the URL directly.
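For reference, the same endpoint should answer a POST. A quick check with curl, reusing the hostname and model name mentioned elsewhere in this thread (adjust both to your own deployment):

curl https://vllm-mixtral.myserver.fr/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 16}'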

@alceausu

I have the same issue, no matter the integration (Local AI or Generic OpenAI).
The vLLM server replies with:
ERROR serving_chat.py:60] Error in applying chat template from request: Conversation roles must alternate user/assistant/user/assistant/...
INFO: "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

@lenartgolob

What was the solution? Did you manage to integrate vLLM?

@flefevre
Author

Dear all,
I have tested again and there is no way to use vLLM directly, even though I have exposed the port through the Docker configuration.
Could you share exactly how you did it?
Thanks again

@mkhludnev

@flefevre once again: I did it twice and it works. Maybe it's a container connectivity issue? I remember using curl to check connectivity between containers. Can you describe how your containers, hosts, and processes are arranged?
The simplest approach for me was to launch vLLM and AnythingLLM as sibling containers under a single docker-compose.yml and point AnythingLLM to vLLM via the container name.
Another approach was to run vLLM on the host, launch AnythingLLM via Docker, and point it to vLLM via host.docker.internal, but IIRC that is a Docker Desktop-only feature.
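A minimal connectivity check along those lines, assuming the vLLM container is named vllm-mixtral and listens on port 5002 as later in this thread, and that the AnythingLLM container is named anythingllm (adjust both names to your setup):

# from inside the AnythingLLM container, list the models the vLLM OpenAI-compatible server exposes
docker exec -it anythingllm curl -s http://vllm-mixtral:5002/v1/models

If this returns a JSON model list, container-to-container connectivity is fine and the problem lies elsewhere.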

@alceausu

It seems these are two different issues: one related to connectivity, the other to the request format.
Regarding the format, AnythingLLM can reach vLLM, but vLLM returns a 400 Bad Request:
ERROR serving_chat.py:60] Error in applying chat template from request: Conversation roles must alternate user/assistant/user/assistant/...
For some hints on why, see the vLLM discussion "Mixtral instruct doesn't accept system prompt".
Is there a way to modify the template on the AnythingLLM side?
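For what it's worth, the error can be reproduced against vLLM without AnythingLLM in the middle. The sketch below reuses the host, port, and model from this thread; based on the linked discussion, Mixtral's chat template rejects a leading system message, so the first request should fail with the 400 above while the second, user-only request should succeed:

# expected to fail: the system role breaks the user/assistant alternation Mixtral's template enforces
curl http://vllm-mixtral:5002/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "messages": [{"role": "system", "content": "You are helpful."}, {"role": "user", "content": "Hello"}]}'

# expected to work: roles strictly alternate, starting with user
curl http://vllm-mixtral:5002/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "messages": [{"role": "user", "content": "Hello"}]}'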

@flefevre
Author

flefevre commented May 1, 2024

Dear all,
I have now simplified the test.

Docker configuration

  • AnythingLLM: deployed with Docker Compose, on the same network
  • vLLM: deployed with Docker Compose, on the same network

Docker validation

When I connect to the AnythingLLM container, I am able to query the vLLM model with the following command:

anythingllm@6de6c5255f33:~$ curl http://vllm-mixtral:5002/v1/completions -H "Content-Type: application/json" -d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "prompt": "San Francisco is a", "max_tokens": 7,"temperature": 0}'

{"id":"cmpl-0df1e0e95b4c46a78632936ba277e3ef","object":"text_completion","created":1714551853,"model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"text":" city that is known for its steep","logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":5,"total_tokens":12,"completion_tokens":7}}

Anythingllm Webui configuration
I am able to configure the default LLM preference by pointing it at the vLLM endpoint in the settings.

Anythingllm Webui Test
When I create a workspace, open a new thread, and ask something, I always get:
Could not respond to message. Request failed with status code 400

When I look at the AnythingLLM logs, I see the following trace:

httpVersionMajor: 1, httpVersionMinor: 1, httpVersion: '1.1', complete: true,
rawHeaders: [Array], rawTrailers: [], joinDuplicateHeaders: undefined,
aborted: false, upgrade: false, url: '', method: null,
statusCode: 400, statusMessage: 'Bad Request',
client: [Socket], _consuming: false, _dumped: false, req: [ClientRequest],
responseUrl: 'http://vllm-mixtral:5002/v1/chat/completions', redirects: [],
[Symbol(kCapture)]: false, [Symbol(kHeaders)]: [Object], [Symbol(kHeadersCount)]: 10,
[Symbol(kTrailers)]: null, [Symbol(kTrailersCount)]: 0 } },
isAxiosError: true, toJSON: [Function: toJSON]

Analysis
I agree with @alceausu: the problem does not seem to come from a misconfiguration of Docker / vLLM / AnythingLLM.
It seems more related to a mismatch between AnythingLLM and vLLM in how a specific model is prompted, in my case Mixtral 8x7B.
A solution could be to account for the system-prompt specificity of each model served by vLLM, as @alceausu pointed out in vllm-project/vllm#2112.
Or perhaps to use a model proxy such as LiteLLM, which encapsulates the model interaction behind a uniform, OpenAI-inspired API.
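As a rough sketch of the proxy idea (the CLI flags are from my reading of the LiteLLM docs, and the host, port, and model are the ones used earlier in this thread, so treat all of them as assumptions to verify):

# run a LiteLLM proxy in front of the vLLM OpenAI-compatible endpoint
litellm --model openai/mistralai/Mixtral-8x7B-Instruct-v0.1 --api_base http://vllm-mixtral:5002/v1

AnythingLLM's Generic OpenAI provider would then point at the URL LiteLLM prints on startup, the idea being that the proxy, not AnythingLLM, deals with any model-specific prompting quirks.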

I have created the following feature proposal: #1154. I do think it is the right solution.
Do you agree?

If yes, my ticket should perhaps be invalidated, since AnythingLLM is compatible with vLLM but not with all models served by vLLM.
Mixtral 8x7B is a really good model. It would be perfect to access it through a proxy such as LiteLLM, so that the AnythingLLM developers do not have to adapt their backend prompting for every model.

Thanks for your expertise.
