
[FEAT]: Integration of Vllm as model server #1153

Open
flefevre opened this issue Apr 20, 2024 · 10 comments
Labels
enhancement (New feature or request), feature request, Integration Request (Request for support of a new LLM, Embedder, or Vector database)

Comments

@flefevre

What would you like to see?

It would be great to be able to configure AnythingLLM with a vLLM model server:
https://github.com/vllm-project/vllm

@flefevre flefevre added enhancement New feature or request feature request labels Apr 20, 2024
@timothycarambat timothycarambat added the Integration Request Request for support of a new LLM, Embedder, or Vector database label Apr 21, 2024
@mkhludnev

I was able to use vLLM by selecting Local AI in AnythingLLM's LLM settings.
Enjoy.

@flefevre
Author

flefevre commented Apr 26, 2024 via email

@flefevre
Author

If it could help you in your analysis:
AnythingLLM seems to call
https://vllm-mixtral.myserver.fr/v1/chat/completions

but when I open this URL in a browser I get:
{"detail":"Method Not Allowed"}

Is this an API mismatch between vLLM and AnythingLLM?

@timothycarambat
Member

That endpoint is POST only, not GET - which is part of the reason you got method not allowed when going to the URL directly.
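For reference, the same endpoint should answer a POST. A quick check with curl, reusing the hostname and model name mentioned elsewhere in this thread (adjust both to your own deployment):

curl https://vllm-mixtral.myserver.fr/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 16}'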

@alceausu

I have the same issue, no matter the integration (Local AI or Generic OpenAI).
The vLLM server replies with:
ERROR serving_chat.py:60] Error in applying chat template from request: Conversation roles must alternate user/assistant/user/assistant/...
INFO: "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

@lenartgolob

What was the solution? Did you manage to integrate vLLM?

@flefevre
Author

Dear all,
I have tested again and there is no way to use vLLM directly, even though I have exposed the port through the Docker configuration.
Could you share exactly how you did it?
Thanks again

@mkhludnev

@flefevre once again: I did it twice and it works. Maybe it's a container connectivity issue? I remember using curl to check connectivity between containers. Can you describe how your containers, hosts, and processes are arranged?
The simplest approach for me was to launch vLLM and AnythingLLM as sibling containers under a single docker-compose.yml and point AnythingLLM to vLLM via the container name.
Another approach was to run vLLM on the host, launch AnythingLLM via Docker, and point it to vLLM via host.docker.internal, but IIRC that is a Docker Desktop-only feature.
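A minimal connectivity check along those lines, assuming the vLLM container is named vllm-mixtral and listens on port 5002 as later in this thread, and that the AnythingLLM container is named anythingllm (adjust both names to your setup):

# from inside the AnythingLLM container, list the models the vLLM OpenAI-compatible server exposes
docker exec -it anythingllm curl -s http://vllm-mixtral:5002/v1/models

If this returns a JSON model list, container-to-container connectivity is fine and the problem lies elsewhere.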

@alceausu

It seems these are two different issues: one related to connectivity, the other to the request format.
Regarding the format, AnythingLLM can reach vLLM, but vLLM returns a 400 Bad Request:
ERROR serving_chat.py:60] Error in applying chat template from request: Conversation roles must alternate user/assistant/user/assistant/...
For some hints on why, see the vLLM discussion "Mixtral instruct doesn't accept system prompt".
Is there a way to modify the template on the AnythingLLM side?
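For what it's worth, the error can be reproduced against vLLM without AnythingLLM in the middle. The sketch below reuses the host, port, and model from this thread; based on the linked discussion, Mixtral's chat template rejects a leading system message, so the first request should fail with the 400 above while the second, user-only request should succeed:

# expected to fail: the system role breaks the user/assistant alternation Mixtral's template enforces
curl http://vllm-mixtral:5002/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "messages": [{"role": "system", "content": "You are helpful."}, {"role": "user", "content": "Hello"}]}'

# expected to work: roles strictly alternate, starting with user
curl http://vllm-mixtral:5002/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "messages": [{"role": "user", "content": "Hello"}]}'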

@flefevre
Author

flefevre commented May 1, 2024

Dear all,
I have now simplified the test.

Docker configuration

  • AnythingLLM: deployed with Docker Compose, on the same network
  • vLLM: deployed with Docker Compose, on the same network

Docker validation

When I connect to the AnythingLLM container, I am able to query the vLLM model with the following command:

anythingllm@6de6c5255f33:~$ curl http://vllm-mixtral:5002/v1/completions -H "Content-Type: application/json" -d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "prompt": "San Francisco is a", "max_tokens": 7,"temperature": 0}'

{"id":"cmpl-0df1e0e95b4c46a78632936ba277e3ef","object":"text_completion","created":1714551853,"model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"text":" city that is known for its steep","logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":5,"total_tokens":12,"completion_tokens":7}}

Anythingllm Webui configuration
I am able to configure the default LLM preference by pointing it at the vLLM endpoint in the settings.

Anythingllm Webui Test
When I create a workspace, open a new thread, and ask something, I always get:
Could not respond to message. Request failed with status code 400

When I look at the AnythingLLM logs, I see the following trace:

httpVersionMajor: 1, httpVersionMinor: 1, httpVersion: '1.1', complete: true,
rawHeaders: [Array], rawTrailers: [], joinDuplicateHeaders: undefined,
aborted: false, upgrade: false, url: '', method: null,
statusCode: 400, statusMessage: 'Bad Request',
client: [Socket], _consuming: false, _dumped: false, req: [ClientRequest],
responseUrl: 'http://vllm-mixtral:5002/v1/chat/completions', redirects: [],
[Symbol(kCapture)]: false, [Symbol(kHeaders)]: [Object], [Symbol(kHeadersCount)]: 10,
[Symbol(kTrailers)]: null, [Symbol(kTrailersCount)]: 0 } },
isAxiosError: true, toJSON: [Function: toJSON]

Analysis
I agree with @alceausu: the problem does not seem to come from a misconfiguration of Docker / vLLM / AnythingLLM.
It seems more related to a mismatch between AnythingLLM and vLLM in how a specific model is prompted, in my case Mixtral 8x7B.
A solution could be to account for the system-prompt specificity of each model served by vLLM, as @alceausu pointed out in vllm-project/vllm#2112.
Or perhaps to use a model proxy such as LiteLLM, which encapsulates the model interaction behind a uniform, OpenAI-inspired API.
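As a rough sketch of the proxy idea (the CLI flags are from my reading of the LiteLLM docs, and the host, port, and model are the ones used earlier in this thread, so treat all of them as assumptions to verify):

# run a LiteLLM proxy in front of the vLLM OpenAI-compatible endpoint
litellm --model openai/mistralai/Mixtral-8x7B-Instruct-v0.1 --api_base http://vllm-mixtral:5002/v1

AnythingLLM's Generic OpenAI provider would then point at the URL LiteLLM prints on startup, the idea being that the proxy, not AnythingLLM, deals with any model-specific prompting quirks.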

I have created the following feature proposal: #1154. I do think it is the right solution.
Do you agree?

If yes, my ticket should perhaps be invalidated, since AnythingLLM is compatible with vLLM but not with all models served by vLLM.
Mixtral 8x7B is a really good model. It would be perfect to access it through a proxy such as LiteLLM, so that the AnythingLLM developers do not have to adapt their backend prompting for every model.

Thanks for your expertise.
