feat: direct llama.cpp integration #1483

Open · tjbck opened this issue Apr 10, 2024 · 6 comments
tjbck (Contributor) commented Apr 10, 2024

No description provided.

jukofyork commented Apr 10, 2024

Just a quick follow-up to say it seems to work fine:

  • I had to change the llama.cpp server port to 8081 so it doesn't clash with OpenWebUI (e.g. ./server --port 8081 ...).
  • Then set the OpenAI API base URL to http://127.0.0.1:8081/v1 and the API Key to a non-blank value (e.g. none) in the OpenWebUI settings.

With that, it calls the OpenAI-compatible endpoint on the llama.cpp server fine. It wasn't obvious that I needed to append /v1 to the URL and set a non-blank API Key, though (I had to find that out by trial and error).
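
For anyone else setting this up, here is a minimal sketch of the two steps (the model path is a placeholder, and the environment variable names are my assumption — the same values can simply be entered in the OpenWebUI settings UI instead):

# Run the llama.cpp server on a port that doesn't clash with OpenWebUI
./server --port 8081 -m ./models/model.gguf

# Point OpenWebUI at it (or set these in Settings instead; variable names
# are assumed — check the OpenWebUI docs for your version)
export OPENAI_API_BASE_URL=http://127.0.0.1:8081/v1
export OPENAI_API_KEY=none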

The only difference I can see is that there is no little "information" icon like there was with Ollama models, but the requests do seem to be going through the OpenAI-compatible endpoint; the llama.cpp server logs timing stats like this for each one:

{
  "tid": "140627543928832",
  "timestamp": 1712766280,
  "level": "INFO",
  "function": "print_timings",
  "line": 313,
  "msg": "prompt eval time     =     129.89 ms /    55 tokens (    2.36 ms per token,   423.43 tokens per second)",
  "id_slot": 0,
  "id_task": 13,
  "t_prompt_processing": 129.892,
  "n_prompt_tokens_processed": 55,
  "t_token": 2.3616727272727274,
  "n_tokens_second": 423.42869460782805
}
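
If you want to double-check the endpoint independently of the UI, you can hit it directly with curl (a sketch — the model name is a placeholder that the server may ignore, and the key just needs to be non-blank):

curl http://127.0.0.1:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer none" \
  -d '{"model": "local", "messages": [{"role": "user", "content": "Hello"}]}'

Each successful request should show up as another print_timings entry like the one above in the server log.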

I'll report back if I can see any other major differences, but otherwise 👍

jukofyork commented

I've used this quite a bit with the llama.cpp server now, and the only problem I've come across is that pressing the stop button doesn't actually disconnect/stop the generation. This was a problem with the Ollama backend too and was fixed there, AFAIK:

#1166
#1170

It would be helpful if the same fix could be added to the OpenAI API code path too, as otherwise the only way to stop a runaway LLM at the moment is to Ctrl-C the running server and restart it.
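
As a rough illustration of the mechanism (my assumption of how the Ollama-side fix works, not a description of the actual patches): if the streaming connection to the llama.cpp server is closed when the user presses stop, the server can abandon generation for that request. You can see the idea with a streamed curl request and Ctrl-C:

curl -N http://127.0.0.1:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer none" \
  -d '{"model": "local", "stream": true, "messages": [{"role": "user", "content": "Write a very long story"}]}'
# Interrupt with Ctrl-C mid-stream; whether the server actually stops
# generating on disconnect may depend on the llama.cpp version.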

jukofyork commented

Another thing that might be helpful would be an option to hide the "Modelfiles" and "Prompts" entries in the left-hand menu, as these can't be used with the OpenAI API and just add clutter.

tjbck (Contributor, Author) commented Apr 14, 2024

@jukofyork I'll start working on this feature after #665; we should strive to keep all the core features.

DenisSergeevitch commented

Small update: the stop generation button is still an issue.

justinh-rahb (Collaborator) commented

@DenisSergeevitch that is unrelated to the issue being discussed here. Let's keep discussion of the stop generation function here:
