feat: direct llama.cpp integration #1483

Open · tjbck opened this issue Apr 10, 2024 · 6 comments
tjbck (Contributor) commented Apr 10, 2024

No description provided.

jukofyork commented Apr 10, 2024

Just a quick follow-up to say it seems to work fine:

  • I had to change the llama.cpp server port to 8081 so it doesn't clash with OpenWebUI (e.g. ./server --port 8081 ...).
  • Then set the OpenAI API base URL to http://127.0.0.1:8081/v1 and the API Key to a non-blank value (e.g. none) in the OpenWebUI settings.

With that, it calls the OpenAI-compatible endpoint on the llama.cpp server fine. It wasn't obvious that I needed to append /v1 to the URL and set a non-blank API Key, though (I had to find that out by trial and error).
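
For anyone else setting this up, here is a minimal sketch of the two steps (the model path is a placeholder, and the environment variable names are my assumption — the same values can simply be entered in the OpenWebUI settings UI instead):

# Run the llama.cpp server on a port that doesn't clash with OpenWebUI
./server --port 8081 -m ./models/model.gguf

# Point OpenWebUI at it (or set these in Settings instead; variable names
# are assumed — check the OpenWebUI docs for your version)
export OPENAI_API_BASE_URL=http://127.0.0.1:8081/v1
export OPENAI_API_KEY=none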

The only difference I can see is that there is no little "information" icon like there was with Ollama models, but the requests do seem to be going through the OpenAI-compatible endpoint; the llama.cpp server logs timing stats like this for each one:

{
  "tid": "140627543928832",
  "timestamp": 1712766280,
  "level": "INFO",
  "function": "print_timings",
  "line": 313,
  "msg": "prompt eval time     =     129.89 ms /    55 tokens (    2.36 ms per token,   423.43 tokens per second)",
  "id_slot": 0,
  "id_task": 13,
  "t_prompt_processing": 129.892,
  "n_prompt_tokens_processed": 55,
  "t_token": 2.3616727272727274,
  "n_tokens_second": 423.42869460782805
}
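
If you want to double-check the endpoint independently of the UI, you can hit it directly with curl (a sketch — the model name is a placeholder that the server may ignore, and the key just needs to be non-blank):

curl http://127.0.0.1:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer none" \
  -d '{"model": "local", "messages": [{"role": "user", "content": "Hello"}]}'

Each successful request should show up as another print_timings entry like the one above in the server log.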

I'll report back if I can see any other major differences, but otherwise 👍

jukofyork commented

I've used this quite a bit with the llama.cpp server now, and the only problem I've come across is that pressing the stop button doesn't actually disconnect/stop the generation. This was a problem with the Ollama backend too and was fixed there, AFAIK:

#1166
#1170

It would be helpful if the same fix could be added to the OpenAI API code path too, as otherwise the only way to stop a runaway LLM at the moment is to Ctrl-C the running server and restart it.
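
As a rough illustration of the mechanism (my assumption of how the Ollama-side fix works, not a description of the actual patches): if the streaming connection to the llama.cpp server is closed when the user presses stop, the server can abandon generation for that request. You can see the idea with a streamed curl request and Ctrl-C:

curl -N http://127.0.0.1:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer none" \
  -d '{"model": "local", "stream": true, "messages": [{"role": "user", "content": "Write a very long story"}]}'
# Interrupt with Ctrl-C mid-stream; whether the server actually stops
# generating on disconnect may depend on the llama.cpp version.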

jukofyork commented

Another thing that might be helpful would be an option to hide the "Modelfiles" and "Prompts" entries in the left-hand menu, as these can't be used with the OpenAI API and just add clutter.

tjbck (Contributor, Author) commented Apr 14, 2024

@jukofyork I'll start working on this feature after #665; we should strive to keep all the core features.

DenisSergeevitch commented

Small update: the stop generation button is still an issue.

justinh-rahb (Collaborator) commented

@DenisSergeevitch that is unrelated to the issue being discussed here. Let's keep discussion of the stop generation function here:
