Support for phi3 - llama.cpp update #99

Open
superchargez opened this issue Apr 28, 2024 · 2 comments
@superchargez

I have already downloaded the phi3 instruct GGUF from: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf

and placed it at ~/models/:
jawad@desktoper:~/models$ ls
Phi-3-mini-4k-instruct-q4.gguf

Yet, when I pass the model with -m or --model, I get an error: the script just prints its help text (the same output as ./server.sh --help). Here is the complete output:
jawad@desktoper:~/gits/aici/rllm/rllm-llamacpp$ ./server.sh -m /home/jawad/models/Phi-3-mini-4k-instruct-q4.gguf
usage: server.sh [--loop] [--cuda] [--debug] [model_name] [rllm_args...]

model_name can a HuggingFace URL pointing to a .gguf file, or one of the following:

phi2 https://huggingface.co/TheBloke/phi-2-GGUF/blob/main/phi-2.Q8_0.gguf
orca https://huggingface.co/TheBloke/Orca-2-13B-GGUF/blob/main/orca-2-13b.Q8_0.gguf
mistral https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q5_K_M.gguf
mixtral https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/blob/main/mixtral-8x7b-instruct-v0.1.Q6_K.gguf
code70 https://huggingface.co/TheBloke/CodeLlama-70B-Instruct-GGUF/blob/main/codellama-70b-instruct.Q5_K_M.gguf

Additionally, "server.sh build" will just build the server, and not run a model.

--cuda try to build llama.cpp against installed CUDA
--loop restart server when it crashes and store logs in ./logs
--debug don't build in --release mode

Try server.sh phi2 --help to see available rllm_args

Though if I choose phi2 instead of the downloaded model, it works fine. Does aici not support phi3, or is this a bug, and how do I fix it? Could adding a case for phi3 after this, in server.sh (rllm-cuda/server.sh):

63 phi2 )
64 ARGS="-m https://huggingface.co/TheBloke/phi-2-GGUF/blob/main/phi-2.Q8_0.gguf -t phi -w $EXPECTED/phi-2/cats.safetensors -s test_maxtol=0.8 -s test_avgtol=0.3"
65 ;;

solve the problem if I just replace the URL? But I don't want to download the model again, so how can I use a local model that is not in the list of models (phi2, mistral, mixtral, etc.)?
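For example, would a branch along these lines work? (This is only a rough sketch modelled on the existing phi2 entry: the phi3 case name, the local -m path, and the -t value are my own guesses, not anything taken from server.sh.)

phi3 )
    # guess: pass the already-downloaded local GGUF instead of a HuggingFace URL
    # guess: -t copied from the phi2 entry; phi3 may need a different tokenizer value
    ARGS="-m /home/jawad/models/Phi-3-mini-4k-instruct-q4.gguf -t phi"
    ;;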

@mmoskal
Member

mmoskal commented Apr 29, 2024

This requires upgrading the version of llama.cpp that is used. I should get to it sometime this week or next.

@mmoskal mmoskal self-assigned this Apr 29, 2024
@mmoskal mmoskal changed the title Support for phi3 Support for phi3 - llama.cpp update Apr 29, 2024
@superchargez
Author

Isn't it possible to spin up a llama.cpp server and reference it in aici.sh? Would that work?
