[BUG] Qwen-1.8-Chat: after converting to f16 with llama.cpp, inference output is garbled. Is 1.8B not yet supported in llama.cpp? #69
Comments
Impressive that you even got the conversion to work; conversion with llama.cpp fails entirely for me. llama.cpp has moved to the GGUF format, but qwen.cpp still converts to the GGML format. Could it be converted losslessly to GGUF? Then the model could be used with llama.cpp, and its server could run it too.
Same here. I tried Qwen 0.5B, 7B, and 14B: after converting to F16 GGUF with llama.cpp, the answers were all garbled.
Is there an existing issue / discussion for this?
Is there an existing answer for this in the FAQ?
Current Behavior
First, I converted the model to f16 using the llama.cpp project:
python3 convert-hf-to-gguf.py models/Qwen-1_8B-Chat/
Then I ran inference:
./main -m ./models/Qwen-1_8B-Chat/ggml-model-f16.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
But the answers are garbled. Does 1.8B not support llama.cpp quantization?
I also tried int4 quantization, and the answers were garbled as well.
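For reference, a minimal sketch of the int4 step mentioned above (the `quantize` binary name and the `q4_0` type follow llama.cpp's conventions at the time; the paths are assumptions matching the commands above, so adjust them to your checkout):

```shell
# Assumes a built llama.cpp checkout as the working directory (hypothetical paths)
# f16 GGUF -> 4-bit quantized GGUF
./quantize ./models/Qwen-1_8B-Chat/ggml-model-f16.gguf \
           ./models/Qwen-1_8B-Chat/ggml-model-q4_0.gguf q4_0

# Run interactive inference on the quantized model
./main -m ./models/Qwen-1_8B-Chat/ggml-model-q4_0.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
```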
Expected Behavior
The model should answer normally.
Steps To Reproduce
1. Clone the llama.cpp project
2. Download the Qwen-1_8B-Chat model
3. Convert the model to f16 precision
4. Quantize to an int4 version and run inference
5. The inference output is garbled and unreadable
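The steps above can be sketched as one shell sequence (a sketch only: the build command, tool names, and `q4_0` type are assumptions based on llama.cpp conventions at the time, and the model checkpoint must be downloaded separately into models/):

```shell
# 1. Clone and build llama.cpp (assumes a make-based build)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# 2. Place the downloaded Qwen-1_8B-Chat checkpoint under models/

# 3. Convert the HF checkpoint to an f16 GGUF
python3 convert-hf-to-gguf.py models/Qwen-1_8B-Chat/

# 4. Quantize the f16 GGUF to int4 (q4_0)
./quantize models/Qwen-1_8B-Chat/ggml-model-f16.gguf \
           models/Qwen-1_8B-Chat/ggml-model-q4_0.gguf q4_0

# 5. Run interactive inference; the output is garbled at this point
./main -m models/Qwen-1_8B-Chat/ggml-model-q4_0.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
```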
Environment
Anything else?
No response