[BUG] Qwen-1.8-Chat: after converting to f16 with llama.cpp, inference output is garbled. Is 1.8B not yet supported in llama.cpp? #69
Comments
Impressive that you even got the conversion to work; conversion with llama.cpp fails entirely for me. llama.cpp has moved to the GGUF format, but qwen.cpp still converts to the GGML format. Could it be converted losslessly to GGUF? Then the model could be used with llama.cpp, and its server could run it too.
Same here. I tried Qwen 0.5B, 7B, and 14B: after converting to F16 GGUF with llama.cpp, the answers were all garbled.
Is there an existing issue / discussion for this?
Is there an existing answer for this in the FAQ?
Current Behavior
First, I converted the model to f16 using the llama.cpp project:
python3 convert-hf-to-gguf.py models/Qwen-1_8B-Chat/
Then I ran inference:
./main -m ./models/Qwen-1_8B-Chat/ggml-model-f16.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
But the answers are garbled. Does 1.8B not support llama.cpp quantization?
I also tried int4 quantization, and the answers were garbled as well.
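For reference, a minimal sketch of the int4 step mentioned above (the `quantize` binary name and the `q4_0` type follow llama.cpp's conventions at the time; the paths are assumptions matching the commands above, so adjust them to your checkout):

```shell
# Assumes a built llama.cpp checkout as the working directory (hypothetical paths)
# f16 GGUF -> 4-bit quantized GGUF
./quantize ./models/Qwen-1_8B-Chat/ggml-model-f16.gguf \
           ./models/Qwen-1_8B-Chat/ggml-model-q4_0.gguf q4_0

# Run interactive inference on the quantized model
./main -m ./models/Qwen-1_8B-Chat/ggml-model-q4_0.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
```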
Expected Behavior
The model should answer normally.
Steps To Reproduce
1. Clone the llama.cpp project
2. Download the Qwen-1_8B-Chat model
3. Convert the model to f16 precision
4. Quantize to an int4 version and run inference
5. The inference output is garbled and unreadable
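The steps above can be sketched as one shell sequence (a sketch only: the build command, tool names, and `q4_0` type are assumptions based on llama.cpp conventions at the time, and the model checkpoint must be downloaded separately into models/):

```shell
# 1. Clone and build llama.cpp (assumes a make-based build)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# 2. Place the downloaded Qwen-1_8B-Chat checkpoint under models/

# 3. Convert the HF checkpoint to an f16 GGUF
python3 convert-hf-to-gguf.py models/Qwen-1_8B-Chat/

# 4. Quantize the f16 GGUF to int4 (q4_0)
./quantize models/Qwen-1_8B-Chat/ggml-model-f16.gguf \
           models/Qwen-1_8B-Chat/ggml-model-q4_0.gguf q4_0

# 5. Run interactive inference; the output is garbled at this point
./main -m models/Qwen-1_8B-Chat/ggml-model-q4_0.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
```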
Environment
Anything else?
No response