GGUF breaks - llama-3 #430

Closed · 1 of 2 tasks
danielhanchen opened this issue May 5, 2024 · 3 comments
Labels
fixed Fixed!

Comments

@danielhanchen
Contributor

danielhanchen commented May 5, 2024

Findings from ggerganov/llama.cpp#7062 and Discord chats:
Notebook for repro: https://colab.research.google.com/drive/1djwQGbEJtUEZo_OuqzN_JF6xSOUKhm4q?usp=sharing

  1. Unsloth + float16 + QLoRA = WORKS
  2. Unsloth + bfloat16 + QLoRA = WORKS
  3. Unsloth + bfloat16 + LoRA = WORKS
  4. Unsloth + float16 + QLoRA + GGUF-f16 = FAILS
  5. Unsloth + bfloat16 + LoRA + GGUF-f16 = FAILS
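
As a reference point, here is a rough sketch of combination (4) above, float16 + QLoRA + GGUF-f16 (the model name and hyperparameters are placeholders; the calls follow Unsloth's documented FastLanguageModel / save_pretrained_gguf API, so treat this as an illustration rather than the exact repro, which is in the notebook above):

```python
import torch
from unsloth import FastLanguageModel

# Combination (4): float16 + QLoRA (load_in_4bit=True).
# Combination (5) would instead use dtype=torch.bfloat16 and load_in_4bit=False (plain LoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-Instruct",  # placeholder model id
    max_seq_length=2048,
    dtype=torch.float16,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# ... fine-tune here ...

# Exporting to GGUF f16 is the step where the tokenization breakage shows up.
model.save_pretrained_gguf("model", tokenizer, quantization_method="f16")
```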

Todo:

  • HF directly + float16 + QLoRA + GGUF-f16
  • HF directly + float16 + LoRA + GGUF-f16
danielhanchen added the currently fixing (Am fixing now!) and URGENT BUG (Urgent bug) labels on May 5, 2024
danielhanchen pinned this issue on May 5, 2024
@danielhanchen
Contributor Author

Update:
Hi, I managed to test HF -> llama.cpp directly, without Unsloth, to take Unsloth out of the picture.

  1. llama.cpp tokenizes '\n\n' as [1734, 1734], unless I prompted it incorrectly.
  2. tokenizer.batch_decode([1734]) returns \\n (the literal backslash-n string).
  3. I.e. llama.cpp is tokenizing \n\n as \\n\\n.
  4. Using HF directly, we get:
    \\n = 1734
    \n = 198
    \n\n = 271
    \n\n\n = 1432
    4*\n = 1038
    5*\n = 14963
    6*\n = 5244
    7*\n = 35683
    8*\n = 6087
    9*\n = 55160
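
A quick way to reproduce the HF-side IDs above (a minimal sketch; it assumes the base meta-llama/Meta-Llama-3-8B-Instruct tokenizer, substitute the actual fine-tuned model path):

```python
from transformers import AutoTokenizer

# Check how HF tokenizes runs of newlines vs the literal "\n" string.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

for text in ["\\n", "\n", "\n\n", "\n\n\n"]:
    print(repr(text), "->", tokenizer.encode(text, add_special_tokens=False))

# Decoding 1734 shows it is the literal backslash-n string, not a real newline:
print(tokenizer.batch_decode([[1734]]))  # expected: ['\\n']
```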

I used:

!python llama.cpp/convert-hf-to-gguf.py ./model --outfile ./model.f16.gguf --outtype f16

then:

!./llama.cpp/main -m ./model.f16.gguf -n 1024 --temp 0.0 --verbose-prompt --check-tensors \
  -p "<|start_header_id|>user<|end_header_id|>\n\n!!llama.cpp!!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

See reproducible notebook: https://colab.research.google.com/drive/1aNS8CgXoJZHclBEW3ZjFfiLjpmqZ14KN?usp=sharing

Below is the comparison of tokenization differences between llama.cpp and HF:
[image: token ID comparison between llama.cpp and HF]

I also tried convert.py, which I'm assuming may not be supposed to work for this anyway. I chose --vocab-type bpe. Reproducible example: https://colab.research.google.com/drive/1X8XBdLRf1-eRDSfcr_GrIhaf84Wp9FH1?usp=sharing
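
The invocation was along these lines (reconstructed from the flags above; the output path is a placeholder and exact options may differ between llama.cpp versions):

!python llama.cpp/convert.py ./model --outfile ./model.bpe.f16.gguf --outtype f16 --vocab-type bpe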

Sadly convert.py is even worse, splitting the newlines into 2 distinct characters:
[image: convert.py tokenization output]

@araleza

araleza commented May 6, 2024

Thanks for looking into this. I've been suspicious of these \n's in llama.cpp since I noticed that when I added \n\n to llama 3's prompt, the continuation would usually add a third one at the start of the reply for no obvious reason. What you're finding is probably the reason for that.

@danielhanchen
Contributor Author

It should be fixed!

danielhanchen added the fixed (Fixed!) label and removed the currently fixing (Am fixing now!) and URGENT BUG (Urgent bug) labels on May 10, 2024
danielhanchen unpinned this issue on May 10, 2024