GGUF breaks - llama-3 #430

Closed · 1 of 2 tasks
danielhanchen opened this issue May 5, 2024 · 3 comments
Labels
fixed Fixed!

Comments

@danielhanchen
Contributor

danielhanchen commented May 5, 2024

Findings from ggerganov/llama.cpp#7062 and Discord chats:
Notebook for repro: https://colab.research.google.com/drive/1djwQGbEJtUEZo_OuqzN_JF6xSOUKhm4q?usp=sharing

  1. Unsloth + float16 + QLoRA = WORKS
  2. Unsloth + bfloat16 + QLoRA = WORKS
  3. Unsloth + bfloat16 + LoRA = WORKS
  4. Unsloth + float16 + QLoRA + GGUF-f16 = FAILS
  5. Unsloth + bfloat16 + LoRA + GGUF-f16 = FAILS
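
As a reference point, here is a rough sketch of combination (4) above, float16 + QLoRA + GGUF-f16 (the model name and hyperparameters are placeholders; the calls follow Unsloth's documented FastLanguageModel / save_pretrained_gguf API, so treat this as an illustration rather than the exact repro, which is in the notebook above):

```python
import torch
from unsloth import FastLanguageModel

# Combination (4): float16 + QLoRA (load_in_4bit=True).
# Combination (5) would instead use dtype=torch.bfloat16 and load_in_4bit=False (plain LoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-Instruct",  # placeholder model id
    max_seq_length=2048,
    dtype=torch.float16,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# ... fine-tune here ...

# Exporting to GGUF f16 is the step where the tokenization breakage shows up.
model.save_pretrained_gguf("model", tokenizer, quantization_method="f16")
```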

Todo:

  • HF directly + float16 + QLoRA + GGUF-f16
  • HF directly + float16 + LoRA + GGUF-f16
danielhanchen added the currently fixing (Am fixing now!) and URGENT BUG (Urgent bug) labels on May 5, 2024
danielhanchen pinned this issue on May 5, 2024
@danielhanchen
Contributor Author

Update:
Hi, I managed to test HF -> llama.cpp directly, without Unsloth, to take Unsloth out of the picture.

  1. llama.cpp tokenizes '\n\n' as [1734, 1734], unless I prompted it incorrectly.
  2. tokenizer.batch_decode([1734]) returns \\n (the literal backslash-n string).
  3. I.e. llama.cpp is tokenizing \n\n as \\n\\n.
  4. Using HF directly, we get:
    \\n = 1734
    \n = 198
    \n\n = 271
    \n\n\n = 1432
    4*\n = 1038
    5*\n = 14963
    6*\n = 5244
    7*\n = 35683
    8*\n = 6087
    9*\n = 55160
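
A quick way to reproduce the HF-side IDs above (a minimal sketch; it assumes the base meta-llama/Meta-Llama-3-8B-Instruct tokenizer, substitute the actual fine-tuned model path):

```python
from transformers import AutoTokenizer

# Check how HF tokenizes runs of newlines vs the literal "\n" string.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

for text in ["\\n", "\n", "\n\n", "\n\n\n"]:
    print(repr(text), "->", tokenizer.encode(text, add_special_tokens=False))

# Decoding 1734 shows it is the literal backslash-n string, not a real newline:
print(tokenizer.batch_decode([[1734]]))  # expected: ['\\n']
```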

I used:

!python llama.cpp/convert-hf-to-gguf.py ./model --outfile ./model.f16.gguf --outtype f16

then:

!./llama.cpp/main -m ./model.f16.gguf -n 1024 --temp 0.0 --verbose-prompt --check-tensors \
  -p "<|start_header_id|>user<|end_header_id|>\n\n!!llama.cpp!!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

See reproducible notebook: https://colab.research.google.com/drive/1aNS8CgXoJZHclBEW3ZjFfiLjpmqZ14KN?usp=sharing

Below is the comparison of tokenization differences between llama.cpp and HF:
[image: token ID comparison between llama.cpp and HF]

I also tried convert.py, which I'm assuming may not be supposed to work for this anyway. I chose --vocab-type bpe. Reproducible example: https://colab.research.google.com/drive/1X8XBdLRf1-eRDSfcr_GrIhaf84Wp9FH1?usp=sharing
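
The invocation was along these lines (reconstructed from the flags above; the output path is a placeholder and exact options may differ between llama.cpp versions):

!python llama.cpp/convert.py ./model --outfile ./model.bpe.f16.gguf --outtype f16 --vocab-type bpe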

Sadly convert.py is even worse, splitting the newlines into 2 distinct characters:
[image: convert.py tokenization output]

@araleza

araleza commented May 6, 2024

Thanks for looking into this. I've been suspicious of these \n's in llama.cpp since I noticed that when I added \n\n to llama 3's prompt, the continuation would usually add a third one at the start of the reply for no obvious reason. What you're finding is probably the reason for that.

@danielhanchen
Contributor Author

It should be fixed!

danielhanchen added the fixed (Fixed!) label and removed the currently fixing (Am fixing now!) and URGENT BUG (Urgent bug) labels on May 10, 2024
danielhanchen unpinned this issue on May 10, 2024