GGUF Conversion - codegemma 2b and vocab for FIM / infill #7205

ScottMcNaught · 2024-05-10T19:22:42Z

Hello,

I am fine-tuning CodeGemma by Google. When I convert the model to GGUF, I believe the tokens or configuration for FIM / infill go missing.

When I run:
https://huggingface.co/google/codegemma-2b-GGUF
it works perfectly in llama.cpp.

When I take:
https://huggingface.co/google/codegemma-2b
and convert it to GGUF manually, the conversion completes.
But when I then make an /infill request, llama.cpp segfaults.

I am using the following to perform conversion and quantization:

docker run --ipc=host -v "/root/UNTRAINED-hf-codegemma-2b:/root/UNTRAINED-hf-codegemma-2b" \
ghcr.io/ggerganov/llama.cpp:full-cuda --convert /root/UNTRAINED-hf-codegemma-2b --outfile /root/UNTRAINED-hf-codegemma-2b/model.gguf --vocab-type spm,hfft,bpe

docker run --gpus all --ipc=host -v "/root/UNTRAINED-hf-codegemma-2b:/root/UNTRAINED-hf-codegemma-2b" \
ghcr.io/ggerganov/llama.cpp:full-cuda --quantize /root/UNTRAINED-hf-codegemma-2b/model.gguf /root/UNTRAINED-hf-codegemma-2b/model-Q4_K_M.gguf Q4_K_M

How can I convert to GGUF and ensure that /infill is supported?

ScottMcNaught · 2024-05-11T21:54:55Z

Author

Update: I ran this:

/app/gguf-py/scripts/gguf-new-metadata.py model_q4_k_m.gguf model_q4_k_m_with_meta.gguf --special-token prefix '<|fim_prefix|>' --special-token middle '<|fim_middle|>' --special-token suffix '<|fim_suffix|>'

This stops the segfault, but I get a garbage response (words made of random letters).
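One way to see what a converted file actually contains is to dump its key-value metadata and look at the architecture and FIM token entries. The following is a minimal sketch, not a confirmed diagnosis procedure: `filter_fim_keys` is a hypothetical helper name, and the key names (`general.architecture`, `tokenizer.ggml.prefix_token_id` and its siblings) are assumed to match the gguf-py version in use.

```shell
# filter_fim_keys (hypothetical helper): reads a GGUF metadata dump on stdin
# and keeps only the architecture line and the FIM-related token id lines.
# Assumption: keys are named tokenizer.ggml.{prefix,suffix,middle,eot}_token_id.
filter_fim_keys() {
    grep -E 'general\.architecture|tokenizer\.ggml\.(prefix|suffix|middle|eot)_token_id'
}

# Usage (assumes a llama.cpp checkout; not executed here):
#   python3 gguf-py/scripts/gguf-dump.py --no-tensors model.gguf | filter_fim_keys
```

Running the same filter against the known-good GGUF and the manually converted one, then comparing the two outputs, should show whether the architecture or the token ids differ.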

2 replies

ggerganov May 12, 2024
Maintainer

Could you post the infill command that you use, and the full log that you get?

ScottMcNaught May 12, 2024
Author

I found the problem. When I use your Docker image with --convert, it detects CodeGemma as the 'llama' architecture.

If I override the entrypoint to /app/convert-hf-to-gguf.py on the same Docker image, it gets the architecture right.
I also set the working directory to that of my image.

When I did this, I got a perfect conversion and did not have to set the metadata myself.
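The entrypoint override described above might look like the following sketch. The exact command was not posted, so the details are assumptions: `convert_hf_to_gguf` is a hypothetical wrapper, the working directory is set to the mounted model directory (one reading of "working directory" in the post), and the script is assumed to be directly executable inside the image. By default the wrapper only prints the command; set `DRY_RUN=0` to run it.

```shell
# convert_hf_to_gguf (hypothetical wrapper): builds a docker command that
# bypasses the image's default entrypoint and calls convert-hf-to-gguf.py.
# Prints the command when DRY_RUN is unset or 1; executes it when DRY_RUN=0.
convert_hf_to_gguf() {
    model_dir="$1"   # host path to the Hugging Face model directory
    image="ghcr.io/ggerganov/llama.cpp:full-cuda"

    # Assemble the command as positional parameters so paths stay intact.
    set -- docker run --ipc=host \
        -v "$model_dir:$model_dir" \
        -w "$model_dir" \
        --entrypoint /app/convert-hf-to-gguf.py \
        "$image" \
        "$model_dir" --outfile "$model_dir/model.gguf"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example: print the command for the model directory used in this thread.
convert_hf_to_gguf /root/UNTRAINED-hf-codegemma-2b
```

The sketch leaves out `--vocab-type`, on the assumption that this flag belongs to the legacy converter behind `--convert` rather than to convert-hf-to-gguf.py. The quantize step from the original post should be usable unchanged on the resulting model.gguf.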

Answer selected by ScottMcNaught
