
Script to convert Grok-1 weights from raw JAX pickle files. #7058

Draft
wants to merge 7 commits into base: master
Conversation

@heiner commented May 3, 2024

This adds a script to convert the raw weights in the pickle files to GGUF format. This allows using @arki05's work in #6204 directly from the Grok-1 torrent.

Code is based on @foldl's conversion script in chatllm.cpp, which in turn is based on @chu-tianxiang's gist.

Main ideas to avoid excessive memory use:

  • Parse pickle files using mmap.
  • Use PyTorch "meta" tensors to simulate shape and dtype results without having to do all conversions beforehand.

Note that I couldn't run the full model due to RAM constraints, and it's possible I mixed up some tensor names.
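
A minimal sketch of the meta-tensor idea (illustrative only, not the PR's actual code; the shapes are the Grok-1 MoE dimensions discussed below):

    import torch

    def dry_run(shape, dtype=torch.bfloat16):
        # A "meta" tensor tracks shape and dtype but allocates no memory.
        t = torch.empty(shape, dtype=dtype, device="meta")
        t = t.reshape(8, -1, shape[-1])   # e.g. split out the 8 stacked experts
        t = t.to(torch.float32)           # the dtype change is recorded, data is never touched
        return t.shape, t.dtype

    print(dry_run((8 * 32768, 6144)))     # (torch.Size([8, 32768, 6144]), torch.float32)

This lets the script plan every output tensor's name, shape, and dtype before streaming the actual weights through.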

@slaren (Collaborator) commented May 3, 2024

Does this merge the experts into a single tensor?

@heiner (Author) commented May 3, 2024

Does this merge the experts into a single tensor?

It does the opposite -- in the raw data, the 8 experts are part of the same tensor. This splits them, which is also what the chatllm.cpp script does.

If there is a way to keep them within one tensor I'm happy to make that change.

@slaren (Collaborator) commented May 3, 2024

The preferred way to export the expert tensors is as a single 3D tensor for all the experts. It is still possible to use one tensor per expert for backwards compatibility, but it forces the model weights to be copied to a buffer while loading, rather than using them directly from the memory mapped file. For large models like grok, I think it is especially important to be able to avoid this copy and use mmap.
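
As an illustration of the suggested layout, here is a small sketch (shapes and the tensor name are taken from the discussion below; this is not the script's actual code):

    import numpy as np

    # Eight per-expert (32768, 6144) slices stacked into a single (8, 32768, 6144)
    # tensor, written once as e.g. "blk.0.ffn_gate_exps.weight" so llama.cpp can
    # use it straight from the memory-mapped file.
    experts = [np.zeros((32768, 6144), dtype=np.float32) for _ in range(8)]
    ffn_gate_exps = np.stack(experts, axis=0)
    print(ffn_gate_exps.shape)  # (8, 32768, 6144)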

@heiner (Author) commented May 3, 2024

Understood. That will actually make the script simpler. Would you happen to know the tensor names I should use in this case? Currently, when splitting, they are:

| blk.{layer}.ffn_gate.{expert}.weight        | torch.Size([32768, 6144])  | Q4_0    |
| blk.{layer}.ffn_down.{expert}.weight        | torch.Size([6144, 32768])  | Q4_0    |
| blk.{layer}.ffn_up.{expert}.weight          | torch.Size([32768, 6144])  | Q4_0    |

@slaren (Collaborator) commented May 3, 2024

The tensor names are defined in gguf-py:

MODEL_TENSOR.FFN_GATE_EXP: "blk.{bid}.ffn_gate_exps",
MODEL_TENSOR.FFN_DOWN_EXP: "blk.{bid}.ffn_down_exps",
MODEL_TENSOR.FFN_UP_EXP: "blk.{bid}.ffn_up_exps",

It would be good to use these constants rather than hardcoding the names.
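
For example, assuming gguf-py exposes this table as gguf.TENSOR_NAMES keyed by gguf.MODEL_TENSOR (a sketch, not code from this PR):

    import gguf

    bid = 0  # block / layer index
    name = gguf.TENSOR_NAMES[gguf.MODEL_TENSOR.FFN_GATE_EXP].format(bid=bid) + ".weight"
    print(name)  # blk.0.ffn_gate_exps.weight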

heiner added 2 commits May 3, 2024 22:38:

  • As per ggerganov#7058 (comment). This helps avoid a memcopy when running.
  • This saves weights in the order in which they are in the Grok-1 files. Since we operate weight-by-weight now, we no longer need caches and name2key translations. Per reviewer request, I also moved to using keys in gguf.TENSOR_NAMES.

@heiner (Author) commented May 3, 2024

Thanks!

I have updated the branch to no longer split the MoE weights into separate tensors. That simplifies the script, as it's now one weight per file. The original script permuted the order in which these weights are written for some reason; I stopped doing that, so there's now only one list of weight names.

I also moved to the values in the gguf.TENSOR_NAMES dict as per your suggestion. I'm not sure that's a clear improvement ("Explicit is better than implicit."), especially in view of code like name.endswith("attn_k") and name.endswith("_exps"), but it's also not much worse.

PTAL.

@foldl (Contributor) commented May 3, 2024

@heiner, the name of my project is ChatLLM.cpp, not ChatLLM.ccp 😄

@mofosyne added the "python" (python script changes) label May 9, 2024
@ggerganov (Owner) previously approved these changes May 9, 2024 and left a comment:
We can merge after lint fixes

@mofosyne added the "review complexity: medium" (generally require more time to grok but manageable by beginner to medium expertise level) label May 9, 2024
@ggerganov (Owner) commented May 9, 2024

Hm, I tested Q4_0 conversion and it does not seem to work:

python3 convert_grok.py -i ~/Data/huggingface/grok-1/ckpt/ --vocab_dir ../grok-1 -o x.gguf -t q4_0

make -j && ./main -m x.gguf -p "I believe the meaning of life is" -ngl 0

...

[BOS] I believe the meaning of life is it:000000000000000 of0000000000000000000000000000000000000000000000

Might need more work

@ggerganov marked this pull request as draft May 9, 2024 14:03
@ggerganov dismissed their stale review May 9, 2024 14:03

Did not work on my machine

convert_grok.py Outdated
Comment on lines 289 to 292
if name.endswith("attn_k"):
return permute(tensor, config.num_key_value_heads)
elif name.endswith("attn_q"):
return permute(tensor, config.num_attention_heads)
@foldl (Contributor) commented May 9, 2024

According to llama_rope_type(), Grok uses LLAMA_ROPE_TYPE_NEOX, so do not permute.

@heiner (Author) replied:

Removed the call to this function and the function itself.

Comment on lines +356 to +357
"output_multiplier_scale": 0.5773502691896257,
"embedding_multiplier_scale": 78.38367176906169,
@foldl (Contributor) commented May 9, 2024

embedding_multiplier_scale should be multiplied into the embeddings.

output_multiplier_scale can be ignored, IMO.

@heiner (Author) replied:

Added

    if name == "token_embd":
        weight *= config.embedding_multiplier_scale

@heiner (Author) added:

My original reason for not doing that was that this appears to also happen in the C++ code here:

inpL = ggml_scale(ctx0, inpL, 78.38367176906169f);

@foldl (Contributor) replied:

Oh, I missed this. Then embedding_multiplier_scale should not be multiplied again.

That said, it's better to do the multiplication once, offline, than to do it each time at runtime.

@heiner (Author) commented May 9, 2024

[BOS] I believe the meaning of life is it:000000000000000 of0000000000000000000000000000000000000000000000

My apologies. As I said above, I couldn't actually test running the full model on my setup. I will address @foldl's suggestions.

Would you happen to have something like the SHA-1 of each tensor of a checkpoint based on the HF weights? Otherwise I can download those and run that conversion for comparison.

@mofosyne added the "enhancement" (New feature or request) label May 9, 2024
@heiner (Author) commented May 9, 2024

Thanks @foldl for the hints. It's quite possible I mixed something else up as well, e.g., swapped two tensors with the same shape and dtype. Would you happen to have a tensor name -> hash table for a correct conversion?
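
One way such a table could be produced (a sketch using gguf-py's GGUFReader; the file name is hypothetical):

    import hashlib
    import gguf

    # Map each tensor name to the SHA-1 of its raw data so that two
    # conversions of the same checkpoint can be compared tensor by tensor.
    reader = gguf.GGUFReader("grok-q4_0.gguf")
    digests = {t.name: hashlib.sha1(t.data.tobytes()).hexdigest() for t in reader.tensors}
    for name in sorted(digests):
        print(name, digests[name])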

@foldl (Contributor) commented May 9, 2024

@heiner You need to compare the result against #6204, although I don't think the SHA-1s would match.

chatllm.cpp permutes the k_proj/q_proj weights, so the SHA-1s would not match there, either.

@heiner (Author) commented May 10, 2024

Thanks. I have removed the multiplication by embedding_multiplier_scale again and converted the full model with all 8 experts. The output is not great, but also not as bad as before (gist with full output):

./build/bin/main -m grok.bin -p "I believe the meaning of life is" -s 2 -n 10 -ngl 0
(...)
[BOS] I believe the meaning of life is important it could possibly the general of the general I

It's likely something else is wrong but I'm unsure what it is, and the multiple-hour iteration time makes it infeasible to just try out random things.
