
How can I convert llama3 safetensors to the pth file needed to use with executorch? #3303

Closed
l3utterfly opened this issue Apr 24, 2024 · 16 comments

@l3utterfly

Fine-tunes of Llama3 usually only have safetensors uploaded. In order to compile a Llama3 model following the tutorial, I need the original pth checkpoint file.

Is there a way to convert the safetensors to the checkpoint file?

@mergennachin
Contributor

@l3utterfly

Take a look at some example util functions in torchtune and let us know if they work:

https://github.com/pytorch/torchtune/blob/main/torchtune/utils/_checkpointing/_checkpointer_utils.py#L66C18-L69
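
For context, a conversion along those lines just reads the safetensors shards into a single state dict and saves it with torch.save. A minimal sketch of that idea (not the torchtune util itself; file names are placeholders, and the safetensors package is assumed to be installed):

import torch
from safetensors.torch import load_file

# Placeholder shard names; an 8B HF checkpoint typically ships as a few shards.
shard_files = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]

state_dict = {}
for shard in shard_files:
    state_dict.update(load_file(shard))  # each shard maps parameter names to tensors

torch.save(state_dict, "consolidated.pth")

Note that this keeps the Hugging Face parameter names as-is; as discussed further down this thread, those names may not match what export_llama expects.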

@l3utterfly
Author

So I need to use PyTorch to save the state dict file?

I tried that with a Llama3 fine-tune and then tried to compile it for XNNPACK, but I got this error:

Could not import fairseq2 modules.
[INFO 2024-04-25 08:51:48,994 builder.py:84] Loading model with checkpoint=/home/layla/src/text-generation-webui/models/Aura_Uncensored_l3_8B.pth, params=/home/layla/src/text-generation-webui/models/Meta-Llama-3-8B-Instruct/original/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
[INFO 2024-04-25 08:51:49,226 builder.py:105] Loaded model with dtype=torch.bfloat16
[INFO 2024-04-25 08:51:49,240 config.py:58] PyTorch version 2.4.0.dev20240422+cpu available.
linear: layers.0.attention.wq, in=4096, out=4096
Traceback (most recent call last):
  File "/home/layla/miniconda3/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/layla/miniconda3/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/layla/src/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
  File "/home/layla/src/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/home/layla/src/executorch/examples/models/llama2/export_llama_lib.py", line 302, in export_llama
    return _export_llama(modelname, args)
  File "/home/layla/src/executorch/examples/models/llama2/export_llama_lib.py", line 380, in _export_llama
    builder_exported_to_edge = _prepare_for_llama_export(
  File "/home/layla/src/executorch/examples/models/llama2/export_llama_lib.py", line 365, in _prepare_for_llama_export
    .source_transform(transforms)
  File "/home/layla/src/executorch/examples/models/llama2/builder.py", line 203, in source_transform
    self.model = transform(self.model)
  File "/home/layla/src/executorch/examples/models/llama2/source_transformation/quantize.py", line 80, in quantize
    ).quantize(model)
  File "/home/layla/miniconda3/envs/executorch/lib/python3.10/site-packages/torchao/quantization/GPTQ.py", line 1256, in quantize
    state_dict = self._create_quantized_state_dict(model)
  File "/home/layla/miniconda3/envs/executorch/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/layla/miniconda3/envs/executorch/lib/python3.10/site-packages/torchao/quantization/GPTQ.py", line 1214, in _create_quantized_state_dict
    ) = group_quantize_tensor_symmetric(
  File "/home/layla/miniconda3/envs/executorch/lib/python3.10/site-packages/torchao/quantization/quant_primitives.py", line 509, in group_quantize_tensor_symmetric
    scales, zeros = get_group_qparams_symmetric(w, n_bit, group_size, precision)
  File "/home/layla/miniconda3/envs/executorch/lib/python3.10/site-packages/torchao/quantization/quant_primitives.py", line 482, in get_group_qparams_symmetric
    assert torch.isnan(to_quant).sum() == 0
NotImplementedError: aten::_local_scalar_dense: attempted to run this operator with Meta tensors, but there was no fake impl or Meta kernel registered. You may have run into this message while using an operator with PT2 compilation APIs (torch.compile/torch.export); in order to use this operator with those APIs you'll need to add a fake impl.Please see the following doc for next steps: https://docs.google.com/document/d/1_W62p8WJOQQUzPsJYa7s701JXt0qf2OfLub2sbkHOaU/edit

This is the code I used to save the pth file:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

print('loading model...')

model_id = "/home/layla/src/text-generation-webui/models/Aura_Uncensored_l3_8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cpu",
)

print('saving model...')

# Save only the state dictionary
torch.save(model.state_dict(), "/home/layla/src/text-generation-webui/models/Aura_Uncensored_l3_8B.pth")

What am I doing wrong here?

@iseeyuan
Contributor

@l3utterfly Could you share the command you used in torchtune, as well as the export_llama?

@l3utterfly
Author

@iseeyuan sorry, I'm a little new to torchtune; I'm following the documentation here: https://pytorch.org/torchtune/stable/deep_dives/checkpointer.html#understand-checkpointer

  1. I am loading my safetensor file first:
AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cpu",
)
  2. Then, I save it to the torch state dictionary:
torch.save(model.state_dict(), "/home/layla/src/text-generation-webui/models/Aura_Uncensored_l3_8B.pth")

This is the command I'm using to convert:
python -m examples.models.llama2.export_llama --checkpoint /models/Aura_Uncensored_l3_8B.pth -p /models/Meta-Llama-3-8B-Instruct/original/params.json -d=fp32 -X -qmode 8da4w -kv --use_sdpa_with_kv_cache --output_name="llama3_aura_uncensored_kv_sdpa_xnn_qe_4_32_ctx4096.pte" --group_size 128 --metadata '{"get_bos_id":128000, "get_eos_id":128001}' --embedding-quantize 4,32 --max_seq_len 4096

@iseeyuan
Contributor

@l3utterfly I can see two options:

  1. Have you tried directly using safe_open on your .safetensors file with the code @mergennachin suggested? Then you could use torch.save to save the resulting state_dict (see the rough sketch after this list). Please let us know if it works.
  2. Fine-tune the model using torchtune directly. The output of the tuned model is saved directly as a PyTorch checkpoint, which has been verified to be compatible with ExecuTorch.
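
A rough sketch of what option 1 could look like (paths and file names are placeholders, and the safetensors package is assumed to be installed):

from safetensors import safe_open
import torch

state_dict = {}
for shard in ["model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]:
    # safe_open reads tensors lazily; copy each one into a plain dict
    with safe_open(shard, framework="pt", device="cpu") as f:
        for key in f.keys():
            state_dict[key] = f.get_tensor(key)

torch.save(state_dict, "checkpoint.pth")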

iseeyuan added the "enhancement" and "help wanted" labels on Apr 30, 2024
@l3utterfly
Author

@iseeyuan this is the error I'm getting after converting the safetensors file with the torchtune util function you suggested. The error happens when running the command that compiles the .pth to ExecuTorch:

NotImplementedError: aten::_local_scalar_dense: attempted to run this operator with Meta tensors, but there was no fake impl or Meta kernel registered. You may have run into this message while using an operator with PT2 compilation APIs (torch.compile/torch.export); in order to use this operator with those APIs you'll need to add a fake impl.Please see the following doc for next steps: https://docs.google.com/document/d/1_W62p8WJOQQUzPsJYa7s701JXt0qf2OfLub2sbkHOaU/edit

I tried to fine-tune with torchtune, but it appears torchtune does not support fine-tuning on top of another fine-tune? It still looks for the original/checkpoint.pth file, which fine-tunes do not have.

Any way forward from here?

@iseeyuan
Contributor

iseeyuan commented May 2, 2024

@l3utterfly Let me try to convert the safetensors files and let you know if there's a way to work around it.

I tried to fine-tune with torchtune, but it appears torchtune does not support fine-tuning on top of another fine-tune? It still looks for the original/checkpoint.pth file, which fine-tunes do not have.

@kartikayk, are you aware of this?

@kartikayk

@l3utterfly torchtune doesn't really care about how the checkpoint is produced, i.e. whether it's a fine-tuned model or a pre-trained model. All it cares about is that the format matches what the checkpointer expects.

When the model changes, you need to update the config to point to the right checkpoint etc. Can you share the exact torchtune config you're using and the command you used to launch training?

iseeyuan self-assigned this on May 2, 2024
@l3utterfly
Author

@kartikayk I am trying to convert this model: https://huggingface.co/ResplendentAI/Aura_Uncensored_l3_8B

It doesn't contain the original PyTorch checkpoint files that torchtune expects, so trying to fine-tune with torchtune gets me back to square one: how can I convert the safetensors files to a PyTorch checkpoint?

@kartikayk

kartikayk commented May 3, 2024

This should work OOTB if you update the checkpoint files in the config to point to the safetensors. I tried loading the checkpoint into the Llama3 8B model in torchtune and the keys loaded successfully.


Note that you need to update the checkpointer to point to the HFCheckpointer since the safetensor files are in the HF format. The deep dive you pointed to above has a lot more information about checkpointer formats, but let me know if you have questions.

I'm not sure what your config looks like, but just update the Llama3 config to the following:

checkpointer:
  _component_: torchtune.utils.FullModelHFCheckpointer
  checkpoint_dir: /data/users/kartikayk/cpts/Aura_Uncensored_l3_8B
  checkpoint_files: [
   model-00001-of-00002.safetensors,
   model-00002-of-00002.safetensors
  ]
  output_dir: /data/users/kartikayk/cpts/Aura_Uncensored_l3_8B
  model_type: LLAMA3
resume_from_checkpoint: False

For examples of how to use safetensors, take a look at the Gemma configs.

@l3utterfly
Author

Thanks for the help, I got the finetuning to run now with this config.

First, I tried to load and save the dict right away:

from torchtune.utils import FullModelHFCheckpointer
import torch

checkpointer = FullModelHFCheckpointer(
    checkpoint_dir='/home/layla/src/text-generation-webui/models/Aura_Uncensored_l3_8B',
    checkpoint_files=['model-00001-of-00002.safetensors', 'model-00002-of-00002.safetensors'],
    output_dir='/home/layla/src/text-generation-webui/models/Aura_Uncensored_l3_8B',
    model_type='LLAMA3'
)

print("loading checkpoint")
sd = checkpointer.load_checkpoint()

print("saving checkpoint")
torch.save(sd, "/home/layla/src/text-generation-webui/models/Aura_Uncensored_l3_8B/checkpoint.pth")

This still gives the same error as before when trying to compile down to executorch.

NotImplementedError: aten::_local_scalar_dense: attempted to run this operator with Meta tensors, but there was no fake impl or Meta kernel registered. You may have run into this message while using an operator with PT2 compilation APIs (torch.compile/torch.export); in order to use this operator with those APIs you'll need to add a fake impl.Please see the following doc for next steps: https://docs.google.com/document/d/1_W62p8WJOQQUzPsJYa7s701JXt0qf2OfLub2sbkHOaU/edit

I am now in the process of doing a "finetune" with learning_rate=0 to get the final checkpoint. Is there any way to save checkpoints every X steps instead of at the end?

@kartikayk

Glad this worked!

I am now in the process of doing a "finetune" with learning_rate=0 to get the final checkpoint. Is there any way to save checkpoints every X steps instead of at the end?

Yes, if you just want to get dummy checkpoints you can set max_steps_per_epoch=1 and gradient_accumulation_steps=1. For checkpointing more frequently during training, unfortunately we don't support mid-epoch checkpointing because we don't have a good way to checkpoint the dataloader. This is WIP, and once it's supported we can do mid-epoch checkpointing. Let me know if this makes sense.
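
For example (a minimal sketch of just the relevant overrides in the torchtune YAML config; every other field stays as in the stock Llama3 config):

max_steps_per_epoch: 1
gradient_accumulation_steps: 1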

This still gives the same error as before when trying to compile down to executorch.

I don't think ExecuTorch supports the HF format though. @iseeyuan can confirm.

@iseeyuan
Contributor

iseeyuan commented May 3, 2024

@l3utterfly I took a deeper look into the state dict (sd) in your code. There are two issues.

  1. The checkpoint returned by checkpointer.load_checkpoint() is a dictionary with one element: the key is 'model' and the value is the actual state dict we are looking for.

Try exactly what @kartikayk suggested:

sd = checkpointer.load_checkpoint()

print("saving checkpoint")
torch.save(sd['model'], "/home/layla/src/text-generation-webui/models/Aura_Uncensored_l3_8B/checkpoint.pth")
  2. I printed out the state dict from the original llama3 checkpoint and the checkpoint I got from step 1, and I see some differences: a) the tensor names are different (e.g. attention.wq vs. attn.q_proj); b) the norm parameters are named differently (weight vs. scale). This can cause a mismatch between the model and the checkpoint. For example, the original layer 0 looks like:
layers.0.attention.wk.weight torch.Size([1024, 4096])
layers.0.attention.wo.weight torch.Size([4096, 4096])
layers.0.attention.wq.weight torch.Size([4096, 4096])
layers.0.attention.wv.weight torch.Size([1024, 4096])
layers.0.attention_norm.weight torch.Size([4096])
layers.0.feed_forward.w1.weight torch.Size([14336, 4096])
layers.0.feed_forward.w2.weight torch.Size([4096, 14336])
layers.0.feed_forward.w3.weight torch.Size([14336, 4096])
layers.0.ffn_norm.weight torch.Size([4096])

But the converted layer 0 looks like,

layers.0.attn.k_proj.weight torch.Size([1024, 4096])
layers.0.attn.output_proj.weight torch.Size([4096, 4096])
layers.0.attn.q_proj.weight torch.Size([4096, 4096])
layers.0.attn.v_proj.weight torch.Size([1024, 4096])
layers.0.mlp.w1.weight torch.Size([14336, 4096])
layers.0.mlp.w2.weight torch.Size([4096, 14336])
layers.0.mlp.w3.weight torch.Size([14336, 4096])
layers.0.mlp_norm.scale torch.Size([4096])
layers.0.sa_norm.scale torch.Size([4096])

I don't know what causes these name differences. Since our flow starts from the original llama3 checkpoint, I'd suggest you use the same checkpoint and iterate based on that.

@iseeyuan
Contributor

iseeyuan commented May 4, 2024

Update: After chatting with @kartikayk, we need another conversion from the torchtune format to Meta's llama3 format. So the code below should work:

from torchtune.utils import FullModelHFCheckpointer
from torchtune.models import convert_weights
import torch

checkpointer = FullModelHFCheckpointer(
    checkpoint_dir='/Users/myuan/.cache/huggingface/hub/models--ResplendentAI--Aura_Uncensored_l3_8B/snapshots/e7720d40e4d8d3c0fa07a8a579fda4d0644aa731',
    checkpoint_files=['model-00001-of-00002.safetensors', 'model-00002-of-00002.safetensors'],
    output_dir='/Users/myuan/data/Aura_Uncensored_l3_8B',
    model_type='LLAMA3'
)

print("loading checkpoint")
sd = checkpointer.load_checkpoint()
sd = convert_weights.tune_to_meta(sd['model'])

print("saving checkpoint")
torch.save(sd, "/Users/myuan/data/Aura_Uncensored_l3_8B/checkpoint.pth")

It works well on my side: I can successfully lower the checkpoint to ExecuTorch. @l3utterfly, could you try the above conversion and let us know if it works?
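
For reference, the resulting checkpoint.pth can then be passed to export_llama the same way as an original Meta checkpoint, e.g. reusing the flags from the command shared earlier in this thread (paths and the output name are placeholders):

python -m examples.models.llama2.export_llama --checkpoint /path/to/Aura_Uncensored_l3_8B/checkpoint.pth -p /path/to/Meta-Llama-3-8B-Instruct/original/params.json -d=fp32 -X -qmode 8da4w -kv --use_sdpa_with_kv_cache --output_name="llama3_aura_uncensored.pte" --group_size 128 --metadata '{"get_bos_id":128000, "get_eos_id":128001}' --embedding-quantize 4,32 --max_seq_len 4096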

@l3utterfly
Author

Yes! This works, thank you so much for helping me!

I think it may be helpful to put/link this script in the ExecuTorch Llama3 docs? It would really help accelerate adoption of ExecuTorch by the wider community: people can now load Llama3 fine-tunes with ExecuTorch instead of only working with the base model, which will encourage the local AI community to build infrastructure around ExecuTorch!

@iseeyuan
Contributor

iseeyuan commented May 6, 2024

@l3utterfly It's a great idea. Let me put up a PR for this with documentation.

iseeyuan linked a pull request on May 6, 2024 that will close this issue
facebook-github-bot pushed a commit that referenced this issue May 8, 2024
Summary:
As titled. It's pretty common for users to download LLM models in safetensors format. Add instructions and an example script to convert them to the PyTorch format that the export_llama script can accept. It leverages the utils from TorchTune.

Thanks l3utterfly and kartikayk for the discussions and suggestions!

More context in #3303

Pull Request resolved: #3523

Reviewed By: mergennachin

Differential Revision: D57026658

Pulled By: iseeyuan

fbshipit-source-id: 11badf709920ff945cdfdd2b244c52c750943412
iseeyuan closed this as completed on May 8, 2024