
Adapting to new models #162

Open
epinnock opened this issue Dec 24, 2023 · 2 comments

Comments

@epinnock

Hi, I would like to adapt this to the Phi model.

Is there any good documentation or guide to help with this?

@epinnock (Author)

Hi 👋

I wanted to follow up on this. I'm currently attempting to finetune TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T using LongLoRA, and I'm getting the following error.

Per their GitHub, the TinyLlama architecture should be the same as Llama, but I'm seeing errors within llama_attn_replace.py.
Error:

 File "/workspace/LongLoRA/llama_attn_replace.py", line 96, in forward_flashattn
    key_padding_mask = attention_mask.repeat(2, 1)

RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor

Context

    
    # We have disabled _prepare_decoder_attention_mask in LlamaModel
    # the attention_mask should be the same as the key_padding_mask

    key_padding_mask = attention_mask.repeat(2, 1)
    nheads = qkv.shape[-2]

Log

 0%|                                                                                    | 0/1000 [00:00<?, ?it/s]
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Traceback (most recent call last):
  File "/workspace/LongLoRA/fine-tune.py", line 220, in <module>
    train()
  File "/workspace/LongLoRA/fine-tune.py", line 214, in train
    trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1854, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2758, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1833, in forward
    loss = self.module(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1181, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1058, in forward
    layer_outputs = self._gradient_checkpointing_func(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 796, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/LongLoRA/llama_attn_replace.py", line 96, in forward_flashattn
    key_padding_mask = attention_mask.repeat(2, 1)
RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 16124 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 16125) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1002, in launch_command
    deepspeed_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 718, in deepspeed_launcher
    distrib_run.run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
fine-tune.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-12-24_20:58:39
  host      : 372ceeb793ae
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 16125)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
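For what it's worth, one plausible reading of the traceback, sketched below with hypothetical shapes (this is not LongLoRA code): torch's Tensor.repeat() needs at least as many repeat sizes as the tensor has dimensions, so repeat(2, 1) only works when attention_mask is 2-D. If a newer transformers release hands the patched attention a higher-dimensional mask (e.g. a 4-D causal mask) instead of the 2-D padding mask the patch expects, this exact error appears:

    import torch

    # What forward_flashattn expects: a 2-D padding mask of shape (batch, seq_len)
    mask_2d = torch.ones(2, 8)
    print(mask_2d.repeat(2, 1).shape)   # torch.Size([4, 8]) -- works

    # Hypothetical 4-D mask of shape (batch, 1, q_len, kv_len), as newer
    # transformers versions may build before calling the patched attention
    mask_4d = torch.ones(2, 1, 8, 8)
    try:
        mask_4d.repeat(2, 1)            # fewer repeat dims than tensor dims
    except RuntimeError as e:
        print(e)  # "Number of dimensions of repeat dims can not be smaller ..."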


dddavid4real commented Dec 27, 2023

Hey, I happened to run into the same error. It turns out it is caused by the version of the transformers library. I downgraded to transformers==4.34.0, and everything works fine now.
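If it helps, here is a minimal pre-flight check (a sketch assuming the pinned version above, installed with something like pip install transformers==4.34.0) that fails fast before launching fine-tune.py:

    import transformers

    # Assumed working version per the comment above; adjust if LongLoRA
    # documents a different pin.
    expected = "4.34.0"
    if transformers.__version__ != expected:
        raise RuntimeError(
            f"Found transformers {transformers.__version__}; "
            f"this setup was reported to work with {expected}"
        )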
