Trying to run unsloth via llamafactory on two V100s with CUDA 12.3 and accelerate, I get the following RuntimeError from matmul_lora:
```
Traceback (most recent call last):
  File "LLaMA-Factory/src/train_bash.py", line 14, in <module>
    main()
  File "LLaMA-Factory/src/train_bash.py", line 5, in main
    run_exp()
  File "LLaMA-Factory/src/llmtuner/train/tuner.py", line 31, in run_exp
    run_pt(model_args, data_args, training_args, finetuning_args, callbacks)
  File "LLaMA-Factory/src/llmtuner/train/pt/workflow.py", line 47, in run_pt
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "conda/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "<string>", line 361, in _fast_inner_training_loop
  File "conda/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
    loss = self.compute_loss(model, inputs)
  File "conda/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
    outputs = model(**inputs)
  File "conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 825, in forward
    return model_forward(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 813, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "conda/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 857, in forward
    output = self._fsdp_wrapped_module(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 825, in forward
    return model_forward(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 813, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "conda/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/unsloth/models/llama.py", line 882, in PeftModelForCausalLM_fast_forward
    return self.base_model(
  File "conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/unsloth/models/mistral.py", line 213, in MistralForCausalLM_fast_forward
    outputs = self.model(
  File "conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/unsloth/models/llama.py", line 650, in LlamaModel_fast_forward
    hidden_states = Unsloth_Offloaded_Gradient_Checkpointer.apply(
  File "conda/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs) # type: ignore[misc]
  File "conda/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 115, in decorate_fwd
    return fwd(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/unsloth/models/_utils.py", line 333, in forward
    (output,) = forward_function(hidden_states, *args)
  File "conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 857, in forward
    output = self._fsdp_wrapped_module(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/unsloth/models/llama.py", line 433, in LlamaDecoderLayer_fast_forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/unsloth/models/mistral.py", line 69, in MistralAttention_fast_forward
    Q, K, V = self.apply_qkv(self, hidden_states)
  File "conda/lib/python3.10/site-packages/unsloth/kernels/fast_lora.py", line 312, in apply_lora_qkv
    Q, K, V = LoRA_QKV.apply(X,
  File "conda/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs) # type: ignore[misc]
  File "conda/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 115, in decorate_fwd
    return fwd(*args, **kwargs)
  File "conda/lib/python3.10/site-packages/unsloth/kernels/fast_lora.py", line 227, in forward
    Q = matmul_lora(X, QW, QW_quant, QA, QB, QS)
  File "conda/lib/python3.10/site-packages/unsloth/kernels/utils.py", line 240, in matmul_lora
    A, B = A.t(), B.t()
RuntimeError: setStorage: sizes [4096, 8], strides [1, 4096], storage offset 0, and itemsize 4 requiring a storage size of 131072 are out of bounds for storage of size 0
```
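For what it's worth, the failing line in matmul_lora just transposes the LoRA A and B matrices, and "storage of size 0" is what you see when FSDP has freed the storage backing a parameter after resharding, leaving only the tensor metadata behind. A minimal sketch of that failure mode, independent of unsloth, where the `resize_(0)` call stands in for FSDP freeing the unsharded parameter and the shape matches the error above:

```python
import torch

# An fp32 LoRA A matrix: [8, 4096], so A.t() has sizes [4096, 8] and
# strides [1, 4096], matching the error message above (itemsize 4).
A = torch.empty(8, 4096)

# FSDP frees the storage of unsharded parameters between uses; resizing
# the storage to zero reproduces that state: the tensor's metadata
# survives, but its data is gone.
A.untyped_storage().resize_(0)

# Building the transposed view now fails the storage bounds check:
# RuntimeError: setStorage: sizes [4096, 8], strides [1, 4096], storage
# offset 0, and itemsize 4 requiring a storage size of 131072 are out of
# bounds for storage of size 0
A.t()
```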
I have recreated the conda environment using the instructions on the front page. If I disable unsloth, llamafactory works.
My best guess is that this happens because the entire model cannot fit on one GPU for training: I have extended the vocabulary, so I have to fine-tune the embedding layers as well, not just a standard LoRA or even QLoRA. I used deepspeed without unsloth on a first data subset, but I would expect unsloth to be much faster, and I would like to use it.
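For reference, the "fine-tune the embedding layers too" part is normally expressed in peft via modules_to_save; a minimal sketch, assuming a Mistral-style module layout (the target_modules list here is illustrative, not taken from the failing config):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    # Assumed attention projections for a Mistral-style model.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    # modules_to_save keeps full, trainable copies of these layers alongside
    # the LoRA adapters, which is what a resized vocabulary requires.
    modules_to_save=["embed_tokens", "lm_head"],
)
```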
Hmmm, sadly multi-GPU issues are not a top priority, since Unsloth's mission is to be the best single-GPU library - I'll see what I can do, but can't promise anything - sorry!
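In the meantime, a possible workaround (assuming the model fits on one V100 once quantized) is to pin the run to a single device before torch or unsloth initialize CUDA, so the multi-GPU/FSDP path is never entered:

```python
import os

# Must run before importing torch/unsloth; hides all but one GPU so the
# single-GPU code path is used. Assumes GPU 0 has enough free memory.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```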