RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half #238

Open
HwzGit opened this issue Feb 23, 2024 · 3 comments

Comments


HwzGit commented Feb 23, 2024

W&B offline. Running your script from this directory will only write metadata locally. Use wandb disabled to completely turn off W&B.
/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[INFO|training_args.py:1345] 2024-02-23 15:32:48,053 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
[INFO|training_args.py:1798] 2024-02-23 15:32:48,053 >> PyTorch: setting up devices
/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/training_args.py:1711: FutureWarning: --push_to_hub_token is deprecated and will be removed in version 5 of 🤗 Transformers. Use --hub_token instead.
warnings.warn(
/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/datasets/load.py:2089: FutureWarning: 'use_auth_token' was deprecated in favor of 'token' in version 2.14.0 and will be removed in 3.0.0.
You can remove this warning by passing 'token=None' instead.
warnings.warn(
Using custom data configuration default-e3c6bc7f485aed74
Loading Dataset Infos from /data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/home/zanehu/.cache/huggingface/datasets/json/default-e3c6bc7f485aed74/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96
Found cached dataset json (/data/home/zanehu/.cache/huggingface/datasets/json/default-e3c6bc7f485aed74/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Loading Dataset info from /data/home/zanehu/.cache/huggingface/datasets/json/default-e3c6bc7f485aed74/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96
[INFO|tokenization_utils_base.py:2013] 2024-02-23 15:32:49,828 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2013] 2024-02-23 15:32:49,828 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2013] 2024-02-23 15:32:49,828 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2013] 2024-02-23 15:32:49,828 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2013] 2024-02-23 15:32:49,828 >> loading file tokenizer.json
[INFO|configuration_utils.py:713] 2024-02-23 15:32:49,937 >> loading configuration file dbgpt_hub/ft_local/codellama/CodeLlama-13b-Instruct-hf/snapshots/e9066d1322d2aba257d935c3e30e1ca483b84d1f/config.json
[INFO|configuration_utils.py:775] 2024-02-23 15:32:49,939 >> Model config LlamaConfig {
"_name_or_path": "dbgpt_hub/ft_local/codellama/CodeLlama-13b-Instruct-hf/snapshots/e9066d1322d2aba257d935c3e30e1ca483b84d1f",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 13824,
"max_position_embeddings": 16384,
"model_type": "llama",
"num_attention_heads": 40,
"num_hidden_layers": 40,
"num_key_value_heads": 40,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 1000000,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.34.1",
"use_cache": true,
"vocab_size": 32016
}

[INFO|modeling_utils.py:2990] 2024-02-23 15:32:49,970 >> loading weights file dbgpt_hub/ft_local/codellama/CodeLlama-13b-Instruct-hf/snapshots/e9066d1322d2aba257d935c3e30e1ca483b84d1f/model.safetensors.index.json
[INFO|modeling_utils.py:1220] 2024-02-23 15:32:49,971 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:770] 2024-02-23 15:32:49,971 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:03<00:00, 1.10s/it]
[INFO|modeling_utils.py:3775] 2024-02-23 15:32:53,932 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:3783] 2024-02-23 15:32:53,932 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at dbgpt_hub/ft_local/codellama/CodeLlama-13b-Instruct-hf/snapshots/e9066d1322d2aba257d935c3e30e1ca483b84d1f.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:728] 2024-02-23 15:32:53,936 >> loading configuration file dbgpt_hub/ft_local/codellama/CodeLlama-13b-Instruct-hf/snapshots/e9066d1322d2aba257d935c3e30e1ca483b84d1f/generation_config.json
[INFO|configuration_utils.py:770] 2024-02-23 15:32:53,936 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}

Running tokenizer on dataset: 0%| | 0/8659 [00:00<?, ? examples/s]
Caching processed dataset at /data/home/zanehu/.cache/huggingface/datasets/json/default-e3c6bc7f485aed74/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-17ded6991318ebad.arrow
Running tokenizer on dataset: 100%|██████████████████████████████████████████████████████████████████████| 8659/8659 [00:42<00:00, 203.67 examples/s]
[INFO|training_args.py:1345] 2024-02-23 15:34:08,920 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
[INFO|training_args.py:1798] 2024-02-23 15:34:08,920 >> PyTorch: setting up devices
[INFO|trainer.py:1760] 2024-02-23 15:34:17,156 >> ***** Running training *****
[INFO|trainer.py:1761] 2024-02-23 15:34:17,156 >> Num examples = 8,659
[INFO|trainer.py:1762] 2024-02-23 15:34:17,156 >> Num Epochs = 8
[INFO|trainer.py:1763] 2024-02-23 15:34:17,156 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1765] 2024-02-23 15:34:17,156 >> Training with DataParallel so batch size has been adjusted to: 2
[INFO|trainer.py:1766] 2024-02-23 15:34:17,156 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1767] 2024-02-23 15:34:17,156 >> Gradient Accumulation steps = 16
[INFO|trainer.py:1768] 2024-02-23 15:34:17,156 >> Total optimization steps = 2,160
[INFO|trainer.py:1769] 2024-02-23 15:34:17,158 >> Number of trainable parameters = 52,428,800
[INFO|integration_utils.py:722] 2024-02-23 15:34:17,161 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Tracking run with wandb version 0.15.3
wandb: W&B syncing is set to offline in this directory.
wandb: Run wandb online or set WANDB_MODE=online to enable cloud syncing.
0%| | 0/2160 [00:00<?, ?it/s]
/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
Traceback (most recent call last):
File "/data/home/zanehu/hwz_local/DB-GPT-Hub/dbgpt_hub/train/sft_train.py", line 172, in
train()
File "/data/home/zanehu/hwz_local/DB-GPT-Hub/dbgpt_hub/train/sft_train.py", line 149, in train
run_sft(
File "/data/home/zanehu/hwz_local/DB-GPT-Hub/dbgpt_hub/train/sft_train.py", line 102, in run_sft
train_result = trainer.train(
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 1591, in train
return inner_training_loop(
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 1892, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 2776, in training_step
loss = self.compute_loss(model, inputs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 2801, in compute_loss
outputs = model(**inputs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
outputs = self.parallel_apply(replicas, inputs, module_kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
output.reraise()
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
output = module(*input, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/peft_model.py", line 922, in forward
return self.base_model(
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1038, in forward
outputs = self.model(
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 921, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
return fn(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 230, in forward
outputs = run_function(*args)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 917, in custom_forward
return module(*inputs, past_key_value, output_attentions, padding_mask=padding_mask)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 635, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 349, in forward
query_states = self.q_proj(hidden_states)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/tuners/lora.py", line 817, in forward
result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half
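
For context, the failing call is the `F.linear` inside peft's `lora.Linear.forward`: the hidden states arrive in float32 while the frozen base weight was loaded in float16 (the checkpoint is instantiated under `torch.float16`, see the log above). A minimal sketch of the same mismatch outside the training loop (tensor shapes and names here are placeholders):

```python
import torch
import torch.nn.functional as F

# Reproduce the dtype mismatch in isolation: mat1 (the activations) is
# float32 while mat2 (the projection weight loaded in half) is float16.
x = torch.randn(2, 8, dtype=torch.float32)   # hidden states (float32)
w = torch.randn(4, 8, dtype=torch.float16)   # q_proj-style weight (c10::Half)

try:
    F.linear(x, w)
except RuntimeError as e:
    print(e)  # mat1/mat2 dtype mismatch, as in the traceback above
```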

@wangzaistone
Member

@HwzGit What GPU model are you using? At first glance, this looks related to the GPU's support for the numeric precision being used.
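
If it helps narrow this down, the card's compute capability and bf16 support can be checked directly (assuming a CUDA build of PyTorch; native bf16 requires compute capability 8.0 or newer):

```python
import torch

# Print what the visible GPU supports; pre-Ampere cards
# (compute capability < 8.0, e.g. Tesla P40) have no native bf16 support.
print(torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))
print("bf16 supported:", torch.cuda.is_bf16_supported())
```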


CanGuan commented Apr 8, 2024

Tesla P40 GPU; the same exception occurs even without using bf16.


mobguang commented May 7, 2024

@wangzaistone Is there a solution to this issue? I'm using an NVIDIA L20 GPU with bf16 set to False and getting the same error.
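
One direction that may be worth testing (a sketch only, not a confirmed fix from the maintainers): the checkpoint is instantiated under `torch.float16`, so if neither fp16 nor bf16 autocast is active during training, float32 activations will hit float16 weights exactly as in the traceback. Enabling fp16 training on cards without bf16 support, or loading the base model in float32, keeps both sides of that matmul in the same dtype. The flags below are illustrative only; all other arguments are omitted:

```python
from transformers import TrainingArguments

# Sketch only: on GPUs without bf16 (e.g. Tesla P40), fp16 autocast keeps the
# activations in half precision so they match the float16 base weights.
# Alternatively, load the model with torch_dtype=torch.float32 and train in
# full precision at the cost of more memory.
training_args = TrainingArguments(
    output_dir="output",  # placeholder
    fp16=True,
    bf16=False,
)
```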
