RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half #238

Open
HwzGit opened this issue Feb 23, 2024 · 3 comments

Comments


HwzGit commented Feb 23, 2024

W&B offline. Running your script from this directory will only write metadata locally. Use wandb disabled to completely turn off W&B.
/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[INFO|training_args.py:1345] 2024-02-23 15:32:48,053 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
[INFO|training_args.py:1798] 2024-02-23 15:32:48,053 >> PyTorch: setting up devices
/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/training_args.py:1711: FutureWarning: --push_to_hub_token is deprecated and will be removed in version 5 of 🤗 Transformers. Use --hub_token instead.
warnings.warn(
/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/datasets/load.py:2089: FutureWarning: 'use_auth_token' was deprecated in favor of 'token' in version 2.14.0 and will be removed in 3.0.0.
You can remove this warning by passing 'token=None' instead.
warnings.warn(
Using custom data configuration default-e3c6bc7f485aed74
Loading Dataset Infos from /data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/home/zanehu/.cache/huggingface/datasets/json/default-e3c6bc7f485aed74/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96
Found cached dataset json (/data/home/zanehu/.cache/huggingface/datasets/json/default-e3c6bc7f485aed74/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Loading Dataset info from /data/home/zanehu/.cache/huggingface/datasets/json/default-e3c6bc7f485aed74/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96
[INFO|tokenization_utils_base.py:2013] 2024-02-23 15:32:49,828 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2013] 2024-02-23 15:32:49,828 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2013] 2024-02-23 15:32:49,828 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2013] 2024-02-23 15:32:49,828 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2013] 2024-02-23 15:32:49,828 >> loading file tokenizer.json
[INFO|configuration_utils.py:713] 2024-02-23 15:32:49,937 >> loading configuration file dbgpt_hub/ft_local/codellama/CodeLlama-13b-Instruct-hf/snapshots/e9066d1322d2aba257d935c3e30e1ca483b84d1f/config.json
[INFO|configuration_utils.py:775] 2024-02-23 15:32:49,939 >> Model config LlamaConfig {
"_name_or_path": "dbgpt_hub/ft_local/codellama/CodeLlama-13b-Instruct-hf/snapshots/e9066d1322d2aba257d935c3e30e1ca483b84d1f",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 13824,
"max_position_embeddings": 16384,
"model_type": "llama",
"num_attention_heads": 40,
"num_hidden_layers": 40,
"num_key_value_heads": 40,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 1000000,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.34.1",
"use_cache": true,
"vocab_size": 32016
}

[INFO|modeling_utils.py:2990] 2024-02-23 15:32:49,970 >> loading weights file dbgpt_hub/ft_local/codellama/CodeLlama-13b-Instruct-hf/snapshots/e9066d1322d2aba257d935c3e30e1ca483b84d1f/model.safetensors.index.json
[INFO|modeling_utils.py:1220] 2024-02-23 15:32:49,971 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:770] 2024-02-23 15:32:49,971 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:03<00:00, 1.10s/it]
[INFO|modeling_utils.py:3775] 2024-02-23 15:32:53,932 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:3783] 2024-02-23 15:32:53,932 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at dbgpt_hub/ft_local/codellama/CodeLlama-13b-Instruct-hf/snapshots/e9066d1322d2aba257d935c3e30e1ca483b84d1f.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:728] 2024-02-23 15:32:53,936 >> loading configuration file dbgpt_hub/ft_local/codellama/CodeLlama-13b-Instruct-hf/snapshots/e9066d1322d2aba257d935c3e30e1ca483b84d1f/generation_config.json
[INFO|configuration_utils.py:770] 2024-02-23 15:32:53,936 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}

Running tokenizer on dataset: 0%| | 0/8659 [00:00<?, ? examples/s]
Caching processed dataset at /data/home/zanehu/.cache/huggingface/datasets/json/default-e3c6bc7f485aed74/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-17ded6991318ebad.arrow
Running tokenizer on dataset: 100%|██████████████████████████████████████████████████████████████████████| 8659/8659 [00:42<00:00, 203.67 examples/s]
[INFO|training_args.py:1345] 2024-02-23 15:34:08,920 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
[INFO|training_args.py:1798] 2024-02-23 15:34:08,920 >> PyTorch: setting up devices
[INFO|trainer.py:1760] 2024-02-23 15:34:17,156 >> ***** Running training *****
[INFO|trainer.py:1761] 2024-02-23 15:34:17,156 >> Num examples = 8,659
[INFO|trainer.py:1762] 2024-02-23 15:34:17,156 >> Num Epochs = 8
[INFO|trainer.py:1763] 2024-02-23 15:34:17,156 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1765] 2024-02-23 15:34:17,156 >> Training with DataParallel so batch size has been adjusted to: 2
[INFO|trainer.py:1766] 2024-02-23 15:34:17,156 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1767] 2024-02-23 15:34:17,156 >> Gradient Accumulation steps = 16
[INFO|trainer.py:1768] 2024-02-23 15:34:17,156 >> Total optimization steps = 2,160
[INFO|trainer.py:1769] 2024-02-23 15:34:17,158 >> Number of trainable parameters = 52,428,800
[INFO|integration_utils.py:722] 2024-02-23 15:34:17,161 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Tracking run with wandb version 0.15.3
wandb: W&B syncing is set to offline in this directory.
wandb: Run wandb online or set WANDB_MODE=online to enable cloud syncing.
0%| | 0/2160 [00:00<?, ?it/s]
/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
Traceback (most recent call last):
File "/data/home/zanehu/hwz_local/DB-GPT-Hub/dbgpt_hub/train/sft_train.py", line 172, in
train()
File "/data/home/zanehu/hwz_local/DB-GPT-Hub/dbgpt_hub/train/sft_train.py", line 149, in train
run_sft(
File "/data/home/zanehu/hwz_local/DB-GPT-Hub/dbgpt_hub/train/sft_train.py", line 102, in run_sft
train_result = trainer.train(
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 1591, in train
return inner_training_loop(
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 1892, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 2776, in training_step
loss = self.compute_loss(model, inputs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 2801, in compute_loss
outputs = model(**inputs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
outputs = self.parallel_apply(replicas, inputs, module_kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
output.reraise()
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
output = module(*input, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/peft_model.py", line 922, in forward
return self.base_model(
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1038, in forward
outputs = self.model(
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 921, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
return fn(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 230, in forward
outputs = run_function(*args)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 917, in custom_forward
return module(*inputs, past_key_value, output_attentions, padding_mask=padding_mask)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 635, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 349, in forward
query_states = self.q_proj(hidden_states)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/zanehu/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/tuners/lora.py", line 817, in forward
result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half
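
For context, the failing call is the `F.linear` inside peft's `lora.Linear.forward`: the hidden states arrive in float32 while the frozen base weight was loaded in float16 (the checkpoint is instantiated under `torch.float16`, see the log above). A minimal sketch of the same mismatch outside the training loop (tensor shapes and names here are placeholders):

```python
import torch
import torch.nn.functional as F

# Reproduce the dtype mismatch in isolation: mat1 (the activations) is
# float32 while mat2 (the projection weight loaded in half) is float16.
x = torch.randn(2, 8, dtype=torch.float32)   # hidden states (float32)
w = torch.randn(4, 8, dtype=torch.float16)   # q_proj-style weight (c10::Half)

try:
    F.linear(x, w)
except RuntimeError as e:
    print(e)  # mat1/mat2 dtype mismatch, as in the traceback above
```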

@wangzaistone
Member

@HwzGit What GPU model are you using? At first glance, this looks related to the GPU's support for the numeric precision being used.
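
If it helps narrow this down, the card's compute capability and bf16 support can be checked directly (assuming a CUDA build of PyTorch; native bf16 requires compute capability 8.0 or newer):

```python
import torch

# Print what the visible GPU supports; pre-Ampere cards
# (compute capability < 8.0, e.g. Tesla P40) have no native bf16 support.
print(torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))
print("bf16 supported:", torch.cuda.is_bf16_supported())
```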


CanGuan commented Apr 8, 2024

Tesla P40 GPU; the same exception occurs even without using bf16.


mobguang commented May 7, 2024

@wangzaistone Is there a solution to this issue? I'm using an NVIDIA L20 GPU with bf16 set to False and getting the same error.
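
One direction that may be worth testing (a sketch only, not a confirmed fix from the maintainers): the checkpoint is instantiated under `torch.float16`, so if neither fp16 nor bf16 autocast is active during training, float32 activations will hit float16 weights exactly as in the traceback. Enabling fp16 training on cards without bf16 support, or loading the base model in float32, keeps both sides of that matmul in the same dtype. The flags below are illustrative only; all other arguments are omitted:

```python
from transformers import TrainingArguments

# Sketch only: on GPUs without bf16 (e.g. Tesla P40), fp16 autocast keeps the
# activations in half precision so they match the float16 base weights.
# Alternatively, load the model with torch_dtype=torch.float32 and train in
# full precision at the cost of more memory.
training_args = TrainingArguments(
    output_dir="output",  # placeholder
    fp16=True,
    bf16=False,
)
```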
