internvl-chat-v1.5-int8 throws an error during inference; how should this be handled? #949

Closed
wlg-tt opened this issue May 17, 2024 · 3 comments

wlg-tt commented May 17, 2024

Describe the bug
The int8 model starts up normally:
CUDA_VISIBLE_DEVICES=0 swift infer --model_type internvl-chat-v1_5-int8 --model_id_or_path /home/tione/notebook/community/scan/InternVL-Chat-V1-5-int8/ --dtype bf16
but it throws an error during inference:
[screenshot of the error]
The non-quantized internvl-chat-v1_5 starts and runs inference normally:
CUDA_VISIBLE_DEVICES=0 swift infer --model_type internvl-chat-v1_5 --model_id_or_path /home/tione/notebook/community/scan/InternVL-Chat-V1-5/ --dtype bf16

Your hardware and system info
torch 2.1
Python 3.10
CUDA 12.1

hjh0119 (Collaborator) commented May 17, 2024

What is the full error message?

wlg-tt (Author) commented May 17, 2024

Here is the full error:
[INFO:swift] Please enter the conversation content first, followed by the path to the multimedia file.
<<< Describe the image below
Input a media path or URL <<< /home/tione/notebook/community/scan/pic/pic/7.jpg
/opt/conda/envs/py3102/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:316: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
cuBLAS API failed with status 15
A: torch.Size([5125, 3200]), B: torch.Size([9600, 3200]), C: (5125, 9600); (lda, ldb, ldc): (c_int(164000), c_int(307200), c_int(164000)); (m, n, k): (c_int(5125), c_int(9600), c_int(3200))
Exception in thread Thread-2 (generate):
Traceback (most recent call last):
File "/opt/conda/envs/py3102/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/opt/conda/envs/py3102/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/tione/notebook/community/swift/swift/llm/utils/model.py", line 2697, in _new_generate
return generate(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5-int8/modeling_internvl_chat.py", line 339, in generate
vit_embeds = self.extract_feature(pixel_values)
File "/home/tione/notebook/community/swift/swift/llm/utils/model.py", line 2707, in _new_extract_feature
return extract_feature(pixel_values).to(pixel_values.device).to(pixel_values.dtype)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5-int8/modeling_internvl_chat.py", line 211, in extract_feature
vit_embeds = self.vision_model(
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5-int8/modeling_intern_vit.py", line 411, in forward
encoder_outputs = self.encoder(
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5-int8/modeling_intern_vit.py", line 347, in forward
layer_outputs = encoder_layer(
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5-int8/modeling_intern_vit.py", line 289, in forward
hidden_states = hidden_states + self.drop_path1(self.attn(self.norm1(hidden_states)) * self.ls1)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5-int8/modeling_intern_vit.py", line 246, in forward
x = self._naive_attn(hidden_states) if not self.use_flash_attn else self._flash_attn(hidden_states)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5-int8/modeling_intern_vit.py", line 211, in _naive_attn
qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 797, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 556, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 395, in forward
out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/bitsandbytes/functional.py", line 2337, in igemmlt
raise Exception("cublasLt ran into an error!")
Exception: cublasLt ran into an error!
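
For anyone trying to isolate this outside swift: below is a hypothetical minimal repro sketch, not the model's actual loading code, that I would expect to hit the same bitsandbytes code path, assuming a CUDA GPU with the same bitsandbytes version. The shapes are taken from the log above: the failing qkv projection is a 3200-in / 9600-out int8 linear, and the input flattens to 5125 rows (presumably 5 image tiles × 1025 ViT tokens).

```python
# Hypothetical minimal repro (assumption, not from the original report):
# exercise the same int8 matmul that fails inside the ViT qkv projection.
import torch
import bitsandbytes as bnb

# 3200 -> 9600 matches the qkv Linear in the traceback; with
# has_fp16_weights=False the weights are quantized to int8 when the
# module is moved to the GPU.
layer = bnb.nn.Linear8bitLt(3200, 9600, bias=True, has_fp16_weights=False)
layer = layer.to("cuda")

# 5125 rows matches A in the log; a bf16 input triggers the
# "cast to float16 during quantization" warning, and on affected
# GPU/driver setups the subsequent igemmlt call raises
# "cublasLt ran into an error!".
x = torch.randn(5125, 3200, dtype=torch.bfloat16, device="cuda")
out = layer(x)
```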

hjh0119 (Collaborator) commented May 17, 2024

This looks like an issue in the bitsandbytes (bnb) library. Take a look at the bnb issues, e.g. TimDettmers/bitsandbytes#538 and oobabooga/text-generation-webui#379.
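
One hedged workaround sketch (untested, not a confirmed fix): since the traceback fails inside the vision tower, load the non-int8 checkpoint and quantize on the fly while keeping the ViT in bf16, using transformers' BitsAndBytesConfig. The module name "vision_model" is assumed from the traceback, and the path is the one from the report.

```python
# Hedged workaround sketch, not a verified fix: 8-bit-quantize the bf16
# checkpoint on the fly, but leave the vision tower (where igemmlt fails)
# un-quantized in bf16.
import torch
from transformers import AutoModel, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["vision_model"],  # module name assumed from the traceback
)
model = AutoModel.from_pretrained(
    "/home/tione/notebook/community/scan/InternVL-Chat-V1-5/",  # the non-int8 weights
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```

Note this sidesteps the failing cublasLt call rather than fixing it; upgrading bitsandbytes as discussed in the linked issues is the other avenue.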

hjh0119 closed this as completed May 20, 2024