internvl-chat-v1.5-int8 throws an error during inference; how should this be handled? #949

Closed
wlg-tt opened this issue May 17, 2024 · 3 comments

wlg-tt commented May 17, 2024

Describe the bug
The int8 model starts up normally:
CUDA_VISIBLE_DEVICES=0 swift infer --model_type internvl-chat-v1_5-int8 --model_id_or_path /home/tione/notebook/community/scan/InternVL-Chat-V1-5-int8/ --dtype bf16
but it throws an error during inference:
[screenshot of the error]
The non-quantized internvl-chat-v1_5 starts and runs inference normally:
CUDA_VISIBLE_DEVICES=0 swift infer --model_type internvl-chat-v1_5 --model_id_or_path /home/tione/notebook/community/scan/InternVL-Chat-V1-5/ --dtype bf16

Your hardware and system info
torch 2.1
Python 3.10
CUDA 12.1

hjh0119 (Collaborator) commented May 17, 2024

What is the full error message?

wlg-tt (Author) commented May 17, 2024

Here is the full error:
[INFO:swift] Please enter the conversation content first, followed by the path to the multimedia file.
<<< Describe the image below
Input a media path or URL <<< /home/tione/notebook/community/scan/pic/pic/7.jpg
/opt/conda/envs/py3102/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:316: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
cuBLAS API failed with status 15
A: torch.Size([5125, 3200]), B: torch.Size([9600, 3200]), C: (5125, 9600); (lda, ldb, ldc): (c_int(164000), c_int(307200), c_int(164000)); (m, n, k): (c_int(5125), c_int(9600), c_int(3200))
Exception in thread Thread-2 (generate):
Traceback (most recent call last):
File "/opt/conda/envs/py3102/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/opt/conda/envs/py3102/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/tione/notebook/community/swift/swift/llm/utils/model.py", line 2697, in _new_generate
return generate(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5-int8/modeling_internvl_chat.py", line 339, in generate
vit_embeds = self.extract_feature(pixel_values)
File "/home/tione/notebook/community/swift/swift/llm/utils/model.py", line 2707, in _new_extract_feature
return extract_feature(pixel_values).to(pixel_values.device).to(pixel_values.dtype)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5-int8/modeling_internvl_chat.py", line 211, in extract_feature
vit_embeds = self.vision_model(
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5-int8/modeling_intern_vit.py", line 411, in forward
encoder_outputs = self.encoder(
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5-int8/modeling_intern_vit.py", line 347, in forward
layer_outputs = encoder_layer(
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5-int8/modeling_intern_vit.py", line 289, in forward
hidden_states = hidden_states + self.drop_path1(self.attn(self.norm1(hidden_states)) * self.ls1)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5-int8/modeling_intern_vit.py", line 246, in forward
x = self._naive_attn(hidden_states) if not self.use_flash_attn else self._flash_attn(hidden_states)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5-int8/modeling_intern_vit.py", line 211, in _naive_attn
qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 797, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 556, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 395, in forward
out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
File "/opt/conda/envs/py3102/lib/python3.10/site-packages/bitsandbytes/functional.py", line 2337, in igemmlt
raise Exception("cublasLt ran into an error!")
Exception: cublasLt ran into an error!
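
For anyone trying to isolate this outside swift: below is a hypothetical minimal repro sketch, not the model's actual loading code, that I would expect to hit the same bitsandbytes code path, assuming a CUDA GPU with the same bitsandbytes version. The shapes are taken from the log above: the failing qkv projection is a 3200-in / 9600-out int8 linear, and the input flattens to 5125 rows (presumably 5 image tiles × 1025 ViT tokens).

```python
# Hypothetical minimal repro (assumption, not from the original report):
# exercise the same int8 matmul that fails inside the ViT qkv projection.
import torch
import bitsandbytes as bnb

# 3200 -> 9600 matches the qkv Linear in the traceback; with
# has_fp16_weights=False the weights are quantized to int8 when the
# module is moved to the GPU.
layer = bnb.nn.Linear8bitLt(3200, 9600, bias=True, has_fp16_weights=False)
layer = layer.to("cuda")

# 5125 rows matches A in the log; a bf16 input triggers the
# "cast to float16 during quantization" warning, and on affected
# GPU/driver setups the subsequent igemmlt call raises
# "cublasLt ran into an error!".
x = torch.randn(5125, 3200, dtype=torch.bfloat16, device="cuda")
out = layer(x)
```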

hjh0119 (Collaborator) commented May 17, 2024

This looks like an issue in the bitsandbytes (bnb) library. Take a look at the bnb issues, e.g. TimDettmers/bitsandbytes#538 and oobabooga/text-generation-webui#379.
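
One hedged workaround sketch (untested, not a confirmed fix): since the traceback fails inside the vision tower, load the non-int8 checkpoint and quantize on the fly while keeping the ViT in bf16, using transformers' BitsAndBytesConfig. The module name "vision_model" is assumed from the traceback, and the path is the one from the report.

```python
# Hedged workaround sketch, not a verified fix: 8-bit-quantize the bf16
# checkpoint on the fly, but leave the vision tower (where igemmlt fails)
# un-quantized in bf16.
import torch
from transformers import AutoModel, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["vision_model"],  # module name assumed from the traceback
)
model = AutoModel.from_pretrained(
    "/home/tione/notebook/community/scan/InternVL-Chat-V1-5/",  # the non-int8 weights
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```

Note this sidesteps the failing cublasLt call rather than fixing it; upgrading bitsandbytes as discussed in the linked issues is the other avenue.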

hjh0119 closed this as completed May 20, 2024