RuntimeError: probability tensor contains either inf, nan or element < 0 #364

Open
2 tasks done
dayL-W opened this issue Apr 16, 2024 · 3 comments

Comments

@dayL-W

dayL-W commented Apr 16, 2024

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • I have searched FAQ

Current Behavior

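I am using a V100 GPU with the INT4 model, invoked through the demo provided by ModelScope. The key package versions are:
transformers 4.37.1
modelscope 1.13.3

Inference fails with this error:
RuntimeError: probability tensor contains either inf, nan or element < 0

Presumably many people hit this problem. I looked through past issues and none of them were resolved.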
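For anyone reading the message literally: it says the probabilities handed to the sampling step contain a value that is infinite, NaN, or negative. The plain-Python sketch below shows how one non-finite logit is enough to turn a whole softmax output into NaN. `softmax` and `invalid_entries` are hypothetical helpers written for this illustration only. They are not part of PyTorch, transformers, or this project, and the thread does not confirm that this is what happens inside the model.

```python
import math

def softmax(logits):
    """Numerically shifted softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def invalid_entries(probs):
    """Return (index, value) pairs that are inf, nan, or negative."""
    return [
        (i, p)
        for i, p in enumerate(probs)
        if math.isnan(p) or math.isinf(p) or p < 0
    ]

# Finite logits give a valid distribution.
good = softmax([1.0, 2.0, 3.0])
print(invalid_entries(good))

# A single infinite logit (for example after an overflow upstream)
# produces inf - inf = nan, which then contaminates the normalisation.
bad = softmax([1.0, float("inf"), 3.0])
print(invalid_entries(bad))
```

Running a check of this kind on the logits just before sampling can help narrow down whether the bad values come from the model output itself or from later processing.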

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

@aixiaoing

Has this problem been solved?

@aixiaoing

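I ran into this problem too. Check whether the loss dropped to 0 during your training. If it did, the training itself went wrong, and that makes the model produce garbage output at inference time. After I fixed the loss-going-to-0 problem, I no longer hit this error. My fix for the loss going to 0 was switching from a V100 to an A100. My personal guess at the reason is that the V100 does not support bf16, so any GPU that supports bf16 should work.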
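One way to see why the bf16 guess above is plausible: fp16 has only 5 exponent bits, so its largest finite value is 65504, while bf16 keeps the 8 exponent bits of float32 and gives up mantissa precision instead. The sketch below emulates both formats with the standard library only. `to_fp16` and `to_bf16` are hypothetical helpers for illustration, the bf16 conversion truncates rather than rounds, and nothing in the thread confirms that overflow is the actual cause here.

```python
import math
import struct

def to_fp16(x):
    """Round-trip x through IEEE half precision; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond the fp16 range; hardware
        # conversions would yield infinity instead, so mimic that.
        return math.copysign(math.inf, x)

def to_bf16(x):
    """Emulate bf16 by keeping only the top 16 bits of the float32 encoding."""
    top_two_bytes = struct.pack(">f", x)[:2]
    return struct.unpack(">f", top_two_bytes + b"\x00\x00")[0]

value = 70000.0  # slightly above the fp16 maximum of 65504
print("fp16:", to_fp16(value))
print("bf16:", to_bf16(value))
```

With this input the fp16 result is infinite while the bf16 result stays finite, though less precise. An activation or loss term of that size could therefore become inf on a card limited to fp16 and stay representable on one that runs bf16.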

@ziyou-lu

ziyou-lu commented Jun 6, 2024

Has anyone solved this? I ran into the same problem. On T4 cards, a single GPU reports "Cannot copy out of meta tensor; no data!", while multiple GPUs report "probability tensor contains either inf, nan or element < 0".
