After reloading the model, the GPU reports CUDA out of memory #121

Open
BillyChao opened this issue Sep 8, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@BillyChao

  • Environment: multi-GPU machine. At startup, the program picks the GPU with the least memory in use (a shell-free alternative is sketched after this list):
import os

import numpy as np
import torch

# Pick the GPU with the most free memory, parsed from nvidia-smi output
os.system('nvidia-smi -q -d Memory |grep -A4 GPU|grep Free >tmp')
memory_gpu = [int(x.split()[2]) for x in open('tmp', 'r').readlines()]
DEVICE_ID = np.argmax(memory_gpu)
torch.cuda.set_device(int(DEVICE_ID))
  • After startup, ChatGLM-6B-int4 is loaded by default and loads successfully; at this point device=3 is shown.
(screenshot: GPU usage after the first load)
  • After selecting ChatGLM-6B-int8 and reloading the model, the error below is raised. GPU usage at that point:
(screenshots: GPU usage after the reload attempt)
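For reference, the same "pick the freest GPU" selection can be done without shelling out to nvidia-smi, by querying the CUDA runtime through torch.cuda.mem_get_info. This is only a sketch of an alternative, not the code the program actually uses:

import torch

# Free memory (in bytes) for every visible GPU, queried via the CUDA runtime
free_mem = [torch.cuda.mem_get_info(i)[0] for i in range(torch.cuda.device_count())]
# Use the GPU with the most free memory as the current device
device_id = max(range(len(free_mem)), key=free_mem.__getitem__)
torch.cuda.set_device(device_id)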

The exact error is:
CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 31.75 GiB total capacity; 4.25 GiB already allocated; 44.75 MiB free; 4.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
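As a side note, the allocator hint in that message refers to the PYTORCH_CUDA_ALLOC_CONF environment variable, which must be set before the first CUDA allocation in the process; it can reduce fragmentation but does not address the two issues below:

import os

# 128 is an arbitrary example value; set this before CUDA is initialized
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"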

  • Issues:
  1. After reloading, the GPU memory held by the old model is not released.
  2. The new model is not loaded on the device=3 GPU; it falls back to the default device 0. (A sketch of the expected behaviour follows this list.)
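A minimal sketch of what the expected reload behaviour could look like (load_model is a hypothetical placeholder for the project's loader, not its real API): drop the old model, free its cached memory, then place the new model explicitly on the chosen GPU instead of the default device 0.

import gc
import torch

def reload_model(old_model, checkpoint_path, device_id):
    # Release the old model; the caller must also drop its own reference
    # for the CUDA memory to actually be freed
    del old_model
    gc.collect()
    torch.cuda.empty_cache()

    # Load the new model and move it explicitly to the selected GPU
    model = load_model(checkpoint_path)  # hypothetical loader, not the project's API
    return model.to(torch.device(f"cuda:{device_id}"))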
@thomas-yanxin added the bug label on Jan 8, 2024
@123456ADWAE2
Contributor

Try this: #146 (comment)
