Fine-tuning internvl-v1.5 raises KeyError: 'input_ids' #951

Open
sunzx8 opened this issue May 17, 2024 · 7 comments
sunzx8 commented May 17, 2024

Describe the bug
[screenshot: KeyError: 'input_ids' traceback]

Command used:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft --model_type internvl-chat-v1_5 --model_id_or_path /dev/shm/shawn/hf_ms_model/InternVL-Chat-V1-5 --dataset /dev/shm/shawn/data/ftoy.jsonl --sft_type full

Data format:
{"query": "输出图片内容的markdown内容,如果有表格,则输出为html格式", "response": "```markdown\nAdaptive Quotient Filters\n\nConference '17, July 2017, Washington, DC, USA\n\n[34] Russell Housley, Warwick Ford, William Polk, and David Solo. 1999. Internet X.509 public key infrastructure certificate and CRL profile. Technical Report. M. Frans Kaashoek. 2002. The case for application-specific protocols. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP).", "images": ["/dev/shm/shawn/data/input/2405.10253v1/2405.10253v1-p16.png"]}

Your hardware and system info
8 * NVIDIA L20


sunzx8 (Author) commented May 17, 2024

I checked: the batch only contains the two image-related elements and has no input_ids.
[screenshots: debugger output showing the batch contents]
What could be causing this?

hjh0119 (Collaborator) commented May 20, 2024

The device map can be problematic with 8 GPUs; try 2 or 4 GPUs instead.
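For example, the same run restricted to 4 GPUs (a sketch that simply adapts the command from the issue; only CUDA_VISIBLE_DEVICES changes):

CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft --model_type internvl-chat-v1_5 --model_id_or_path /dev/shm/shawn/hf_ms_model/InternVL-Chat-V1-5 --dataset /dev/shm/shawn/data/ftoy.jsonl --sft_type full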

sunzx8 (Author) commented May 20, 2024

Hi, I traced it to max_length: why does the following error appear after I increase max_length from 2048 to 4096?
RuntimeError: CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

sunzx8 (Author) commented May 20, 2024

Also, how should I set things up to fine-tune across two machines with 16 GPUs in total?

hjh0119 (Collaborator) commented May 20, 2024

The CUDA error is likely OOM or a CUDA environment issue.

There is a multi-node, multi-GPU example in the README.
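For reference, a minimal two-node sketch in the style of the README's multi-node example (the environment variables NNODES, NODE_RANK, MASTER_ADDR, MASTER_PORT and NPROC_PER_NODE are assumed from that example and should be verified against the README; the address is a placeholder):

# on node 0 (master), reachable at e.g. 192.168.1.1
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.1.1 MASTER_PORT=29500 NPROC_PER_NODE=8 swift sft --model_type internvl-chat-v1_5 --model_id_or_path /dev/shm/shawn/hf_ms_model/InternVL-Chat-V1-5 --dataset /dev/shm/shawn/data/ftoy.jsonl --sft_type full

# on node 1
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.1.1 MASTER_PORT=29500 NPROC_PER_NODE=8 swift sft --model_type internvl-chat-v1_5 --model_id_or_path /dev/shm/shawn/hf_ms_model/InternVL-Chat-V1-5 --dataset /dev/shm/shawn/data/ftoy.jsonl --sft_type full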

sunzx8 (Author) commented May 20, 2024

> The CUDA error is likely OOM or a CUDA environment issue.
> There is a multi-node, multi-GPU example in the README.

Another question: with the LoRA fine-tuning setup you provided, the parameter summary shows that only a small fraction of the parameters are trainable, yet GPU memory usage is exactly the same as full-parameter fine-tuning. Does that mean the model was not actually switched to LoRA?
[screenshot: trainable-parameter summary]

The actual memory usage is 241 GB, the same as full-parameter fine-tuning on coco-mini.

sunzx8 (Author) commented May 20, 2024

The LoRA command I used:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft --model_type internvl-chat-v1_5 --model_id_or_path /dev/shm/shawn/hf_ms_model/InternVL-Chat-V1-5 --dataset coco-mini-en-2 --sft_type lora
