DeepSpeed MoE question #64

Open
BlackBearBiscuit opened this issue Mar 18, 2024 · 0 comments

Describe the issue

Issue:
I'd like to ask whether you have experimented with MoE models at 13B or larger. I used ZeRO-2 with EP_SIZE=8, and I get a CUDA out-of-memory error while the optimizer states are being initialized.
ZeRO-3 does not support MoE, and due to hardware constraints I cannot use offload either.
Should I consider Megatron-DeepSpeed instead?
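
For context, a minimal sketch of the setup being described: a DeepSpeed MoE layer with expert parallelism (EP_SIZE=8) trained under a ZeRO-2 config. The hidden size, expert definition, and config values below are placeholders for illustration only, not the actual training command.

```python
import torch
import deepspeed
from deepspeed.moe.layer import MoE

# Assumes the script is launched with the `deepspeed` launcher on 8 ranks,
# so the distributed backend and expert-parallel groups can be created.
deepspeed.init_distributed()

hidden_size = 4096  # placeholder dimension

# Placeholder expert: a simple feed-forward block.
expert = torch.nn.Sequential(
    torch.nn.Linear(hidden_size, 4 * hidden_size),
    torch.nn.GELU(),
    torch.nn.Linear(4 * hidden_size, hidden_size),
)

# MoE layer with 8 experts sharded across 8 ranks (EP_SIZE=8),
# matching the setup described above.
moe_layer = MoE(
    hidden_size=hidden_size,
    expert=expert,
    num_experts=8,
    ep_size=8,
    k=2,
)

# ZeRO-2 partitions optimizer states and gradients; the reported OOM
# happens while these optimizer states are being initialized.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": True},
}
```

The engine would then be built with deepspeed.initialize(model=..., config=ds_config, ...), which is the step where the partitioned optimizer states are allocated.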

Environment:

GPU: 8×A100-80G

DeepSpeed version: 0.10.0
Torch version:
Transformers version:
Tokenizers version:

Command:

PASTE THE COMMANDS HERE.

Log:

PASTE THE LOGS HERE.

Screenshots:
You may attach screenshots if it better explains the issue.
