### Describe the bug

Thank you very much for your work! I ran into a problem when using this codebase for SFT: training runs fine with the non-MoE config, but fails once I switch to the MoE config file.

Run command:
```bash
torchrun --nnodes=1 --nproc_per_node=8 train.py --config ./configs/7B_MoE4_sft.py --launcher "torch"
```
Error message:
```
Traceback (most recent call last):
  File "train.py", line 324, in <module>
    main(args)
  File "train.py", line 105, in main
    model = initialize_model()
  File "/root/wbq/internlm_moe/InternEvo/internlm/utils/timeout.py", line 102, in wrapper
    result = func(*args, **kwargs)
  File "/root/wbq/internlm_moe/InternEvo/internlm/train/pipeline.py", line 167, in initialize_model
    model = MODEL_INITIALIZER.get_module(module_name=gpc.config.model_type)(**(gpc.config.model))
  File "/root/wbq/internlm_moe/InternEvo/internlm/model/modeling_moe.py", line 584, in build_model_with_moe_cfg
    return _build_generic_model_1d(num_layers=num_layers, num_chunks=num_chunks, **cfg)
  File "/root/wbq/internlm_moe/InternEvo/internlm/model/modeling_moe.py", line 482, in _build_generic_model_1d
    chunk = PackedFlashInternLm1D(**filter_kwargs(PackedFlashInternLm1D.__init__, kwargs)).to(device)
  File "/root/wbq/internlm_moe/InternEvo/internlm/model/modeling_moe.py", line 356, in __init__
    [
  File "/root/wbq/internlm_moe/InternEvo/internlm/model/modeling_moe.py", line 357, in <listcomp>
    PackedFlashBaseLayer1D(
  File "/root/wbq/internlm_moe/InternEvo/internlm/model/modeling_moe.py", line 94, in __init__
    self.mixer = MHA(
  File "/root/wbq/internlm_moe/InternEvo/internlm/model/modules/multi_head_attention.py", line 364, in __init__
    self.rotary_emb = RotaryEmbedding(
  File "/root/wbq/internlm_moe/InternEvo/internlm/model/modules/embedding.py", line 287, in __init__
    self.inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, device=device, dtype=torch.float32) / dim))
TypeError: arange() received an invalid combination of arguments - got (int, int, int, dtype=torch.dtype, device=device), but expected one of:
 * (Number end, *, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (Number start, Number end, *, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (Number start, Number end, Number step, *, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
```
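For what it's worth, the failing statement is the standard RoPE inverse-frequency computation. A minimal isolation of that line (a sketch; `dim = 128` and `base = 10000.0` are placeholder values here, since InternEvo derives them from the model config) runs without error on a stock torch 2.1.0 install when given a genuine `torch.device`, which suggests the `device` object reaching `embedding.py` is of an unexpected type — note the error prints `device=device` where the recognized overloads show `torch.device device`:

```python
import torch

# Minimal isolation of embedding.py line 287 (sketch; dim and base are
# placeholder values -- InternEvo derives them from the model config).
dim = 128          # rotary dimension (per-head size), hypothetical value
base = 10000.0     # RoPE base, hypothetical value
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Exact expression from the traceback:
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, device=device, dtype=torch.float32) / dim))
print(inv_freq.shape)  # torch.Size([64]) -- succeeds with a genuine torch.device
```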
### Environment

```
torch==2.1.0+cu118
transformers<4.30.0
sentencepiece
numpy
tqdm
psutil
packaging
pre-commit
ninja
gputil
pytest
packaging
boto3
botocore
torch-scatter
pyecharts
py-libnuma
pynvml
tensorboard
```
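A quick way to confirm what the `torchrun` workers actually import (standard `torch` attributes only; the expected values follow from the list above):

```python
import torch

print(torch.__version__)   # expected: 2.1.0+cu118
print(torch.version.cuda)  # expected: 11.8
# The same arange overload in isolation:
print(torch.arange(0, 128, 2, dtype=torch.float32, device="cpu").shape)  # torch.Size([64])
```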
### Other information

1. I only changed the training-set and test-set paths in ./configs/7B_MoE4_sft.py.
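For context, the change looks like this (a sketch assuming the stock config's `TRAIN_FOLDER`/`VALID_FOLDER` variables; the paths shown are placeholders, not the real ones):

```python
# ./configs/7B_MoE4_sft.py -- only the dataset paths were edited (placeholders shown)
TRAIN_FOLDER = "/path/to/train/dataset"   # training-set path (placeholder)
VALID_FOLDER = "/path/to/valid/dataset"   # test-set path (placeholder)
```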
Let me try to reproduce this.