
[Bug] internlm docker image issue #182

Open
marks221b opened this issue Apr 8, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@marks221b

Describe the bug

The InternLM container image does not appear to have been built correctly. After pulling the image and entering a container, it cannot be used directly for training or inference, and the environment inside the container differs significantly from the runtime environment the code actually needs.
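For context, a reproduction along these lines shows the mismatch; the image name and tag here are illustrative placeholders, not confirmed tags:

```bash
# Pull the published image and open a shell in it
# (image name and tag are illustrative placeholders)
docker pull internlm/internlm:latest
docker run --gpus all -it --rm internlm/internlm:latest bash

# Inside the container, check whether the installed environment
# matches the repo's requirements before attempting training/inference
python -c "import torch; print(torch.__version__)"
pip list | grep -i -E "torch|flash"
```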

Other information

No response

@marks221b marks221b added the bug Something isn't working label Apr 8, 2024
@sunpengsdu
Contributor

Please update the Dockerfile with the latest requirements. @li126com
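(As a stopgap while the Dockerfile is being updated, the image can be rebuilt locally against the current requirements. This is a sketch only; the Dockerfile path and local tag below are assumptions, so check the repo's docker docs for the actual location.)

```bash
# Hypothetical local rebuild until the published image is fixed.
# The Dockerfile path is an assumption; verify it in the repo.
git clone https://github.com/InternLM/InternEvo.git
cd InternEvo
docker build -f docker/Dockerfile -t internevo:local .
```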

@li126com
Collaborator

li126com commented Apr 12, 2024

New docker images are ready at https://hub.docker.com/r/internlm/internevo/tags. The related docs in InternEvo have been updated.
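For anyone landing here, a typical pull-and-run flow against the new repository looks roughly like this (the tag is a placeholder; pick a concrete one from the Docker Hub page above):

```bash
# Replace <tag> with an actual tag from
# https://hub.docker.com/r/internlm/internevo/tags
docker pull internlm/internevo:<tag>
docker run --gpus all -it --rm internlm/internevo:<tag> bash
```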

yingtongxiong pushed a commit to yingtongxiong/InternEvo that referenced this issue May 20, 2024
* feat(XXX): add moe

* reformat code

* modified:   .pre-commit-config.yaml
	modified:   internlm/model/moe.py
	modified:   internlm/model/modeling_internlm.py

* modified:   internlm/model/modeling_internlm.py

* modified:   internlm/core/context/process_group_initializer.py
	modified:   internlm/core/scheduler/no_pipeline_scheduler.py
	modified:   internlm/solver/optimizer/hybrid_zero_optim.py

* modified:   internlm/model/moe.py
	modified:   internlm/moe/sharded_moe.py
	modified:   internlm/utils/parallel.py

* rollback .pre-commit-config.yaml

* add residual and other moe features

* modify grad clipping due to moe

* add param arguments

* reformat code

* add expert data support and fix bugs

* Update .pre-commit-config.yaml

* modified:   internlm/model/modeling_internlm.py

* add no-interleaved & no-overlapped moe pp support

* support zero_overlap_communication

* avoid moe parameter partition in zero optimizer

* fix the moe_loss_coeff bug

* support interleaved pp

* fix moe bugs in zero optimizer

* fix more moe bugs in zero optimizer

* fix moe bugs in zero optimizer

* add logger for moe_loss

* fix bugs with merge

* fix the pp moe bugs

* fix bug on logger

* update moe training cfg on real-dataset

* refactor code

* refactor code

* fix bugs with compute moe norm

* optimize code with moe norm computing

* fix the bug that missing scale the latent moe loss

* refactor code

* fix moe loss logger for the interleaved pp

* change the scale position for latent moe_loss

* Update 7B_sft.py

* add support for moe checkpoint

* add comments for moe

* reformat code

* fix bugs

* fix bugs

* Update .pre-commit-config.yaml

* remove moe_loss_coeff parameter passing

* fix group_norms computing in hybrid_zero_optim

* use dummy mode to generate random numbers in model construction

* replace flashatten experts by feedforward experts

* fix bugs with _compute_norm_with_moe_group

* merge upstream/develop into feature_add_moe

* merge upstream/develop into feature_add_moe

* change float16 to bfloat16

* fix interface for dense pipeline

* refactor split_moe_group code

* fix precision inconsistency

* refactor code

* Update 7B_sft.py

* refactor code

* refactor code

* refactor code

* refactor code

* refactor code for split group

* refactor code for log

* fix logger for moe

* refactor code for split param group

* fix the moe_loss for ci and val

* refactor

* fix bugs with split group

* fix bugs in save/load moe checkpoint

* add moe module to `__init__.py`

* add compatible code for old version

* update moe config file

* modify moe config file

* fix merge bugs

* update moe config file

* change condition for compatibility

---------

Co-authored-by: zhanglei <ryancheung98@163.com>
Co-authored-by: Ryan (张磊) <leizhang.real@gmail.com>