
Example on saving experts to one model when using distributed training #178

Open
Luodian opened this issue Aug 7, 2022 · 2 comments
Labels
duplicate This issue or pull request already exists

Comments

Luodian commented Aug 7, 2022

Hi, thanks for providing such a wonderful codebase.

I have seen and used the save & load support for MoE on multiple GPUs, and I can now save the experts on their respective ranks. But is there a way to convert them into a single model?

Say I trained an 8-expert MoE on 8 GPUs, and now I want to run the next-stage inference on 1 GPU.

Would you consider providing an example of how to do this? Or could you share some ideas on how I could implement it myself?
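As a starting point, here is a minimal sketch of what such a merge might look like when done by hand, assuming each rank saved its own state dict as checkpoint_rank{r}.pt, that non-expert parameters are replicated on every rank, and that expert parameters contain "experts" in their keys with the rank-local experts stacked along dim 0 (the file names and key pattern here are illustrative assumptions, not Tutel's actual checkpoint layout):

```python
# Rough sketch only: merge per-rank MoE checkpoints into one state dict
# so the model can be loaded on a single GPU without expert parallelism.
# Assumptions (not Tutel's real conventions): rank r saved its shard as
# "checkpoint_rank{r}.pt", shared parameters are replicated on every rank,
# and expert parameters have "experts" in their key, stacked along dim 0.
import torch

world_size = 8
shards = [
    torch.load(f"checkpoint_rank{r}.pt", map_location="cpu")
    for r in range(world_size)
]

merged = {}
for key, value in shards[0].items():
    if "experts" in key:
        # Gather every rank's local experts into a single tensor.
        merged[key] = torch.cat([shard[key] for shard in shards], dim=0)
    else:
        # Shared parameters are identical across ranks; take rank 0's copy.
        merged[key] = value

torch.save(merged, "checkpoint_merged.pt")
```

Loading the merged file on one GPU would then require building the model with all 8 experts held locally, so the concatenated tensor shapes match the single-process parameter layout.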

ghostplant added the duplicate label Aug 8, 2022
ghostplant (Contributor) commented

This is a duplicate request of #177. We are going to add some utility functions to help with this conversion.

Luodian (Author) commented Aug 8, 2022

Thanks! I think it's worth doing.
