how to use tutel on Megatron Deepspeed #207

Open

wangyuxin87 opened this issue Jul 15, 2023 · 4 comments
@wangyuxin87

Can Tutel be used with Megatron-DeepSpeed?

@ghostplant
Contributor

ghostplant commented Jul 17, 2023

Do you mean Megatron and DeepSpeed separately, or both of them working together?

@xcwanAndy

@ghostplant Can Tutel work with Megatron or DeepSpeed individually?

@ghostplant
Contributor

Yes, Tutel is just an MoE layer implementation that is pluggable into any distributed framework. The way for another framework to use the Tutel MoE layer is to pass it the distributed process group properly, e.g.:

my_processing_group = deepspeed.new_group(..)

moe_layer = tutel_moe.moe_layer(
    ..,
    group=my_processing_group
)
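
For context, here is a minimal, self-contained sketch of that pattern (illustrative only: the process group below comes from torch.distributed rather than DeepSpeed, and the gate_type/experts layout follows Tutel's helloworld example, so exact parameter names may differ across versions):

import os
import torch
import torch.distributed as dist
from tutel import moe as tutel_moe

# Launch with e.g.: torchrun --nproc_per_node=2 this_script.py
dist.init_process_group(backend='nccl')
torch.cuda.set_device(int(os.environ.get('LOCAL_RANK', 0)))
my_processing_group = dist.new_group(ranks=list(range(dist.get_world_size())))

moe_layer = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},        # top-2 gating (illustrative)
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 2, 'hidden_size_per_expert': 4096},
    group=my_processing_group,                # the externally created process group
).cuda()

x = torch.randn(4, 16, 1024, device='cuda')   # (batch, tokens, model_dim)
y = moe_layer(x)                              # output keeps the same shape as x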

If no other framework is involved, Tutel itself also provides a one-line initialization that generates the groups you need, which works for both distributed GPU (i.e. nccl) and distributed CPU (i.e. gloo):

from tutel import system
parallel_env = system.init_data_model_parallel(backend='nccl' if args.device == 'cuda' else 'gloo')
# pick the group that matches your parallelism layout:
my_processing_group = parallel_env.data_group  # or parallel_env.model_group, or parallel_env.global_group
...
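
A minimal sketch of that self-contained path (the three group names are the ones mentioned above; local_device and the layer layout follow Tutel's example scripts and may vary across versions):

import torch
from tutel import system
from tutel import moe as tutel_moe

parallel_env = system.init_data_model_parallel(
    backend='nccl' if torch.cuda.is_available() else 'gloo')
device = parallel_env.local_device            # device assigned to this rank

moe_layer = tutel_moe.moe_layer(              # same illustrative layout as the sketch above
    gate_type={'type': 'top', 'k': 2},
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 2, 'hidden_size_per_expert': 4096},
    group=parallel_env.data_group,            # or model_group / global_group, as needed
).to(device)

y = moe_layer(torch.randn(4, 16, 1024, device=device))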

@xcwanAndy

Thanks for your prompt response!
