Implement a Trainer that wraps the low-level DataParallel, TensorParallel, and PipelineParallel modules. The user just plugs in their model and dataloader and trains, similar to transformers.
Use pipegoose's DistributedDataLoader in the Trainer.
DistributedDataLoader just takes a regular dataloader and adds a distributed sampler to it, as shown in pipegoose's README.
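A minimal sketch of what that wrapper could look like, assuming ParallelContext exposes the data-parallel world size and rank via `get_world_size`/`get_local_rank` with `ParallelMode.DATA` (the exact pipegoose API may differ):

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

from pipegoose.distributed import ParallelContext, ParallelMode  # assumed import path


class DistributedDataLoader:
    """Wrap a regular DataLoader so each data-parallel rank reads a disjoint shard."""

    def __init__(self, dataloader: DataLoader, parallel_context: ParallelContext):
        # Assumption: ParallelContext can report the data-parallel world size and rank.
        num_replicas = parallel_context.get_world_size(ParallelMode.DATA)
        rank = parallel_context.get_local_rank(ParallelMode.DATA)

        self.sampler = DistributedSampler(
            dataloader.dataset, num_replicas=num_replicas, rank=rank, shuffle=True
        )
        # Rebuild the loader with the distributed sampler, keeping the original
        # batch size and collate_fn.
        self.dataloader = DataLoader(
            dataloader.dataset,
            batch_size=dataloader.batch_size,
            sampler=self.sampler,
            collate_fn=dataloader.collate_fn,
        )

    def set_epoch(self, epoch: int):
        # Reshuffle the shards at the start of every epoch, per DistributedSampler convention.
        self.sampler.set_epoch(epoch)

    def __iter__(self):
        return iter(self.dataloader)

    def __len__(self):
        return len(self.dataloader)
```

The Trainer would call `set_epoch` at the top of each epoch so shuffling stays consistent across ranks.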
@isamu-isozaki Nope, I just checked the Trainer from transformers. It modifies our model's devices and so on. We prefer implementing our own so we can incorporate distributed logging, callbacks on a specific rank, ParallelMode... and future changes. I just added some demo code (link).
One more note: we only apply a given parallel mode when the parallel_context calls for it. For example, if data_parallel_size is greater than 1, then we wrap the model with DataParallel.
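A rough sketch of that conditional wrapping inside the Trainer; the import paths, the `*_parallel_size` attributes on `parallel_context`, and the wrap-then-`.parallelize()` pattern are assumptions based on the README, not the final API:

```python
from pipegoose.distributed import ParallelContext  # assumed import path
from pipegoose.nn import DataParallel, TensorParallel, PipelineParallel  # assumed import path


def parallelize_model(model, parallel_context: ParallelContext):
    """Hypothetical Trainer helper: wrap the model only with the parallel
    modes that the parallel_context actually enables."""
    # Assumption: the sizes used to build the context are readable as attributes.
    if parallel_context.tensor_parallel_size > 1:
        model = TensorParallel(model, parallel_context).parallelize()
    if parallel_context.pipeline_parallel_size > 1:
        model = PipelineParallel(model, parallel_context).parallelize()
    if parallel_context.data_parallel_size > 1:
        model = DataParallel(model, parallel_context).parallelize()
    return model
```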
APIs
- Trainer
- Trainer Callback
- DistributedDataLoader

TODOs