TODOs

- Come up with a design that does not modify the original modules like `DataParallel`, `TensorParallel`, `PipelineParallel`, ... in order to make them work in mixed precision training (see the wrapper sketch after this list).
- Make 3D parallelism work in mixed precision training.
- Make `DistributedOptimizer` work in mixed precision training.
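One possible shape for the first TODO (a minimal sketch under assumptions, not pipegoose's actual design; `MixedPrecisionWrapper` is a hypothetical name) is a boundary wrapper that casts inputs and outputs, so the parallelized modules themselves never change:

```python
import torch
from torch import nn

class MixedPrecisionWrapper(nn.Module):
    """Hypothetical wrapper: casts floating-point inputs to half precision
    on the way in and the output back to float32 on the way out, leaving
    the wrapped module's class (e.g. a parallelized model) untouched."""

    def __init__(self, module: nn.Module, dtype: torch.dtype = torch.float16):
        super().__init__()
        self.module = module.to(dtype)
        self.dtype = dtype

    def forward(self, *args, **kwargs):
        def cast(x):
            # only cast floating-point tensors; leave int labels etc. alone
            if torch.is_tensor(x) and x.is_floating_point():
                return x.to(self.dtype)
            return x

        args = tuple(cast(a) for a in args)
        kwargs = {k: cast(v) for k, v in kwargs.items()}
        return self.module(*args, **kwargs).float()

# usage: wrap any module after parallelization, without subclassing it
# (bfloat16 here only because float16 matmuls may be unsupported on CPU)
model = MixedPrecisionWrapper(nn.Linear(8, 8), dtype=torch.bfloat16)
out = model(torch.randn(2, 8))
assert out.dtype == torch.float32
```

Because the cast happens at the module boundary, the wrapper composes with whatever parallelizer produced the inner module.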
APIs
```python
import torch
import pipegoose

# other parallelism setup...
scaler = pipegoose.amp.GradScaler()

with pipegoose.amp.autocast(parallel_context, dtype=torch.float16):
    outputs = model(**inputs, labels=labels)
    loss = loss_func(outputs, targets)

scaled_loss = scaler.scale(loss)

optim.zero_grad()
scaled_loss.backward()
scaler.step(optim)
scaler.update()  # updates the scale for the next iteration
```
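To make the control flow above concrete, here is a minimal sketch of dynamic loss scaling as AMP implementations commonly do it (`NaiveGradScaler` is an illustrative stand-in, not pipegoose's `GradScaler`): scale the loss before `backward`, unscale the gradients inside `step`, skip the step on overflow, and adjust the scale in `update`.

```python
import torch

class NaiveGradScaler:
    """Minimal sketch of dynamic loss scaling, not a real implementation."""

    def __init__(self, init_scale=2.0**16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale_value = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0
        self._found_inf = False

    def scale(self, loss):
        # amplify the loss so small fp16 gradients don't underflow to zero
        return loss * self.scale_value

    def step(self, optimizer):
        # unscale gradients in place and check for non-finite values
        self._found_inf = False
        for group in optimizer.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                p.grad.div_(self.scale_value)
                if not torch.isfinite(p.grad).all():
                    self._found_inf = True
        if not self._found_inf:
            optimizer.step()  # skip the step entirely on overflow

    def update(self):
        if self._found_inf:
            # overflow: back off the scale and restart the growth counter
            self.scale_value *= self.backoff_factor
            self._good_steps = 0
        else:
            # long overflow-free streaks let the scale grow back
            self._good_steps += 1
            if self._good_steps % self.growth_interval == 0:
                self.scale_value *= self.growth_factor
```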
Reading List
- `MixedPrecisionOptimizer` [link] and `Float16OptimizerWithFloat16Params` [link] from Megatron-LM (a rough sketch of their core idea follows).
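The core idea in both Megatron-LM classes is fp32 master weights: the optimizer keeps a float32 copy of every float16 parameter, applies updates in float32, and copies the result back, so small updates are not lost to fp16 rounding. A rough sketch under those assumptions (`NaiveFloat16Optimizer` is a hypothetical name, not Megatron-LM's code):

```python
import torch

class NaiveFloat16Optimizer:
    """Sketch of the master-weights idea behind Megatron-LM's
    Float16OptimizerWithFloat16Params; not its actual implementation."""

    def __init__(self, fp16_params, inner_optim_cls=torch.optim.SGD, **optim_kwargs):
        self.fp16_params = list(fp16_params)
        # fp32 master copies that accumulate updates without fp16 rounding
        self.fp32_params = [p.detach().clone().float().requires_grad_()
                            for p in self.fp16_params]
        self.inner = inner_optim_cls(self.fp32_params, **optim_kwargs)

    def zero_grad(self):
        for p in self.fp16_params:
            p.grad = None

    def step(self):
        # copy fp16 grads onto the fp32 masters, step in fp32, copy back
        for fp16_p, fp32_p in zip(self.fp16_params, self.fp32_params):
            if fp16_p.grad is not None:
                fp32_p.grad = fp16_p.grad.float()
        self.inner.step()
        with torch.no_grad():
            for fp16_p, fp32_p in zip(self.fp16_params, self.fp32_params):
                fp16_p.copy_(fp32_p.half())

# usage (hypothetical):
# optim = NaiveFloat16Optimizer(model.parameters(), torch.optim.SGD, lr=1e-3)
```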