
Mixed precision training in FP16 #14

Open · 3 tasks
xrsrke opened this issue Oct 25, 2023 · 0 comments
Labels: help wanted (Extra attention is needed)

Comments

xrsrke (Owner) commented Oct 25, 2023

TODOs

  • Come up with a design that makes mixed precision training work without modifying the original parallel modules (DataParallel, TensorParallel, PipelineParallel, ...); a wrapper-based sketch follows this list.
  • Make 3D parallelism work in mixed precision training.
  • Make DistributedOptimizer work in mixed precision training.
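One possible way to satisfy the first item is to compose mixed precision around the existing parallel modules rather than editing them. Below is a minimal sketch under that assumption; MixedPrecisionWrapper is a hypothetical name and is not part of pipegoose:

import torch
from torch import nn

class MixedPrecisionWrapper(nn.Module):
    # Hypothetical wrapper: adds FP16 execution around an unmodified parallel module.
    def __init__(self, module: nn.Module, dtype: torch.dtype = torch.float16):
        super().__init__()
        self.module = module  # original DataParallel/TensorParallel/... module, left untouched
        self.dtype = dtype

    def forward(self, *args, **kwargs):
        # Cast floating-point tensor inputs down to the low-precision dtype.
        def cast(x):
            if torch.is_tensor(x) and x.is_floating_point():
                return x.to(self.dtype)
            return x

        args = tuple(cast(a) for a in args)
        kwargs = {k: cast(v) for k, v in kwargs.items()}
        # Run the wrapped module under autocast so matmuls execute in FP16.
        with torch.autocast(device_type="cuda", dtype=self.dtype):
            return self.module(*args, **kwargs)

Because the wrapper only touches inputs and the execution context, the same approach would stack on top of any of the parallelism wrappers without changing their code.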

APIs

import torch
import pipegoose

# other parallelism setup...
scaler = pipegoose.amp.GradScaler()

# run the forward pass in FP16 under autocast
with pipegoose.amp.autocast(parallel_context, dtype=torch.float16):
    outputs = model(**inputs, labels=labels)
    loss = loss_func(outputs, labels)

# scale the loss so small FP16 gradients don't underflow
scaled_loss = scaler.scale(loss)

optim.zero_grad()
scaled_loss.backward()
scaler.step(optim)   # unscales gradients, then steps the optimizer
scaler.update()      # updates the scale for the next iteration
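The semantics this API implies mirror torch.cuda.amp.GradScaler: scale the loss, unscale the gradients before stepping, skip the step on overflow, and adjust the scale dynamically. A minimal, illustrative scaler (NaiveGradScaler is a made-up name, not pipegoose code) could look like:

import torch

class NaiveGradScaler:
    # Minimal dynamic loss scaler, for illustration only.
    def __init__(self, init_scale=2.0**16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale_value = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0
        self._found_inf = False

    def scale(self, loss):
        # Multiply the loss so small FP16 gradients survive the backward pass.
        return loss * self.scale_value

    def step(self, optimizer):
        # Unscale gradients and skip the step if any are inf/NaN.
        self._found_inf = False
        for group in optimizer.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                p.grad.div_(self.scale_value)
                if not torch.isfinite(p.grad).all():
                    self._found_inf = True
        if not self._found_inf:
            optimizer.step()

    def update(self):
        # Back off on overflow, grow the scale after a run of clean steps.
        if self._found_inf:
            self.scale_value *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps % self.growth_interval == 0:
                self.scale_value *= self.growth_factor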

Reading List

  • MixedPrecisionOptimizer [link] and Float16OptimizerWithFloat16Params [link] from Megatron-LM
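Both Megatron-LM optimizers revolve around keeping FP32 master copies of the FP16 parameters and stepping the master copies. A rough sketch of that pattern (Float32MasterOptimizer is a hypothetical name, not Megatron-LM's actual class) for illustration:

import torch

class Float32MasterOptimizer:
    # The model holds FP16 params; the inner optimizer steps FP32 master copies.
    def __init__(self, optimizer_cls, fp16_params, **optim_kwargs):
        self.fp16_params = list(fp16_params)
        # FP32 master copies that accumulate precise updates.
        self.fp32_params = [p.detach().clone().float().requires_grad_(True)
                            for p in self.fp16_params]
        self.optimizer = optimizer_cls(self.fp32_params, **optim_kwargs)

    def step(self):
        # Copy FP16 gradients into the FP32 masters, step, then copy weights back.
        for fp16, fp32 in zip(self.fp16_params, self.fp32_params):
            fp32.grad = fp16.grad.detach().float() if fp16.grad is not None else None
        self.optimizer.step()
        with torch.no_grad():
            for fp16, fp32 in zip(self.fp16_params, self.fp32_params):
                fp16.copy_(fp32)

    def zero_grad(self):
        for p in self.fp16_params:
            p.grad = None
        self.optimizer.zero_grad(set_to_none=True)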
xrsrke added the help wanted label on Oct 25, 2023
xrsrke changed the title from "Mixed precision training" to "Mixed precision training in BF16" on Oct 27, 2023
xrsrke changed the title from "Mixed precision training in BF16" to "Mixed precision training in FP16" on Nov 6, 2023
Projects: Status: Todo