
Update xpu related device setting #446

Open
wants to merge 2 commits into base: main

Conversation

@zhuhong61 commented Apr 19, 2024

What does this PR do?

This PR updates some XPU-related logic for correct device support.

Fixes # (issue)

Feature/Issue validation/testing

Please describe the tests that you ran to verify your changes and summarize the relevant results. Provide instructions so the tests can be reproduced.
Please also list any relevant details of your test configuration.

  • Test A
    Logs for Test A

  • Test B
    Logs for Test B

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Thanks for contributing 🎉!

@zhuhong61 (Author)

@abhilash1910 Could you please help review? Thanks!

@abhilash1910 (Contributor) left a comment

Thanks for this update!
@HamidShojanazeri could you help take a look?
cc @gujinghui

@@ -55,13 +58,15 @@ def train(model, train_dataloader,eval_dataloader, tokenizer, optimizer, lr_sche
     if train_config.use_fp16 and train_config.enable_fsdp:
         scaler = ShardedGradScaler()
     elif train_config.use_fp16 and not train_config.enable_fsdp:
-        scaler = torch.cuda.amp.GradScaler()
+        scaler = torch.xpu.amp.GradScaler() if is_xpu_available() else torch.cuda.amp.GradScaler()
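
For context, the changed branch reads roughly as in the sketch below: a minimal, self-contained version pulled out of train(), assuming is_xpu_available is accelerate's helper and ShardedGradScaler comes from torch.distributed.fsdp (the thread below questions whether torch.xpu.amp.GradScaler() exists at all).

    import torch
    from accelerate.utils import is_xpu_available  # assumed source of the helper
    from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler

    def pick_grad_scaler(train_config):
        """Choose a gradient scaler for fp16 training based on the run config."""
        if train_config.use_fp16 and train_config.enable_fsdp:
            # FSDP shards parameters, so it needs the sharded scaler.
            return ShardedGradScaler()
        if train_config.use_fp16 and not train_config.enable_fsdp:
            # Proposed change: prefer an XPU scaler when an XPU is present,
            # otherwise fall back to the CUDA scaler. Reviewers question below
            # whether torch.xpu.amp.GradScaler() is actually available yet.
            if is_xpu_available():
                return torch.xpu.amp.GradScaler()
            return torch.cuda.amp.GradScaler()
        return None  # fp16 disabled: no scaler needed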

Do we really have torch.xpu.amp.GradScaler() already?

Contributor

Yes, I also had a doubt regarding this; I am not sure we have xpu.amp.GradScaler().

@zhuhong61 (Author) commented Apr 22, 2024

> Do we really have torch.xpu.amp.GradScaler() already?

Thanks @gujinghui, @abhilash1910. Yes, we don't support torch.xpu.amp.GradScaler() yet; I will remove it and update the code once we support this API. By the way, do we need a warning or exit message to indicate that torch.xpu.amp.GradScaler() is not supported on XPU?

Should be yes? An assert to stop the workload with a graceful exit message?
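
A minimal sketch of what that guard could look like, assuming is_xpu_available is accelerate's helper; the function name, placement, and message are illustrative assumptions, not part of this PR:

    import torch
    from accelerate.utils import is_xpu_available  # assumed helper, as in the hunk above

    def make_fp16_scaler(train_config):
        """Return a GradScaler for non-FSDP fp16 runs, or stop early on XPU."""
        if not train_config.use_fp16 or train_config.enable_fsdp:
            return None
        # torch.xpu.amp.GradScaler() is not supported yet, so fail loudly
        # with a clear message instead of silently taking the CUDA path.
        assert not is_xpu_available(), (
            "fp16 gradient scaling is not supported on XPU yet; "
            "please disable use_fp16."
        )
        return torch.cuda.amp.GradScaler()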

BrodysgotMs commented Apr 23, 2024 via email
