Update xpu related device setting #446
base: main
Conversation
@abhilash1910 Could you please help review? Thanks!
Thanks for this update!
@HamidShojanazeri could you help take a look?
cc @gujinghui
```diff
@@ -55,13 +58,15 @@ def train(model, train_dataloader,eval_dataloader, tokenizer, optimizer, lr_sche
     if train_config.use_fp16 and train_config.enable_fsdp:
         scaler = ShardedGradScaler()
     elif train_config.use_fp16 and not train_config.enable_fsdp:
-        scaler = torch.cuda.amp.GradScaler()
+        scaler = torch.xpu.amp.GradScaler() if is_xpu_available() else torch.cuda.amp.GradScaler()
```
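Since the review below questions whether `torch.xpu.amp.GradScaler` exists at all, a more defensive variant could probe for the attribute before constructing it. This is only a sketch, not the PR's code; it assumes accelerate's `is_xpu_available` helper, as used elsewhere in train_utils.py, and the function name is hypothetical.

```python
# Sketch only (not the PR's code): construct a GradScaler only when the
# active backend actually provides one.
import torch
from accelerate.utils import is_xpu_available

def make_grad_scaler():
    if is_xpu_available():
        # torch.xpu (and torch.xpu.amp) may be absent in builds without
        # XPU support, so probe instead of assuming the API exists.
        xpu_amp = getattr(getattr(torch, "xpu", None), "amp", None)
        if xpu_amp is not None and hasattr(xpu_amp, "GradScaler"):
            return xpu_amp.GradScaler()
        raise RuntimeError(
            "use_fp16 needs a GradScaler, but torch.xpu.amp.GradScaler "
            "is not available in this build."
        )
    return torch.cuda.amp.GradScaler()
```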
Do we really have torch.xpu.amp.GradScaler() already?
Yes, I also had a doubt about this; I'm not sure we have xpu.amp.GradScaler().
> Do we really have torch.xpu.amp.GradScaler() already?

Thanks @gujinghui, @abhilash1910. You're right, we don't support torch.xpu.amp.GradScaler() yet; I will remove it and update the code once we support this API. By the way, do we need a warning or exit message to indicate that torch.xpu.amp.GradScaler() is not supported on XPU?
Should be yes? An assert to stop the workload with a graceful exit message?
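For illustration, one way the suggested guard could look; a minimal sketch only, assuming it runs inside train() before the scaler is created, with `train_config` and `is_xpu_available` as already used in train_utils.py (the function name and message wording are invented):

```python
# Sketch of the suggested guard; its placement and message are assumptions.
from accelerate.utils import is_xpu_available

def check_fp16_supported(train_config):
    """Stop the run early if fp16 was requested on a device without a scaler."""
    if train_config.use_fp16 and not train_config.enable_fsdp:
        assert not is_xpu_available(), (
            "use_fp16 is not supported on XPU yet: torch.xpu.amp.GradScaler "
            "is unavailable. Please rerun with use_fp16=False."
        )
```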
What does this PR do?
This PR updates some XPU-related device-setting logic so that XPU is supported correctly.
Fixes # (issue)
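For context, below is a minimal sketch of the device-conditional setup pattern this kind of change touches. The function name `setup_device` is hypothetical and `is_xpu_available` is assumed to be accelerate's helper, so treat it as illustration rather than the PR's diff.

```python
# Illustrative sketch, not the PR's diff: bind each process to its local
# accelerator, preferring XPU when available.
import torch
from accelerate.utils import is_xpu_available

def setup_device(local_rank: int) -> torch.device:
    if is_xpu_available():
        torch.xpu.set_device(local_rank)  # requires an XPU-enabled build
        return torch.device(f"xpu:{local_rank}")
    torch.cuda.set_device(local_rank)
    return torch.device(f"cuda:{local_rank}")
```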
Feature/Issue validation/testing
Please describe the tests that you ran to verify your changes, with a summary of the relevant results. Provide instructions so the tests can be reproduced.
Please also list any relevant details for your test configuration.
Test A
Logs for Test A
Test B
Logs for Test B
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Thanks for contributing 🎉!