
Megatron style TFLOPs Calculation #537

Open

abhinavgoel95 wants to merge 2 commits into main from megatron_tflops

Conversation

abhinavgoel95 (Contributor) commented:

@rwitten this is a draft.

This type of change would be specific to a few transformer models (e.g., Gemma, Llama, GPT). It wouldn't work with MoE or some newer architectures.

I was thinking that walking through the train step and calculating the FLOPs layer by layer would be a very intrusive change.

What do you think?
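For concreteness, here is a minimal sketch of the closed-form Megatron-style estimate (from Narayanan et al., 2021, "Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM") that a change like this would compute. The function name and signature below are illustrative, not the actual code in this PR's MaxText/maxtext_utils.py. The formula hard-codes standard multi-head attention and a 4h feed-forward width, which is exactly why it applies only to dense GPT-style models and not to MoE:

```python
def megatron_tflops_per_device(
    batch_size: int,        # B: global batch size (sequences per step)
    seq_len: int,           # s: sequence length
    num_layers: int,        # l: number of transformer layers
    hidden_size: int,       # h: model hidden dimension
    vocab_size: int,        # V: vocabulary size
    num_devices: int,
    step_time_seconds: float,
    activation_recompute: bool = False,
) -> float:
    """Closed-form model-FLOPs estimate for a dense GPT-style transformer.

    Forward-pass FLOPs per step (2 FLOPs per multiply-accumulate):
      24*B*s*l*h^2  -- QKV + attention-output projections (8*B*s*h^2 per
                       layer) plus the 4h MLP (16*B*s*h^2 per layer)
      4*B*s^2*l*h   -- attention score (QK^T) and context (AV) matmuls
      2*B*s*h*V     -- output logits
    The backward pass costs roughly 2x the forward pass. Full activation
    recomputation re-runs the forward pass of the transformer layers
    (but not the logits matmul), in which case the total reduces to the
    paper's 96*B*s*l*h^2 * (1 + s/(6h) + V/(16*l*h)).
    """
    B, s, l, h, V = batch_size, seq_len, num_layers, hidden_size, vocab_size
    layer_fwd_flops = 24 * B * s * l * h**2 + 4 * B * s**2 * l * h
    logit_fwd_flops = 2 * B * s * h * V
    layer_multiplier = 4 if activation_recompute else 3  # fwd + bwd (+ recomputed fwd)
    total_flops = layer_multiplier * layer_fwd_flops + 3 * logit_fwd_flops
    return total_flops / num_devices / step_time_seconds / 1e12
```

Dividing by the measured step time and device count gives the "TFLOP/s per device" number Megatron-LM reports; note that with activation_recompute=True the recomputed forward pass is counted as useful work, so the result reflects hardware FLOPs rather than strictly model FLOPs.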

Three review threads on MaxText/maxtext_utils.py (outdated, resolved).
@abhinavgoel95 force-pushed the megatron_tflops branch 2 times, most recently from 3980d41 to 01c78bb on March 27, 2024 19:35
added config

adding support for megatron style tflops calculation
abhinavgoel95 (Contributor, Author) commented:

Made the changes as requested in the meeting @rwitten

@abhinavgoel95 marked this pull request as ready for review on April 1, 2024 20:00
abhinavgoel95 (Contributor, Author) commented:

cc @rwitten, following up on this.
