
Compatibility issue with tensorflow>=2.15.1 on GPU #516

Open
chajath opened this issue Mar 13, 2024 · 0 comments

Collaborator

chajath commented Mar 13, 2024

Hi team,

I'm having an issue launching the pretraining job with TensorFlow 2.15 or above. With TensorFlow 2.15, the job segfaults immediately. With the latest TensorFlow 2.16.1, I see unbounded (near-100%) GPU memory growth in one of the data-loading processes, leading to CUDA OOM errors and cascading failures. One quick workaround is to locally install a lower TensorFlow version, e.g.

pip install tensorflow==2.13.1 tensorflow-text==2.13.0

Also works:

pip install tensorflow==2.14.1 tensorflow-text==2.14.0
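Both workarounds pin `tensorflow` and `tensorflow-text` to the same minor release, below 2.15. A startup guard can catch an environment that has drifted back into the affected range. The following is a minimal sketch. The helper names `parse_minor` and `check_tf_pins` are hypothetical. The 2.15 cutoff reflects only the versions reported in this issue, not a confirmed bound of the bug.

```python
from __future__ import annotations

# Hypothetical cutoff taken from this report: 2.15 segfaults and
# 2.16.1 shows runaway GPU memory growth in a data-loading process.
FIRST_AFFECTED = (2, 15)


def parse_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '2.14.1' or '2.16.0rc0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def check_tf_pins(tf_version: str, tf_text_version: str) -> list[str]:
    """Return a list of problems with the given versions (an empty list means OK)."""
    problems = []
    if parse_minor(tf_version) >= FIRST_AFFECTED:
        problems.append(f"tensorflow {tf_version} is in the affected range (>= 2.15)")
    # tensorflow-text is built against a specific TensorFlow minor release.
    if parse_minor(tf_version) != parse_minor(tf_text_version):
        problems.append(
            f"tensorflow-text {tf_text_version} does not match tensorflow {tf_version}"
        )
    return problems
```

In a real job, the two version strings could come from `importlib.metadata.version("tensorflow")` and `importlib.metadata.version("tensorflow-text")`. For example, `check_tf_pins("2.14.1", "2.14.0")` returns an empty list. By contrast, `check_tf_pins("2.16.1", "2.13.0")` reports both an affected TensorFlow version and a mismatched tensorflow-text.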

chajath added a commit to chajath/maxtext that referenced this issue Mar 13, 2024
chajath added a commit to chajath/maxtext that referenced this issue Mar 14, 2024
Workaround of google#516

Also pin other dependencies for mostly reproducible container build
chajath added a commit to chajath/maxtext that referenced this issue Mar 14, 2024
Workaround of google#516

Also pin other dependencies for mostly reproducible container build