Mamba end-to-end perf hanging in CI pipelines #8606

Open
esmalTT opened this issue May 17, 2024 · 7 comments
Labels
bug (Something isn't working), mamba

Comments

@esmalTT
Contributor

esmalTT commented May 17, 2024

Failed pipelines:

esmalTT added the bug and mamba labels May 17, 2024
@tt-rkim
Collaborator

tt-rkim commented Jun 6, 2024

Is there an update on this? Are we just running it manually to get perf numbers?

@esmalTT
Contributor Author

esmalTT commented Jun 6, 2024

> Is there an update on this? Are we just running it manually to get perf numbers?

We were never able to replicate this issue locally. There was an FD hang that was fixed this week; it may have been related.

I would suggest we re-enable this for a few days to see whether the issue still occurs.

@tt-rkim
Collaborator

tt-rkim commented Jun 6, 2024

Looks like it hung: https://github.com/tenstorrent/tt-metal/actions/runs/9407769341/job/25914428732

It seems that tt_cache_path is still set to None, though, unless I'm reading the code wrong... so it can't be a cache problem?

@esmalTT
Contributor Author

esmalTT commented Jun 7, 2024 via email

@tt-rkim
Collaborator

tt-rkim commented Jun 7, 2024

[Screenshot 2024-06-07 at 8:38:38 AM]

Maybe, but is this supposed to take over 15 mins on a BM as powerful as this?

tt-rkim added a commit that referenced this issue Jun 7, 2024
tt-rkim added a commit that referenced this issue Jun 7, 2024
…ic check failure bc I'm an idiot and forget to revert all the time
@tt-rkim
Collaborator

tt-rkim commented Jun 7, 2024

Bumped the timeout up to 40 min to see if we need more time: https://github.com/tenstorrent/tt-metal/actions/runs/9419004292

@tt-rkim
Collaborator

tt-rkim commented Jun 9, 2024

@esmalTT Looks like it took 4 mins on the single-card BM.
I'm running it a couple more times, but my initial guess is that it's an ND hang.
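
For anyone wanting to repeat this check, here is a minimal sketch of rerunning a command several times under a timeout, so that a run that merely takes long can be told apart from one that hangs intermittently. It is not the project's CI logic: the helper names `run_repeatedly` and `summarize`, the default of 5 runs, and the test path in the usage comment are all assumptions made for illustration. Only the 40-minute limit comes from this thread.

```python
import subprocess
import sys
import time


def run_repeatedly(cmd, runs=5, timeout_s=40 * 60):
    """Run `cmd` `runs` times and record (attempt, status, seconds) for each.

    Hypothetical helper, not part of tt-metal. A run that exceeds
    `timeout_s` is killed and recorded as "timed out".
    """
    results = []
    for attempt in range(1, runs + 1):
        start = time.monotonic()
        try:
            proc = subprocess.run(cmd, timeout=timeout_s)
            status = "passed" if proc.returncode == 0 else "failed"
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising this exception.
            status = "timed out"
        results.append((attempt, status, time.monotonic() - start))
    return results


def summarize(results):
    """Count how many attempts ended in each status."""
    counts = {}
    for _, status, _ in results:
        counts[status] = counts.get(status, 0) + 1
    return counts


# Example usage (the test path is a placeholder, not the real perf test):
#   results = run_repeatedly(["pytest", "path/to/perf_test.py"], runs=5)
#   print(summarize(results))
```

A mix of "passed" and "timed out" across identical runs would be consistent with a non-deterministic hang, while every run timing out would point to a test that simply needs more time or hangs deterministically.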


2 participants