-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Having DPRINTs in kernels affects perf even when not enabled #8575
Comments
@tt-dma I think we looked at this before, I thought there were some remnants when dprint was disabled but didn't think they'd affect perf. Do you recall? We need ot make dprint go away 100% when disabled |
Due to the syntax it's not trivial to |
@tt-aho I played around with the test you linked, strangely enough I get worse perf when I commented dprint statements out of
|
This seems to make perf worse for me. I get ~23.23us after this change.
|
I saw a bunch of empty Let me look into it some more to see if there's a good way to get it out of there without blowing up the syntax |
Wonder if we should change our dprint implementation to use variadic macros calling variadic templated fns, eg: |
@tt-aho Can you try this patch? It gets the same objdump/perf on my machine compared to commenting out the prints:
Note that when I say commenting out the prints, I also mean the prints in |
@pgkeller Thoughts on that fix if it works? Looks a bit hacky but I do feel there's value to keeping the printing exactly the same as C++ ostreams. |
I do see perf improvement from this change but not as much as commenting it out originally. Thoug after this change, then commenting out dprints in cq_dispatch doesnt make a difference? Before change: After change: |
Hey @tt-aho , I did some additional testing, and I think the additional perturbation we're seeing with my change is due to the dprints in Before change: After change: On my machine, compiling out the dprints actually slows things down? But the change makes things consistent, and I don't see any differences in the objdump between commenting out and disabling, so I think this is the way to go. Let me know if you see the same thing when commenting out DPRINTS in cq_prefetch, cq_dispatch, noc_sanitize without the change. |
Going to merge the PR I have with the fix, let me know if we need to revert for any reason |
I know I tried commenting out all the dprints from the fd2 kernels before (not in noc_sanitize) and didn't sees any difference compared to just cq_dispatch. I can retry with commenting out from noc_sanitize as well but I think with the merge this issue is probably resolved since it makes sure dprint does not perturb perf when not enabled. Thanks! |
Ok sounds good, I'll close this - feel free to re-open if you see any differences going forwards |
Ex.
Running this test
./build/test/tt_metal/perf_microbenchmark/dispatch/test_pgm_dispatch -w 5000 -s 256 -x 7 -y 6 -a 128
I was seeing perf of ~22.8us without enabling dprint, but after commenting out the actual dprint statements incq_dispatch.cpp
, I get perf of ~21.57us.This implies that even without enabling dprints, it negatively impacts perf to have dprint statements in kernels, when ideally dprints should be a nop if not enabled and not impact perf.
The text was updated successfully, but these errors were encountered: