Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add infrastructure for counting and reporting internal resources #5708

Open
wants to merge 7 commits into
base: trunk
Choose a base branch
from

Conversation

nical
Copy link
Contributor

@nical nical commented May 15, 2024

... for debugging purposes

Fixes #5546

The PR is a first step, let's discuss the approach before I add more on top of it.

This adds an InternalCounter type which is internally a relaxed atomic isize when the counters feature is enabled and compiles to nothing otherwise.

These internal counters can be modified from anywhere with an &self reference. The idea is that a user of wgpu-core (and wgpu?) could query these internal counters and log them or display them using some custom overlay mechanism.

My primary motivation for this is to help with investigating resource lifetime/leak issues in wgpu, but I think that it would also be useful for users of wgpu.

The third commit might look a bit intimidating: I'm counting all textures, buffers, views, etc. in all hal backends. That's a fair amount of repetition, but I ended taking this route because wgpu-core internally creates some resources that aren't exposed to the API (like staging buffers), that would have to be tracked manually and generally I have more confidence in these counters staying up to date if they are in hal.
That said I expect to add more counters in both core and hal for whatever information will aid debugging.

Next steps:

  • Properly expose these in wgpu-core and wgpu.
  • Add an API to generate memory reports that would be provided by the underlying gpu memory allocator (when there is one). I opened a PR in gpu-allocator Add a public API to expose allocation reports Traverse-Research/gpu-allocator#226 to expose the relevant information, I have to figure out if/why I have to do the same in the gpu-alloc crate that the vulkan backend uses.

@nical nical requested a review from a team as a code owner May 15, 2024 13:47
@nical nical force-pushed the counters2 branch 10 times, most recently from b15dc6f to 2f7fa47 Compare May 16, 2024 12:42
Copy link
Member

@Wumpf Wumpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bunch of nits, but looking good to me overall. I also see the need for doing the counting at the hal level, but I'm concerned on how to reconciliate this stuff with the existing HubReport.
What's happening in InternalCounters and HubReport is strongly overlapping from a users point of view even though the source of the information is quite different. Either we introduce clear distinction for what is what or (better imho) we merge these new counters into the existing hub report. Another interesting course of action would be to deprecate the hub report, but as long as we have hubs they are quite useful.

@@ -33,6 +33,7 @@ ignored = ["cfg_aliases"]

[features]
default = ["link"]
counters = ["wgt/counters"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Features should be documented - this is the right place to note down the impact of this and what it gives to a user

This comment was marked as resolved.

@@ -7243,3 +7245,154 @@ pub enum DeviceLostReason {
/// will call the callback immediately, with this reason.
DeviceInvalid = 4,
}

/// An internal counter for debugging purposes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know wgpu-types at this point has a long tradition of only being a single monstrous lib.rs file, but this seems like a nice reason to add counters.rs to make things more readable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to not having a monstrous lib.rs file. This has made some things simple, but also, editor operations on this file feel heavy.

wgpu-hal/src/metal/device.rs Show resolved Hide resolved
wgpu-hal/src/metal/device.rs Show resolved Hide resolved
ErichDonGubler

This comment was marked as resolved.

Comment on lines +7275 to +7286
/// Get the counter's value.
///
/// Always returns 0 if the `counters` feature is not enabled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: I think it might be better to cfg-ify the body of these functions, and keep the docs. the same. This would also sidestep any problems if we accidentally let a function signature drift between feature selections.

Copy link
Member

@ErichDonGubler ErichDonGubler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Wumpf's blockage is a superset of anything I could think of blocking on, so approving here.

@ErichDonGubler

This comment was marked as resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an API to expose core and hal internal debugging information
3 participants