-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
webgpu: Move errorscopes to WGPU thread #32304
Conversation
We got handful of FAIL instead of PASS: https://github.com/sagudev/servo/actions/runs/9148458722/job/25151468770, but I believe those where fake passes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will review tomorrow...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks good to me, but I wonder if we need the WebGPURequest::DispatchError
, because it seems to me that those errors are always dispatched from the device timeline.
Couple of other questions.
self.gpu_devices | ||
.borrow() | ||
.get(&device) | ||
.expect("GPUDevice not found") | ||
.handle_server_msg(scope, result); | ||
.fire_uncaptured_error(error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure the device is never dropped at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how it was done before, but in WGPU we have devices hashmap, when device is removed from it we do not dispatch any error. (we also send CleanDevice to script thread so it gets removed from gpu_devices too). But currently device is lost is not impl properly in most parts of servo (but so it's not in firefox), so this is yet something I would leave for the future work.
components/webgpu/gpu_error.rs
Outdated
#[derive(Clone, Debug, Eq, Hash, PartialEq)] | ||
pub(crate) struct ErrorScope { | ||
// we only store first error | ||
pub errors: Option<Error>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not make this a list, and just return "any" when popped, with "any" being just the first one for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be per spec, but why would we store multiple errors if we only use one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because per the spec you should store them all but only use one anyway, the only difference is how that one is chosen. In the current implementation we could do something simple like always choosing the last or the first, and add a comment about this when the error scope is popped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change to vec, but generally the first error is the right one, although spec is very vague about that:
Let error be any one of the items in scope.[[errors]]
last one can be from previous operation failing:
For any two errors E1 and E2 in the list, if E2 was caused by E1, E2 should not be the one selected.
scope_stack: Vec<ErrorScopeMetadata>, | ||
next_scope_id: ErrorScopeId, | ||
} | ||
|
||
#[dom_struct] | ||
pub struct GPUDevice { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not keep some additional state about the device having been lost, to properly silence any errors(while still perhaps running a few last steps), see https://www.w3.org/TR/webgpu/#errors-and-debugging.
For example, I think a ongoing map async step on the device timeline can dispatch an error even after the device is lost, and that error should be silenced(while the map async should still complete, see https://www.w3.org/TR/webgpu/#lose-the-device)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot of work to be done in servo for proper WebGPU support, but this PR is limited only to error scopes, as device lost is currently very loosely implemented (there is lost method but not much else handling), but I think I will do lost device stuff next.
For this PR I limited myself to only moving error scopes and respec (fresh implementation of latest spec) of GPUErrors, so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I would only make one change which is the one described at #32304 (comment)
Also, if we don't have them already, it would be good to open issues for subsequent work, like the device lost stuff.
This should lower amount of messages being set to script (no need to send ErrorScopeId and removed useless WebGPUOps::Success), simplified error scope logic and it aligned us with spec (errors scopes should happen in device timeline = WGPU thread). Also includes GPUError stuff from #30504 (see dd78013).
https://sagudev.github.io/briefcase/WebGPUerrorScopes.html works now as expected.
Some test are now FAIL, because CTS does only check for outstanding errors in error scopes, but error scopes were empty because of bad impl.
./mach build -d
does not report any errors./mach test-tidy
does not report any errors