Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

webgpu: Move errorscopes to WGPU thread #32304

Merged
merged 10 commits into from
May 22, 2024
Merged

webgpu: Move errorscopes to WGPU thread #32304

merged 10 commits into from
May 22, 2024

Conversation

sagudev
Copy link
Member

@sagudev sagudev commented May 17, 2024

This should lower amount of messages being set to script (no need to send ErrorScopeId and removed useless WebGPUOps::Success), simplified error scope logic and it aligned us with spec (errors scopes should happen in device timeline = WGPU thread). Also includes GPUError stuff from #30504 (see dd78013).

https://sagudev.github.io/briefcase/WebGPUerrorScopes.html works now as expected.

Some test are now FAIL, because CTS does only check for outstanding errors in error scopes, but error scopes were empty because of bad impl.


@sagudev sagudev added the A-content/webgpu The WebGPU implementation. label May 17, 2024
@sagudev
Copy link
Member Author

sagudev commented May 19, 2024

We got handful of FAIL instead of PASS: https://github.com/sagudev/servo/actions/runs/9148458722/job/25151468770, but I believe those where fake passes.

@sagudev sagudev marked this pull request as ready for review May 20, 2024 10:12
@sagudev sagudev requested a review from gterzian May 20, 2024 10:12
Copy link
Member

@gterzian gterzian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will review tomorrow...

Copy link
Member

@gterzian gterzian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good to me, but I wonder if we need the WebGPURequest::DispatchError, because it seems to me that those errors are always dispatched from the device timeline.

Couple of other questions.

components/script/dom/gpudevice.rs Show resolved Hide resolved
components/script/dom/gpudevice.rs Show resolved Hide resolved
self.gpu_devices
.borrow()
.get(&device)
.expect("GPUDevice not found")
.handle_server_msg(scope, result);
.fire_uncaptured_error(error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure the device is never dropped at this point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how it was done before, but in WGPU we have devices hashmap, when device is removed from it we do not dispatch any error. (we also send CleanDevice to script thread so it gets removed from gpu_devices too). But currently device is lost is not impl properly in most parts of servo (but so it's not in firefox), so this is yet something I would leave for the future work.

components/script/dom/gpuqueue.rs Show resolved Hide resolved
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub(crate) struct ErrorScope {
// we only store first error
pub errors: Option<Error>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not make this a list, and just return "any" when popped, with "any" being just the first one for now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be per spec, but why would we store multiple errors if we only use one?

Copy link
Member

@gterzian gterzian May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because per the spec you should store them all but only use one anyway, the only difference is how that one is chosen. In the current implementation we could do something simple like always choosing the last or the first, and add a comment about this when the error scope is popped.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change to vec, but generally the first error is the right one, although spec is very vague about that:

Let error be any one of the items in scope.[[errors]]

last one can be from previous operation failing:

For any two errors E1 and E2 in the list, if E2 was caused by E1, E2 should not be the one selected.

scope_stack: Vec<ErrorScopeMetadata>,
next_scope_id: ErrorScopeId,
}

#[dom_struct]
pub struct GPUDevice {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not keep some additional state about the device having been lost, to properly silence any errors(while still perhaps running a few last steps), see https://www.w3.org/TR/webgpu/#errors-and-debugging.

For example, I think a ongoing map async step on the device timeline can dispatch an error even after the device is lost, and that error should be silenced(while the map async should still complete, see https://www.w3.org/TR/webgpu/#lose-the-device)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of work to be done in servo for proper WebGPU support, but this PR is limited only to error scopes, as device lost is currently very loosely implemented (there is lost method but not much else handling), but I think I will do lost device stuff next.

@sagudev
Copy link
Member Author

sagudev commented May 22, 2024

Overall it looks good to me, but I wonder if we need the WebGPURequest::DispatchError, because it seems to me that those errors are always dispatched from the device timeline.

Couple of other questions.

For this PR I limited myself to only moving error scopes and respec (fresh implementation of latest spec) of GPUErrors, so WebGPURequest::DispatchError is a hacky way to keep old code working, but this would be removed in the future as I progressed with the respec in the future (in some parts we are completely out of spec, so full rewrite will be needed).

Copy link
Member

@gterzian gterzian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I would only make one change which is the one described at #32304 (comment)

Also, if we don't have them already, it would be good to open issues for subsequent work, like the device lost stuff.

@sagudev sagudev added this pull request to the merge queue May 22, 2024
Merged via the queue into servo:main with commit 794110e May 22, 2024
9 checks passed
@sagudev sagudev deleted the gpu-error branch May 22, 2024 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-content/webgpu The WebGPU implementation.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

2 participants