webgpu: Move errorscopes to WGPU thread #32304

sagudev · 2024-05-17T10:07:43Z

This should lower the number of messages sent to script (there is no need to send ErrorScopeId, and the useless WebGPUOps::Success is removed), simplify the error scope logic, and align us with the spec (error scopes should be handled on the device timeline, i.e. the WGPU thread). It also includes the GPUError work from #30504 (see dd78013).
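As a rough illustration of what keeping error scopes on the device timeline means, here is a minimal sketch of a per-device scope stack. All type and method names (`DeviceErrorState`, `EmptyScopeStack`, and so on) are assumptions made for this example and are not taken from the PR. The routing rule (innermost scope with a matching filter, otherwise uncaptured) follows my reading of the spec's "dispatch error" steps.

```rust
/// Error categories a scope can capture (named after the WebGPU `GPUErrorFilter` values).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ErrorFilter {
    Validation,
    OutOfMemory,
    Internal,
}

/// A GPU error with its message (hypothetical type for this sketch).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Error {
    Validation(String),
    OutOfMemory(String),
    Internal(String),
}

impl Error {
    /// The filter that captures this kind of error.
    pub fn filter(&self) -> ErrorFilter {
        match self {
            Error::Validation(_) => ErrorFilter::Validation,
            Error::OutOfMemory(_) => ErrorFilter::OutOfMemory,
            Error::Internal(_) => ErrorFilter::Internal,
        }
    }
}

/// Returned when `pop_error_scope` is called on an empty stack.
#[derive(Debug, PartialEq, Eq)]
pub struct EmptyScopeStack;

struct ErrorScope {
    filter: ErrorFilter,
    /// Only the first captured error is kept in this sketch.
    error: Option<Error>,
}

/// Per-device error state, owned by the device-timeline thread.
#[derive(Default)]
pub struct DeviceErrorState {
    scope_stack: Vec<ErrorScope>,
}

impl DeviceErrorState {
    pub fn push_error_scope(&mut self, filter: ErrorFilter) {
        self.scope_stack.push(ErrorScope { filter, error: None });
    }

    /// Pops the innermost scope and returns the error it captured, if any.
    pub fn pop_error_scope(&mut self) -> Result<Option<Error>, EmptyScopeStack> {
        self.scope_stack
            .pop()
            .map(|scope| scope.error)
            .ok_or(EmptyScopeStack)
    }

    /// Routes `error` to the innermost scope with a matching filter.
    /// Returns `Some(error)` when no scope captured it, so the caller can
    /// forward it to script as an uncaptured error.
    pub fn dispatch_error(&mut self, error: Error) -> Option<Error> {
        let filter = error.filter();
        match self.scope_stack.iter_mut().rev().find(|s| s.filter == filter) {
            Some(scope) => {
                if scope.error.is_none() {
                    scope.error = Some(error);
                }
                None
            }
            None => Some(error),
        }
    }
}
```

With state shaped like this, script only needs to hear about two things: the result of a pop, and errors that no scope captured.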

https://sagudev.github.io/briefcase/WebGPUerrorScopes.html works now as expected.

Some tests are now FAIL, because the CTS only checks for outstanding errors in error scopes, and our error scopes used to be empty because of the bad implementation.

./mach build -d does not report any errors
./mach test-tidy does not report any errors
These changes fix Error scopes should be handled in device timeline (wgpu thread) #32297
There are tests for these changes in CTS

sagudev · 2024-05-19T19:53:43Z

We got a handful of FAIL instead of PASS: https://github.com/sagudev/servo/actions/runs/9148458722/job/25151468770, but I believe those were fake passes.

gterzian

Will review tomorrow...

gterzian

Overall it looks good to me, but I wonder if we need the WebGPURequest::DispatchError, because it seems to me that those errors are always dispatched from the device timeline.

Couple of other questions.

components/script/dom/gpudevice.rs

gterzian · 2024-05-21T20:06:02Z

components/script/dom/globalscope.rs

        self.gpu_devices
            .borrow()
            .get(&device)
            .expect("GPUDevice not found")
-            .handle_server_msg(scope, result);
+            .fire_uncaptured_error(error);


Are we sure the device is never dropped at this point?

This is how it was done before. In WGPU we have a devices hashmap, and once a device is removed from it we do not dispatch any errors for it (we also send CleanDevice to the script thread so it gets removed from gpu_devices too). However, device loss is currently not implemented properly in most parts of Servo (nor in Firefox), so I would leave this for future work.
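To make the concern concrete, here is a small sketch of a lookup that tolerates a device that has already been removed, instead of panicking via `expect`. It is not what this PR does. `Devices`, `Device`, and `DeviceId` are hypothetical stand-ins for the real script-side types.

```rust
use std::collections::HashMap;

/// Hypothetical device identifier for this sketch.
pub type DeviceId = u32;

/// Hypothetical stand-in for a script-side device object.
#[derive(Default)]
pub struct Device {
    /// Messages of uncaptured errors delivered to this device.
    pub uncaptured_errors: Vec<String>,
}

/// Map of live devices, analogous in spirit to `gpu_devices`.
#[derive(Default)]
pub struct Devices {
    devices: HashMap<DeviceId, Device>,
}

impl Devices {
    pub fn insert(&mut self, id: DeviceId) {
        self.devices.insert(id, Device::default());
    }

    /// Removes a device, as a cleanup message would. Returns whether it existed.
    pub fn remove(&mut self, id: DeviceId) -> bool {
        self.devices.remove(&id).is_some()
    }

    /// Delivers an uncaptured error if the device is still alive.
    /// Returns `false` (and drops the error) when the device is gone.
    pub fn fire_uncaptured_error(&mut self, id: DeviceId, message: &str) -> bool {
        match self.devices.get_mut(&id) {
            Some(device) => {
                device.uncaptured_errors.push(message.to_owned());
                true
            }
            None => false,
        }
    }
}
```

Whether silently dropping the error is the right behavior depends on how device loss ends up being implemented, so treat this as one option rather than a recommendation.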

components/script/dom/gpuqueue.rs

gterzian · 2024-05-21T20:15:20Z

components/webgpu/gpu_error.rs

+#[derive(Clone, Debug, Eq, Hash, PartialEq)]
+pub(crate) struct ErrorScope {
+    // we only store first error
+    pub errors: Option<Error>,


Should we not make this a list, and just return "any" when popped, with "any" being just the first one for now?

That would be per spec, but why would we store multiple errors if we only use one?

Because per the spec you should store them all but only use one anyway, the only difference is how that one is chosen. In the current implementation we could do something simple like always choosing the last or the first, and add a comment about this when the error scope is popped.

I will change it to a Vec, but generally the first error is the right one, although the spec is very vague about that:

Let error be any one of the items in scope.[[errors]]

The last one can be the result of a previous operation failing:

For any two errors E1 and E2 in the list, if E2 was caused by E1, E2 should not be the one selected.
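The outcome of this thread can be sketched as follows: store every error in the scope, and pick one only when the scope is popped. This is a simplified illustration, with `Error` reduced to a message wrapper and `into_reported_error` being a made-up method name. Choosing the first error is the simple policy discussed above, on the assumption that later errors are more likely to have been caused by earlier ones.

```rust
/// Simplified error type for this sketch (just a message).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Error(pub String);

/// An error scope that keeps every captured error, as the spec describes.
#[derive(Debug, Default)]
pub struct ErrorScope {
    pub errors: Vec<Error>,
}

impl ErrorScope {
    /// Records an error captured while this scope was active.
    pub fn record(&mut self, error: Error) {
        self.errors.push(error);
    }

    /// Consumes the scope when it is popped and picks the error to report.
    /// The spec allows any one of the stored errors; this sketch always
    /// takes the first, since a later error may be a consequence of it.
    pub fn into_reported_error(self) -> Option<Error> {
        self.errors.into_iter().next()
    }
}
```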

gterzian · 2024-05-21T20:25:30Z

components/script/dom/gpudevice.rs

-    scope_stack: Vec<ErrorScopeMetadata>,
-    next_scope_id: ErrorScopeId,
-}
-
 #[dom_struct]
 pub struct GPUDevice {


Should we not keep some additional state about the device having been lost, to properly silence any errors (while still perhaps running a few last steps)? See https://www.w3.org/TR/webgpu/#errors-and-debugging.

For example, I think an ongoing map async step on the device timeline can dispatch an error even after the device is lost, and that error should be silenced (while the map async should still complete, see https://www.w3.org/TR/webgpu/#lose-the-device).

There is a lot of work to be done in Servo for proper WebGPU support, but this PR is limited to error scopes. Device loss is currently very loosely implemented (there is a lost method but not much other handling), so I think I will do the lost device work next.
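For the follow-up work, the extra state being asked about could look roughly like the sketch below: a `lost` flag that silences errors, while operations already in flight are still allowed to complete. Every name here (`DeviceState`, `lose`, `finish_operation`) is an assumption for illustration and does not describe existing Servo code.

```rust
/// Hypothetical per-device state tracking loss, for illustration only.
#[derive(Default)]
pub struct DeviceState {
    lost: bool,
    /// Errors that were actually dispatched (not silenced).
    pub dispatched_errors: Vec<String>,
    /// Number of asynchronous operations that ran to completion.
    pub completed_operations: usize,
}

impl DeviceState {
    /// Marks the device as lost. Errors are silenced from now on.
    pub fn lose(&mut self) {
        self.lost = true;
    }

    /// Dispatches an error unless the device is lost.
    /// Returns whether the error was dispatched.
    pub fn dispatch_error(&mut self, message: &str) -> bool {
        if self.lost {
            return false;
        }
        self.dispatched_errors.push(message.to_owned());
        true
    }

    /// Finishes an in-flight operation (for example a buffer mapping).
    /// The operation always completes; an error it produced goes through
    /// `dispatch_error` and is therefore dropped once the device is lost.
    pub fn finish_operation(&mut self, outcome: Result<(), String>) {
        self.completed_operations += 1;
        if let Err(message) = outcome {
            self.dispatch_error(&message);
        }
    }
}
```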

sagudev · 2024-05-22T05:03:27Z

Overall it looks good to me, but I wonder if we need the WebGPURequest::DispatchError, because it seems to me that those errors are always dispatched from the device timeline.

Couple of other questions.

For this PR I limited myself to moving error scopes and to a respec (a fresh implementation against the latest spec) of GPUErrors, so WebGPURequest::DispatchError is a hacky way to keep old code working. It will be removed as I progress with the respec (in some parts we are completely out of spec, so a full rewrite will be needed).
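The bridging role of a dispatch-error message can be sketched with a plain channel: code that still detects errors outside the device timeline forwards them to the thread that owns the error state. This is a toy model with one implicit scope per device, not Servo's IPC. The `Request` enum and `spawn_device_thread` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical requests sent to the thread that owns error state.
pub enum Request {
    /// An error detected elsewhere, forwarded for bookkeeping.
    DispatchError { device_id: u32, message: String },
    /// Pops the device's (single, implicit) scope and replies with its first error.
    PopErrorScope { device_id: u32, reply: Sender<Option<String>> },
    /// Stops the thread.
    Exit,
}

/// Spawns the owning thread and returns a sender for requests plus its handle.
pub fn spawn_device_thread() -> (Sender<Request>, thread::JoinHandle<()>) {
    let (sender, receiver) = channel::<Request>();
    let handle = thread::spawn(move || {
        // All error state lives on this thread only.
        let mut captured: HashMap<u32, Vec<String>> = HashMap::new();
        for request in receiver {
            match request {
                Request::DispatchError { device_id, message } => {
                    captured.entry(device_id).or_default().push(message);
                }
                Request::PopErrorScope { device_id, reply } => {
                    let first = captured
                        .remove(&device_id)
                        .and_then(|errors| errors.into_iter().next());
                    // The requester may have gone away; ignore send failures.
                    let _ = reply.send(first);
                }
                Request::Exit => break,
            }
        }
    });
    (sender, handle)
}
```

Because messages from one sender are received in the order they were sent, an error dispatched before a pop is observed by that pop.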

gterzian

LGTM, I would only make one change which is the one described at #32304 (comment)

Also, if we don't have them already, it would be good to open issues for subsequent work, like the device lost stuff.

sagudev added 4 commits May 17, 2024 07:02

Prepare errorscopes logic in wgpu_thread

8419947

remove scope_id from ipc

5cd9d31

new GPUErrors per spec

dd78013

remove cotent timeline error_scope

b169a95

sagudev added the A-content/webgpu The WebGPU implementation. label May 17, 2024

sagudev added 2 commits May 17, 2024 12:47

fixup poperrorscope types

7860a28

device_scope -> gpu_error and nice errors

d006f3e

sagudev force-pushed the gpu-error branch from f206a6a to d006f3e Compare May 17, 2024 13:00

sagudev mentioned this pull request May 17, 2024

Assert there are no uncaptured errors. gpuweb/cts#3753

Open

sagudev added 3 commits May 20, 2024 12:11

Handle errors detection more elegantly

9825a6e

good expectations

57299a7

new expectations

90456bb

sagudev force-pushed the gpu-error branch from 7d3dddf to 90456bb Compare May 20, 2024 10:12

sagudev marked this pull request as ready for review May 20, 2024 10:12

sagudev requested a review from gterzian May 20, 2024 10:12

gterzian reviewed May 20, 2024


gterzian reviewed May 21, 2024


gterzian approved these changes May 22, 2024


Make error_scope.errors Vec as per spec

c554ca6

sagudev mentioned this pull request May 22, 2024

[webgpu] Implement "lose the device" #32347

Open

sagudev added this pull request to the merge queue May 22, 2024

Merged via the queue into servo:main with commit 794110e May 22, 2024
9 checks passed

sagudev deleted the gpu-error branch May 22, 2024 17:49
