[Device] Catch WebGPU OOM error #402

Merged: 1 commit, May 21, 2024


CharlieFRuan

Prior to this PR, when users call createEngine() or reload() with a model that is too large for the device, the device would likely keep generating, ignoring the OOM error and with no guarantee of correct output. See #356 and #209.

This PR catches such errors with device.lost.then(), relying on tvmjs to call device.destroy() when it detects an error in createBuffer() (added in apache/tvm#17005).

We have only observed createBuffer() errors so far, so only this kind of error is handled for now. In addition, since most OOM errors occur in reload(), the error handling is made effectively synchronous despite using .then(): if an error was recorded, it is thrown at the end of reload().
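The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `DeviceLike`, `DeviceLostWatcher`, and `reloadWithOomCheck` are hypothetical names, and `DeviceLike` stands in for the `lost` promise of a real `GPUDevice` so the sketch runs without WebGPU.

```typescript
// Minimal shape of the part of GPUDevice used here (assumption: the real
// type would come from the WebGPU type definitions).
interface DeviceLostInfoLike {
  message: string;
}
interface DeviceLike {
  lost: Promise<DeviceLostInfoLike>;
}

// Hypothetical helper: records a device-lost event so it can be rethrown later.
class DeviceLostWatcher {
  private lostError: Error | undefined;

  constructor(device: DeviceLike) {
    // The callback runs asynchronously, so it only records the error here.
    device.lost.then((info) => {
      this.lostError = new Error(`WebGPU device lost: ${info.message}`);
    });
  }

  // Rethrows the recorded error, if any, in the caller's control flow.
  throwIfLost(): void {
    if (this.lostError !== undefined) {
      throw this.lostError;
    }
  }
}

// Hypothetical stand-in for reload(): `load` represents the weight and
// KV cache allocation that may trigger createBuffer() failures.
async function reloadWithOomCheck(
  device: DeviceLike,
  load: () => Promise<void>,
): Promise<void> {
  const watcher = new DeviceLostWatcher(device);
  await load();
  // Checking at the end makes the caller's `await` reject on OOM,
  // even though detection itself happened inside a .then() callback.
  watcher.throwIfLost();
}
```

With this shape, a caller can wrap the load in an ordinary `try`/`catch` around `await reloadWithOomCheck(...)` instead of registering its own `device.lost` handler.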

@CharlieFRuan

Example of trying to allocate a KV cache with 900k context length (should be similar for trying to load a model that is too large):
[Screenshot: 2024-05-17, 2:25:06 AM]

@CharlieFRuan CharlieFRuan marked this pull request as draft May 17, 2024 09:42
@CharlieFRuan

Marked as a draft for now as it depends on apache/tvm#17005

@CharlieFRuan CharlieFRuan marked this pull request as ready for review May 21, 2024 20:25
@CharlieFRuan CharlieFRuan merged commit b762bf4 into mlc-ai:main May 21, 2024
CharlieFRuan added a commit that referenced this pull request May 21, 2024
### Changes
Main changes include:
- New model `Hermes-2-Pro-Mistral-7B` in `prebuiltAppConfig` via:
  - #390
- Various `index.js` and `index.js.map` post-processing steps to resolve
frontend compatibility issues with `require()` and `perf_hooks`
  - #397
  - #406
- Catch WebGPU OOM error upon `reload()` and `CreateEngine()`:
  - #402
- Service Worker support (in addition to Extension Service Worker):
  - #395
  - #400
  - #401

### WASM Version
Remains v0_2_34, as no change is required.

### TVMjs
TVMjs compiled at
apache/tvm@a5862a5,
with only one change in `tvm/web`:
apache/tvm#17005