WIP: Add support for multiple wakeword/vad models #6653

Draft
wants to merge 4 commits into
base: dev

kahrendt
Contributor

What does this implement/fix?

This is still a work in progress! I'd appreciate people testing this out and letting me know about any issues. It is a breaking change because the YAML syntax changes to support multiple wake word models.

This PR adds several features/performance improvements:

  • Multiple wake word models can run simultaneously
    • At most two of the current models can run simultaneously without issue
  • Adds support for running a Voice Activity Detection model to potentially reduce certain false accepts
    • The current VAD model can only run alongside one wake word model. If you try to run VAD and two wake word models at once, accuracy will suffer greatly
  • Several memory improvements
    • Models are loaded and unloaded as mWW starts and stops to save memory when not actively running
    • All buffers (excluding the ring buffer) are freed when not actively running
    • Ring buffer size is reduced (previously, once it filled up it could never recover, so 0.5 s of audio was dropped each time)
    • Makes the tensor arena's default allocated memory smaller. The exact space allocated can be set in the codegen stage.
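
The memory behaviour described in the list above can be pictured with a small sketch. This is not the component's code: `ModelSlot`, `Detector`, and the buffer sizes are hypothetical names and values chosen for illustration. It only models the idea that the tensor arenas and working buffers exist while detection is running, while the ring buffer stays allocated throughout.

```python
from dataclasses import dataclass, field
from typing import List, Optional

RING_BUFFER_BYTES = 1024    # illustrative size only, not the real value
FEATURE_BUFFER_BYTES = 256  # illustrative size only, not the real value


@dataclass
class ModelSlot:
    """Hypothetical stand-in for one model and its tensor arena."""
    name: str
    tensor_arena_size: int  # bytes; assumed to be supplied at codegen time
    arena: Optional[bytearray] = None

    def load(self) -> None:
        # Allocate the arena only when detection starts.
        if self.arena is None:
            self.arena = bytearray(self.tensor_arena_size)

    def unload(self) -> None:
        # Drop the reference so the memory can be reclaimed.
        self.arena = None


@dataclass
class Detector:
    """Hypothetical detector that owns the models and buffers."""
    models: List[ModelSlot]
    # The ring buffer is kept for the detector's whole lifetime.
    ring_buffer: bytearray = field(
        default_factory=lambda: bytearray(RING_BUFFER_BYTES)
    )
    feature_buffer: Optional[bytearray] = None

    def start(self) -> None:
        self.feature_buffer = bytearray(FEATURE_BUFFER_BYTES)
        for model in self.models:
            model.load()

    def stop(self) -> None:
        for model in self.models:
            model.unload()
        self.feature_buffer = None

    def allocated_bytes(self) -> int:
        total = len(self.ring_buffer)
        if self.feature_buffer is not None:
            total += len(self.feature_buffer)
        total += sum(len(m.arena) for m in self.models if m.arena is not None)
        return total
```

Calling `start()` and then `stop()` on such a detector returns `allocated_bytes()` to the ring buffer size alone, which is the saving the list describes.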

Todo:

  • Update the manifest format
    • Add a field for the necessary tensor arena size (currently the values are hardcoded in __init__.py)
    • Add support for a VAD helper model along with its specific parameters
  • Allow users to only enable VAD instead of having to point to a specific manifest file (also requires uploading the VAD model to the appropriate default repository)
  • Possibly handle the VAD code better? I have added a new preprocessor directive so the relevant code is only compiled when VAD is enabled, but I'm not sure this is the best approach. Warning: enabling or disabling a VAD model requires a full recompile when rebuilding, so this may take a while on a slow computer!
  • Update the documentation (see the example YAML at the end of this PR in the meantime)
  • Fix any bugs people encounter in testing

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Other

Related issue or feature (if applicable): not applicable

Pull request in esphome-docs with documentation (if applicable): unfinished

Test Environment

  • ESP32
  • ESP32 IDF
  • ESP8266
  • RP2040
  • BK72xx
  • RTL87xx

Example entry for config.yaml:

# Example config.yaml for multiple models

micro_wake_word:
  on_wake_word_detected:
    - voice_assistant.start: 
        wake_word: !lambda return wake_word; 
  models:
    - model: okay_nabu
      sliding_window_average_size: 5
    - model: hey_jarvis
      probability_cutoff: 0.75

# Example config.yaml for VAD

micro_wake_word:
  on_wake_word_detected:
    - voice_assistant.start: 
        wake_word: !lambda return wake_word; 
  vad_model: 
    model: https://github.com/kahrendt/microWakeWord/releases/download/model/vad_model.json
    sliding_window_average_size: 2
    threshold:
      upper: 0.95
      lower: 0.5
  models:
    - model: alexa
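
To give a feel for what the options in the examples above might control, here is a minimal Python sketch. It rests on assumptions the PR does not confirm: that `sliding_window_average_size` averages the most recent model outputs, that `probability_cutoff` is compared against that average, and that the VAD `upper`/`lower` thresholds act as hysteresis (speech turns on at or above `upper` and off at or below `lower`). The class and function names are hypothetical and are not part of the component.

```python
from collections import deque
from typing import Iterable, Optional


class SlidingWindowDetector:
    """Assumed behaviour: accept when the windowed mean exceeds the cutoff."""

    def __init__(self, window_size: int, probability_cutoff: float) -> None:
        self.window = deque(maxlen=window_size)
        self.cutoff = probability_cutoff

    def update(self, probability: float) -> bool:
        self.window.append(probability)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        return sum(self.window) / len(self.window) > self.cutoff


class VadGate:
    """Assumed behaviour: hysteresis between the lower and upper thresholds."""

    def __init__(self, window_size: int, upper: float, lower: float) -> None:
        self.window = deque(maxlen=window_size)
        self.upper = upper
        self.lower = lower
        self.speech = False

    def update(self, probability: float) -> bool:
        self.window.append(probability)
        average = sum(self.window) / len(self.window)
        if average >= self.upper:
            self.speech = True
        elif average <= self.lower:
            self.speech = False
        # Between the thresholds the previous state is kept.
        return self.speech


def first_detection(
    wake_probs: Iterable[float],
    vad_probs: Iterable[float],
    detector: SlidingWindowDetector,
    gate: VadGate,
) -> Optional[int]:
    """Return the first frame index accepted by both the detector and the VAD gate."""
    for index, (wake, vad) in enumerate(zip(wake_probs, vad_probs)):
        speech = gate.update(vad)
        hit = detector.update(wake)
        if hit and speech:
            return index
    return None
```

Under these assumptions, a wake word score that stays high while the VAD score is low is not accepted, which is the kind of false accept the VAD model is meant to reduce.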

Checklist:

  • The code change is tested and works locally.
  • Tests have been added to verify that the new code works (under tests/ folder).

If user exposed functionality or configuration variables are added/changed:

codecov-commenter commented Apr 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.05%. Comparing base (4d8b5ed) to head (a4886ac).
Report is 494 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #6653      +/-   ##
==========================================
+ Coverage   53.70%   54.05%   +0.34%     
==========================================
  Files          50       50              
  Lines        9408     9554     +146     
  Branches     1654     1687      +33     
==========================================
+ Hits         5053     5164     +111     
- Misses       4056     4066      +10     
- Partials      299      324      +25     

