Enhancement: Multimodal Content Support via SRT File Generation #2

Open
mysticaltech opened this issue Jan 22, 2024 · 3 comments

Comments

@mysticaltech

mysticaltech commented Jan 22, 2024

  • Current Scope: "aifs" excels in local semantic search for text-based content.
  • Proposed Enhancement: Extend capabilities to include audio and video content.
  • Implementation:
    • Generate .srt Files: Utilize tools like Whisper (runs great locally on most computers) for transcribing audio and video into subtitle files.
    • Search Integration: Incorporate these .srt files into the search functionality.
  • Outcome:
    • Search Utility: Enable searches to return specific segments from multimedia content.
    • Timestamps Feature: Provide timestamps alongside search results for precise referencing.
  • Impact: This enhancement would transform "aifs" into a comprehensive tool for searching across various content formats, greatly expanding its usability.
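
To make the proposal concrete, here is a minimal sketch of the .srt round trip: writing timestamped segments out as SubRip text, parsing them back, and returning matching segments with their timestamps. Everything in it is an assumption for illustration: the `Segment` shape stands in for whatever a transcriber such as Whisper produces, the function names are hypothetical, and the plain substring match is only a placeholder for the semantic search that "aifs" would actually perform.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span (hypothetical shape, not an aifs or Whisper type)."""
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


def to_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def from_timestamp(stamp: str) -> float:
    """Parse an SRT timestamp (HH:MM:SS,mmm) back into seconds."""
    hours, minutes, rest = stamp.split(":")
    secs, ms = rest.split(",")
    return int(hours) * 3600 + int(minutes) * 60 + int(secs) + int(ms) / 1000


def segments_to_srt(segments: list[Segment]) -> str:
    """Serialize segments as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_line = f"{to_timestamp(seg.start)} --> {to_timestamp(seg.end)}"
        blocks.append(f"{index}\n{time_line}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


TIME_LINE = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(srt_text: str) -> list[Segment]:
    """Parse SRT text into segments, skipping blocks that are malformed."""
    segments = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        match = TIME_LINE.match(lines[1].strip())
        if not match:
            continue
        start, end = (from_timestamp(g) for g in match.groups())
        segments.append(Segment(start, end, " ".join(lines[2:])))
    return segments


def search(segments: list[Segment], query: str) -> list[Segment]:
    """Placeholder search: case-insensitive substring match on segment text."""
    needle = query.lower()
    return [seg for seg in segments if needle in seg.text.lower()]


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "Welcome to the demo."),
        Segment(2.5, 6.0, "Semantic search runs locally."),
    ]
    for hit in search(parse_srt(segments_to_srt(demo)), "semantic"):
        print(f"[{to_timestamp(hit.start)} - {to_timestamp(hit.end)}] {hit.text}")
```

Keeping the timestamps attached to each segment is what lets a search result point at an exact moment in the audio or video rather than at the whole file.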
@mysticaltech
Author

A good resource to assist with this is: https://github.com/sindresorhus/awesome-whisper

@KillianLucas
Collaborator

oh I love this @mysticaltech. will make a ROADMAP.md and put this there. thanks!

@mysticaltech
Author

@KillianLucas Just stumbled on this high-level Whisper library; it seems ideal for the job. I checked the code, and the real API is more extensive than what the README presents. https://github.com/kadirnar/whisper-plus
