Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
Updated Jun 6, 2024 - Python
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
VITS-based Voice Conversion focused on simplicity, quality and performance.
ModelScope: bring the notion of Model-as-a-Service to life.
Foundational model for human-like, expressive TTS
Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Deep-learning-based speech and pronunciation assessment API for 8 languages.
Drift-Lens: an Unsupervised Drift Detection Framework for Deep Learning Classifiers on Unstructured Data
A ggml (C++) re-implementation of tortoise-tts. Under construction and seeking contributors.
Data manipulation and transformation for audio signal processing, powered by PyTorch
Chinese Phonetic Dataset with Homophone Clustering
🚀 Curated collection of amazing Python scripts, from basic to advanced, including automation task scripts.
Audio Codec Speech processing Universal PERformance Benchmark
Lingvo