microsoft/VibeVoice

Open-Source Frontier Voice AI

Stars: 42,086 · Forks: 4,829 · Watchers: 217 · Open Issues: 134

Python · MIT License · Last commit Apr 24, 2026 · by @microsoft · Published April 2, 2026 · Analyzed 3mo ago

Safety Rating A

No hardcoded secrets, malicious code patterns, suspicious dependencies, or prompt injection attempts were detected in the repository content provided. The repository is a well-documented Microsoft Research open-source project under the MIT license. Its README includes a responsible-use disclaimer and notes that the TTS code was removed after misuse cases (deepfakes) were discovered, which indicates active responsible-AI governance.

ℹ AI-assisted review, not a professional security audit.

AI Analysis

VibeVoice is a family of open-source frontier voice AI models from Microsoft, encompassing Text-to-Speech (TTS), Automatic Speech Recognition (ASR), and real-time streaming TTS capabilities. The ASR model (7B) handles up to 60-minute long-form audio in a single pass with speaker diarization, timestamps, and customized hotword support across 50+ languages. The TTS model (1.5B) generates up to 90 minutes of expressive multi-speaker conversational audio. The streaming TTS model (0.5B) provides real-time synthesis with ~300ms first-audio latency. All models use continuous speech tokenizers at 7.5 Hz and a next-token diffusion framework built on top of a large language model backbone.
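
To put the 7.5 Hz tokenizer rate in perspective, the short sketch below works out how many tokenizer frames the stated audio lengths correspond to. It is plain arithmetic, not code from the repository: the helper name `frames_for` is hypothetical, and it assumes one frame per tokenizer stream every 1/7.5 seconds, ignoring text tokens and any additional streams that would also occupy the model's context.

```python
# Frame-count arithmetic for a 7.5 Hz continuous speech tokenizer.
# Assumption: one frame per tokenizer stream every 1/7.5 s; text tokens
# and any extra streams are not counted here.
FRAME_RATE_HZ = 7.5  # tokenizer frame rate stated in the analysis above


def frames_for(minutes: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Return the number of tokenizer frames covering `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


asr_frames = frames_for(60)       # 60-minute ASR input  -> 27000 frames
tts_frames = frames_for(90)       # 90-minute TTS output -> 40500 frames
frame_ms = 1000 / FRAME_RATE_HZ   # duration of one frame, about 133.3 ms

print(asr_frames, tts_frames, round(frame_ms, 1))
```

At this rate an hour of audio is on the order of tens of thousands of frames per stream, which suggests why such a low frame rate makes single-pass long-form processing plausible for an LLM backbone.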

Use Cases

Long-form audio transcription with speaker diarization and timestamps
Podcast and multi-speaker dialogue synthesis
Real-time text-to-speech for voice interfaces and input methods
Multilingual speech recognition and synthesis (50+ languages)
Fine-tuning speech recognition models on custom domain data
Voice-powered input methods and accessibility tools