
VibeVoice - Microsoft Open-Source Frontier Voice AI

Source: github.com/microsoft/VibeVoice | Rating: ★★★★★

Voice AI | ASR | TTS | Microsoft | Open Source | 50+ Languages

Core Innovation

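VibeVoice is Microsoft's open-source family of frontier voice AI models, covering both text-to-speech (TTS) and automatic speech recognition (ASR). Its core innovation is a pair of continuous speech tokenizers (acoustic and semantic) that run at an ultra-low frame rate of 7.5 Hz, which significantly improves efficiency on long sequences.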
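
To make the 7.5 Hz figure concrete, here is a minimal arithmetic sketch of how many tokenizer frames a clip of a given length would occupy. The function name `frames_for_audio` and the 50 Hz comparison rate are assumptions chosen for illustration; only the 7.5 Hz value comes from the text above.

```python
import math

FRAME_RATE_HZ = 7.5  # tokenizer frame rate stated in the text


def frames_for_audio(duration_seconds: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Return the number of tokenizer frames needed to cover the audio."""
    return math.ceil(duration_seconds * frame_rate_hz)


one_minute = frames_for_audio(60)       # 60 s * 7.5 Hz = 450 frames
one_hour = frames_for_audio(3600)       # 3600 s * 7.5 Hz = 27000 frames

# Hypothetical comparison point: a tokenizer running at 50 Hz (not from the source).
one_minute_at_50hz = frames_for_audio(60, frame_rate_hz=50.0)  # 3000 frames

print(one_minute, one_hour, one_minute_at_50hz)
```

The lower the frame rate, the shorter the sequence the language model has to attend over for the same amount of audio, which is where the long-sequence efficiency gain would come from.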

Core Features

Model Family

Model | Weights | Quick Try
VibeVoice-ASR-7B | HuggingFace | Playground
VibeVoice-TTS-1.5B | HuggingFace | Disabled
VibeVoice-Realtime-0.5B | HuggingFace | Colab

Technical Highlights

VibeVoice employs a next-token diffusion framework, leveraging a Large Language Model (LLM) to understand textual context and dialogue flow, and a diffusion head to generate high-fidelity acoustic details.
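
The control flow described above can be sketched as a toy loop: at each step a context model produces a conditioning vector, and a separate head refines random noise into one continuous latent. This is not VibeVoice's code or API. `llm_hidden_state`, `diffusion_head`, `LATENT_DIM`, and `DENOISE_STEPS` are all hypothetical stand-ins, and the "denoiser" is a simple interpolation toward the conditioning vector rather than a trained diffusion model.

```python
import random

LATENT_DIM = 4      # hypothetical size of one continuous speech latent
DENOISE_STEPS = 5   # hypothetical number of refinement steps per frame


def llm_hidden_state(text, previous_latents):
    """Stand-in for the LLM: map text plus generated history to a conditioning vector."""
    rng = random.Random(len(text) + len(previous_latents))
    return [rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]


def diffusion_head(condition, rng):
    """Stand-in for the diffusion head: start from noise and refine it step by step."""
    latent = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    for step in range(DENOISE_STEPS):
        # Toy schedule: the final step lands exactly on the conditioning vector.
        blend = 1.0 / (DENOISE_STEPS - step)
        latent = [x + blend * (c - x) for x, c in zip(latent, condition)]
    return latent


def generate(text, num_frames, seed=0):
    """Autoregressive loop: each new latent is conditioned on text and prior latents."""
    rng = random.Random(seed)
    latents = []
    for _ in range(num_frames):
        condition = llm_hidden_state(text, latents)
        latents.append(diffusion_head(condition, rng))
    return latents


frames = generate("Hello there.", num_frames=3)
print(len(frames), len(frames[0]))
```

The point of the sketch is the division of labour: the autoregressive loop handles context and ordering, while the per-step head is responsible for producing a continuous value instead of picking a discrete token.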

Development History

Why It Matters

VibeVoice represents a major breakthrough for open-source voice AI.

Explored: 2026-04-02 | Source: GitHub Trending