Synthszr Charts: the major AI brands competing for the podium

MAI-Voice-1

#21

Microsoft · v1 · since first introduction: August 28, 2025 (blog post "Two in-house models"); broader public preview launch in Microsoft Foundr · 14× · most recently June 30, 2026

Momentum

MAI-Voice-1 is Microsoft's first in-house text-to-speech model, developed by the Microsoft AI (MAI) team under Mustafa Suleyman. It generates highly expressive, natural-sounding speech and can produce 60 seconds of audio in under one second on a single GPU. The model supports voice cloning from just a few seconds of audio (Personal Voice feature, gated/approval-based), fine-grained per-turn emotion control via SSML, and long-form content generation with consistent speaker identity. It is available via Azure Speech / Microsoft Foundry in public preview and powers features such as Copilot Audio Expressions and Copilot Podcasts.
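As a rough illustration of per-turn emotion control via SSML, the sketch below builds a document in which each turn carries its own speaking style. It uses the `mstts:express-as` element from Azure Speech SSML. Whether MAI-Voice-1 accepts this exact element, and which style values it supports, is an assumption. The voice name `hypothetical-mai-voice-1` and the helper `build_ssml` are placeholders invented for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_ssml(turns, voice="hypothetical-mai-voice-1"):
    """Build an SSML string with one express-as element per turn.

    turns: list of (style, text) pairs.
    voice: placeholder name, NOT a confirmed MAI-Voice-1 identifier.
    """
    body = "".join(
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"
        "</mstts:express-as>"
        for style, text in turns
    )
    ssml = (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )
    # Parsing raises ParseError if the result is not well-formed XML.
    ET.fromstring(ssml)
    return ssml


# Style names here are illustrative assumptions.
example = build_ssml([
    ("cheerful", "Welcome back to the show!"),
    ("sad", "Sadly, we have some bad news & a correction."),
])
```

Escaping the turn text keeps characters such as `&` from breaking the XML, which matters when the text comes from user input.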

Momentum history

Features

Real-Time Streaming	Supports both streaming and batch synthesis; 60 sec of audio in <1 sec on a single GPU
Latency	Sub-100 ms latency for interactive workloads via the Azure Speech SDK
License	Proprietary; Microsoft holds full licensing rights for commercial use; currently public preview with no SLA
Platform	Azure Speech, Microsoft Foundry, MAI Playground, Copilot (Audio Expressions, Podcasts)
Price	From $22 per 1M characters (Azure Speech / Foundry)
Release Date	Aug 28, 2025 (announcement); Apr 2, 2026 (public preview in Foundry)
Languages	Optimized for English (US); multilingual coverage only with successor MAI-Voice-2 (>10 languages)
Voice Cloning	Yes, via Personal Voice feature from a 10-second audio sample; requires approval (Responsible AI process)
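To make the price and throughput figures above concrete, here is a small arithmetic sketch. It treats the listed "from $22 per 1M characters" as a flat rate, which is an assumption: the actual price may be higher, and how billable characters are counted (for example, whether SSML markup counts) is not stated here. The function names are hypothetical.

```python
PRICE_PER_MILLION_CHARS_USD = 22.0  # listed starting price, assumed flat


def cost_for_chars(num_chars):
    """Estimated lower-bound cost in USD for synthesizing num_chars characters."""
    return num_chars / 1_000_000 * PRICE_PER_MILLION_CHARS_USD


def realtime_factor(audio_seconds=60.0, compute_seconds=1.0):
    """Seconds of audio produced per second of compute.

    The defaults encode the claim of 60 s of audio in under 1 s,
    so the result is a lower bound on the speed-up.
    """
    return audio_seconds / compute_seconds


half_million_cost = cost_for_chars(500_000)  # 11.0 USD at the assumed rate
speedup = realtime_factor()                  # 60.0, i.e. at least 60x real time
```

At the assumed rate, a 500,000-character script would cost at least $11, and the stated throughput corresponds to generating audio at least 60 times faster than real time on a single GPU.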
