Synthszr Charts: the major AI brands competing for the podium

MAI-Voice-1

#21

Microsoft · v1 · since first unveiling: August 28, 2025 (blog post "Two in-house models"); broader public preview launch in Microsoft Foundry · 14× · most recent: June 30, 2026

10
Momentum

MAI-Voice-1 is Microsoft's first in-house text-to-speech model, developed by the Microsoft AI (MAI) team under Mustafa Suleyman. It generates highly expressive, natural-sounding speech and can produce 60 seconds of audio in under one second on a single GPU. The model supports voice cloning from just a few seconds of audio (Personal Voice feature, gated/approval-based), fine-grained per-turn emotion control via SSML, and long-form content generation with consistent speaker identity. It is available via Azure Speech / Microsoft Foundry in public preview and powers features such as Copilot Audio Expressions and Copilot Podcasts.

Momentum history

Features

Real-Time Streaming: Supports both streaming and batch synthesis; 60 sec of audio in <1 sec on a single GPU
Latency: Sub-100 ms latency for interactive workloads via the Azure Speech SDK
License: Proprietary; Microsoft holds full licensing rights for commercial use; currently public preview with no SLA
Platform: Azure Speech, Microsoft Foundry, MAI Playground, Copilot (Audio Expressions, Podcasts)
Price: From $22 per 1M characters (Azure Speech / Foundry)
Release Date: Aug 28, 2025 (announcement); Apr 2, 2026 (public preview in Foundry)
Languages: Optimized for English (US); multilingual coverage only with successor MAI-Voice-2 (>10 languages)
Voice Cloning: Yes, via Personal Voice feature from a 10-second audio sample; requires approval (Responsible AI process)
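
The throughput and price figures listed above can be turned into rough planning estimates. The sketch below is illustrative only: the function names and the 50,000-character script are hypothetical, the price is treated as a lower bound because the listing says "from $22", and the compute estimate assumes throughput scales linearly with audio length, which the source does not state.

```python
# Figures taken from the feature list; everything else is an assumption.
PRICE_USD_PER_MILLION_CHARS = 22.0  # "From $22 per 1M characters" (lower bound)
AUDIO_SECONDS_PER_RUN = 60.0        # 60 seconds of audio generated ...
MAX_COMPUTE_SECONDS_PER_RUN = 1.0   # ... in under one second on a single GPU


def min_cost_usd(num_chars: int) -> float:
    """Lower-bound synthesis cost for a text of num_chars characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 1_000_000 * PRICE_USD_PER_MILLION_CHARS


def min_realtime_speedup() -> float:
    """Lower bound on how many times faster than real time synthesis runs."""
    return AUDIO_SECONDS_PER_RUN / MAX_COMPUTE_SECONDS_PER_RUN


def max_compute_seconds(audio_seconds: float) -> float:
    """Upper bound on single-GPU compute time for the given audio duration.

    Assumes the quoted throughput scales linearly with audio length.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / min_realtime_speedup()


if __name__ == "__main__":
    script_chars = 50_000  # hypothetical podcast script length
    print(f"cost: at least ${min_cost_usd(script_chars):.2f}")
    print(f"speed-up: at least {min_realtime_speedup():.0f}x real time")
    print(f"30 min of audio: at most {max_compute_seconds(30 * 60):.0f} s of GPU time")
```

These numbers cover synthesis only. Network round trips, queueing, and any price tiers above the "from" rate are not modeled.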

Evidence (14)
