

MAI Transcribe 1.5
#10 in Transcription (STT) · microsoft · v1.5 · since 2026-06-02 · 2× · latest 2026-06-29
Momentum: 15
MAI Transcribe 1.5 is a speech recognition (speech-to-text) model from Microsoft in the Audio & Voice category. It is claimed to be the best transcription model in the world and stands out for pairing high speed (about 276x realtime) with high accuracy (2.4% AA-WER).
Momentum history (chart)
Features
| Feature | Value |
| --- | --- |
| Price Tier | $0.36 USD per hour of audio (Azure Speech / Microsoft Foundry); equivalent to $6 USD per 1,000 minutes |
| Language Support (Count) | 43 languages (FLEURS benchmark coverage); plus 100+ BCP-47 locales per Azure/OpenRouter documentation |
| Processing Speed (x Realtime) | ~276x realtime (fastest model in the top 10 by accuracy; 1 hour of audio in under 15 seconds; up to 5.7x faster than predecessor MAI-Transcribe-1) |
| Word Error Rate (%) | 2.4% AA-WER (Artificial Analysis Leaderboard, rank #3); 3.7% WER on FLEURS (25 core languages, rank #1); 4.9% avg. WER on FLEURS across 43 languages |
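
The price and speed figures in the table can be cross-checked with a little arithmetic. The following is a minimal sketch, not an official calculator: the constants are copied from the table, while the function names `cost_usd` and `processing_seconds` are hypothetical helpers introduced here for illustration. It assumes billing is linear in audio duration and that the ~276x realtime factor holds for the whole file, neither of which the listing confirms.

```python
# Figures copied from the feature table; helper names are illustrative only.
PRICE_PER_HOUR_USD = 0.36   # listed price per hour of audio
REALTIME_FACTOR = 276       # approximate speed, multiples of realtime


def cost_usd(audio_minutes: float, price_per_hour: float = PRICE_PER_HOUR_USD) -> float:
    """Estimated cost, assuming linear per-hour billing (an assumption)."""
    return audio_minutes / 60 * price_per_hour


def processing_seconds(audio_minutes: float, realtime_factor: float = REALTIME_FACTOR) -> float:
    """Estimated wall-clock time, assuming a constant realtime factor."""
    return audio_minutes * 60 / realtime_factor


if __name__ == "__main__":
    print(f"1,000 minutes of audio: about ${cost_usd(1000):.2f}")
    print(f"1 hour of audio: about {processing_seconds(60):.1f} s")
```

Under these assumptions, 1,000 minutes comes to about $6 and one hour of audio to roughly 13 seconds, consistent with the "$6 USD per 1,000 minutes" and "under 15 seconds" entries in the table.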