

MAI
#27 in Multimodal Models · microsoft · since June 2, 2026 (Microsoft Build 2026); the first MAI models (MAI-Voice-1, MAI-1-preview) already launched in August 2025 · 3× · last seen Jun 29, 2026
15
Momentum
MAI is a family of seven new models from Microsoft covering reasoning, coding, image generation and editing, speech synthesis, and transcription. Through Frontier Tuning, customers can fine-tune the model weights themselves.
[Chart: Momentum trend]
Features
| Feature | Details |
| --- | --- |
| Key Benchmark (%) | MAI-Thinking-1: 53% SWE-Bench Pro, 97% AIME 2025 (matches Claude Opus 4.6); MAI-Code-1-Flash: 51.2% SWE-Bench Pro |
| Context Window (Tokens) | 256,000 tokens (MAI-Thinking-1, 35B active parameters, MoE, ~1T total parameters) |
| License | Proprietary/commercial, access via Microsoft Foundry (some in private preview); Frontier Tuning allows customers to fine-tune weights themselves |
| Multimodality | Text/reasoning, code, image generation & editing, speech synthesis (voice), speech-to-text (transcription) – 4 modalities across 7 models |
| Platform | Microsoft Foundry (Azure), also OpenRouter, Fireworks AI and Baseten; MAI-Code-1-Flash in GitHub Copilot & VS Code |
| Price per 1M Tokens | MAI-Image-2.5: $5 text input / $8 image input / $47 image output; MAI-Code-1-Flash: $0.75 input / $4.50 output; MAI-Voice-2: $22 per 1M characters |
| Release Date | June 2, 2026 (Build 2026: MAI-Thinking-1, MAI-Image-2.5, MAI-Voice-2, MAI-Transcribe-1.5, MAI-Code-1-Flash) |
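
As a rough illustration of how the per-token prices in the table translate into a bill, the sketch below estimates the cost of one MAI-Code-1-Flash request. It assumes flat per-token pricing with no caching, batch, or volume discounts. The function name `estimate_cost` and the token counts are hypothetical; only the two prices come from the table.

```python
# USD per 1M tokens, copied from the "Price per 1M Tokens" row above.
PRICES_PER_MILLION = {
    "MAI-Code-1-Flash": {"input": 0.75, "output": 4.50},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, assuming flat per-token pricing."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 200k input tokens and 20k output tokens.
cost = estimate_cost("MAI-Code-1-Flash", input_tokens=200_000, output_tokens=20_000)
print(f"${cost:.2f}")
```

MAI-Voice-2 is billed per 1M characters rather than per token, so it would need a separate character-based calculation.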