

Gemini 3.1 Flash TTS
#9 in Speech Synthesis (TTS) · google · v3.1 · flash tts · since April 15, 2026 · 25× · most recently June 30, 2026
Momentum: 34
Gemini 3.1 Flash TTS is Google's text-to-speech model for converting text into high-quality speech across 70+ languages. The model provides 200+ audio tags for fine-grained control over vocal style, pace, and emotion, and supports up to two speakers per request. With an Elo score of 1,211 on the Artificial Analysis leaderboard, it combines strong speech quality with low cost. All outputs are watermarked with SynthID to identify AI-generated content.
Momentum trend (chart; x-axis from 04.04. to 03.07.)
Properties
| Property | Details |
| --- | --- |
| Real-Time Streaming | Yes – streaming support (stream: true / streamGenerateContent) for gemini-3.1-flash-tts-preview; the only TTS model in the API with streaming support |
| Latency | Low-latency focus per official documentation; independent tests report approx. 300–500 ms to first audio chunk |
| Platform | Google AI Studio, Vertex AI (Public Preview), Gemini API (REST/SDK); cloud-only – no local deployment |
| Price | $1.00 / 1M input tokens (text); $20.00 / 1M output tokens (audio), with audio counted at 25 tokens per second. Free tier (preview) available in AI Studio. Batch API: 50% discount. |
| Release Date | April 15, 2026 (preview launch, gemini-3.1-flash-tts-preview) |
| Languages | 70+ languages (including English, Spanish, French, German, Portuguese, Japanese, Korean, Arabic, Hindi, Mandarin, and numerous regional variants) |
| Voice Cloning | No – the model works exclusively with 30 predefined voices; no custom voice cloning available |
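
The pricing row above can be turned into a rough per-request estimate. The following is a minimal sketch, not an official calculator: the function name `estimate_tts_cost` is hypothetical, the rates are copied from the table, and it assumes the 50% Batch API discount applies to both input and output tokens, which the table does not state explicitly.

```python
# Rates taken from the Price row of the table above.
INPUT_USD_PER_MILLION = 1.00    # text input tokens
OUTPUT_USD_PER_MILLION = 20.00  # audio output tokens
AUDIO_TOKENS_PER_SECOND = 25    # audio is counted at 25 tokens per second
BATCH_DISCOUNT = 0.5            # assumption: applied to the whole request


def estimate_tts_cost(input_tokens: int, audio_seconds: float, batch: bool = False) -> float:
    """Return an approximate cost in USD for one TTS request (hypothetical helper)."""
    # Convert the generated audio duration into billable output tokens.
    output_tokens = audio_seconds * AUDIO_TOKENS_PER_SECOND
    cost = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000
    if batch:
        cost *= BATCH_DISCOUNT
    return cost


if __name__ == "__main__":
    # 200 input tokens producing one minute of audio (1,500 output tokens).
    print(f"{estimate_tts_cost(200, 60):.4f}")              # 0.0302
    print(f"{estimate_tts_cost(200, 60, batch=True):.4f}")  # 0.0151
```

At these rates, the audio output dominates: one minute of speech is 1,500 output tokens, or about $0.03, while the text input contributes a fraction of a cent.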