

Qwen3-TTS CustomVoice
#7 in Speech Synthesis (TTS) · alibaba · v3 · tts customvoice · since January 22, 2026 · 15× · last on June 30, 2026
Momentum: 36
Qwen3-TTS CustomVoice is a text-to-speech model developed by Alibaba's Qwen team. It offers 9 preset premium voices whose style can be steered with natural-language instructions, and it supports 10 major languages plus several dialectal voice profiles. It is built on the team's own Qwen3-TTS-Tokenizer-12Hz for efficient, low-latency speech generation. The model is part of the open-source Qwen3-TTS family (Apache 2.0 license), released on January 22, 2026, and is also accessible via the DashScope/Alibaba Cloud API.
Momentum trend
[Chart: momentum over time, x-axis from April 4 to July 3]
Properties
| Property | Value |
| --- | --- |
| Real-Time Streaming | Yes; dual-track streaming architecture, first audio packet after one character of input |
| Latency | End-to-end synthesis latency as low as 97 ms (streaming) |
| License | Apache License 2.0 |
| Platform | GitHub, Hugging Face, ModelScope, DashScope/Alibaba Cloud API |
| Price | Open-source model is free (Apache 2.0); cloud API approx. $0.013 per 1,000 characters |
| Release Date | January 22, 2026 (open-source release of the 0.6B and 1.7B models) |
| Languages | 10 languages: Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian |
| Voice Cloning | Not part of CustomVoice (only in the base model, which offers 3-second voice cloning); CustomVoice offers 9 fixed premium voices |
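
The Price row above translates into a simple per-character estimate. The following is a minimal sketch, not an official billing formula: the function name `estimate_api_cost_usd` is hypothetical, the rate is the approximate figure from the table, and it assumes every character (including whitespace and punctuation) is billed, which the listing does not confirm.

```python
# Approximate list price from the table above (USD per 1,000 characters).
PRICE_PER_1K_CHARS_USD = 0.013


def estimate_api_cost_usd(text: str, price_per_1k: float = PRICE_PER_1K_CHARS_USD) -> float:
    """Rough cloud API cost estimate for synthesizing `text`.

    Assumption: billing is linear in the character count and counts
    every character, including whitespace and punctuation.
    """
    return len(text) / 1000 * price_per_1k


if __name__ == "__main__":
    # Stand-in for a script of 250,000 characters.
    sample = "x" * 250_000
    print(f"Estimated cost: {estimate_api_cost_usd(sample):.2f} USD")
```

At the listed rate, a 250,000-character script would come to roughly $3.25; running the open-source weights locally avoids the per-character fee but shifts the cost to your own hardware.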