

Cartesia
#13 in Speech Synthesis (TTS) · cartesia · since: first Sonic TTS model released May 2024; current model Sonic-3.5 released June 16, 2026 · 11× · most recently June 30, 2026
Cartesia is an AI text-to-speech product from the startup of the same name, built on proprietary state-space models (SSMs) rather than classic transformer architectures. Its current flagship model, Sonic-3.5, is delivered via a streaming API with very low latency (sub-90 ms time-to-first-audio) and supports 40+ languages, expressive delivery (including laughter), and instant voice cloning. The product is offered as an API/SDK, as a web playground, and as the foundation for the company's own voice-agent platform ("Line"), with tiered pricing from free to enterprise.
Features
| Real-Time Streaming | Yes, streaming-first TTS API for real-time voice generation in voice agents |
| Latency | Sub-90ms time-to-first-audio (Sonic-3.5); some reports cite ~82ms or 100ms p90 |
| License | Commercial SaaS usage via paid plans; separate open-source project 'Edge' (Apache 2.0) for on-device SSMs |
| Platform | Cloud API, web playground, on-premises and on-device deployment (SDKs for developers) |
| Price | Free $0/mo (20K credits); Pro $5/mo; Startup $49/mo; Scale $299/mo; Enterprise on request |
| Release Date | Sonic (first version) May 2024; Sonic-3.5 released June 16, 2026 |
| Languages | 42 languages natively supported (incl. English, Hindi, Spanish, French, German, Japanese, Hebrew) |
| Voice Cloning | Instant voice cloning possible with just 3–10 seconds of audio; 'Pro Voice Cloning' also available |
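
The latency row above quotes both a time-to-first-audio figure and a p90 value. As a rough illustration of how such numbers could be measured for any streaming TTS response, the sketch below times the first non-empty chunk from a generic byte iterator and computes a nearest-rank percentile. It does not call the Cartesia API: `fake_tts_stream` is a hypothetical stand-in, and the function names, chunk size, and delay are assumptions chosen for illustration only.

```python
import math
import time
from typing import Callable, Iterable, Iterator, List


def time_to_first_chunk(
    stream: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the seconds elapsed until the stream yields its first non-empty chunk."""
    start = clock()
    for chunk in stream:
        if chunk:  # skip empty keep-alive chunks
            return clock() - start
    raise ValueError("stream ended without producing audio")


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=90 for a p90 latency."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def fake_tts_stream(delay_s: float, chunks: int = 3) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (not a real API)."""
    time.sleep(delay_s)  # simulated network and model latency
    for _ in range(chunks):
        yield b"\x00" * 320  # assumed chunk size, arbitrary


if __name__ == "__main__":
    samples = [time_to_first_chunk(fake_tts_stream(0.01)) for _ in range(10)]
    print(f"p90 time-to-first-audio: {percentile(samples, 90) * 1000:.1f} ms")
```

In a real measurement the iterator would be the audio stream returned by whichever client library is in use, and the clock should start just before the request is sent so that connection setup is included.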