Synthszr Charts: the big AI brands competing for the podium

Ink-2

cartesia · v2 · since June 16, 2026 · 6× · most recently June 29, 2026

Momentum

Cartesia Ink-2 is a streaming speech-to-text (STT) model purpose-built for real-time voice agents. It is based on a State Space Model (SSM) architecture rather than a transformer, and Cartesia claims it has the lowest Word Error Rate (WER) of any streaming STT model. The model features native turn detection (turn.start, turn.eager_end, turn.end), so no external voice activity detection (VAD) is required, and it uses semantic endpointing to judge turn completion by meaning rather than by silence. Ink-2 was released alongside Sonic-3.5 on June 16, 2026, and debuted at rank #1 on the Artificial Analysis streaming STT leaderboard. At launch it supports English only; multilingual support has been announced as forthcoming.
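To make the three turn events more concrete, here is a minimal sketch of how a voice agent might react to them. Only the event names (turn.start, turn.eager_end, turn.end) come from the description above. The `TurnTracker` class, the `(event_type, transcript)` tuples, the action strings, and the idea that a new turn.start can arrive after a turn.eager_end are all assumptions made for illustration; this is not Cartesia's SDK or wire format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TurnTracker:
    """Hypothetical helper: decides when to call the LLM during a user turn."""

    actions: List[str] = field(default_factory=list)
    draft: Optional[str] = None  # transcript behind a speculative LLM call

    def handle(self, event_type: str, transcript: str = "") -> None:
        if event_type == "turn.start":
            # Assumption: the user may resume speaking after an eager end,
            # so any speculative draft is cancelled.
            if self.draft is not None:
                self.actions.append("cancel_draft")
                self.draft = None
        elif event_type == "turn.eager_end":
            # Likely end of turn: start the LLM early on the current text.
            self.draft = transcript
            self.actions.append(f"start_draft:{transcript}")
        elif event_type == "turn.end":
            if self.draft == transcript:
                # The speculation matched the final text; reuse it.
                self.actions.append("commit_draft")
            else:
                # No usable draft: start the LLM on the final transcript.
                self.actions.append(f"start_final:{transcript}")
            self.draft = None
        else:
            raise ValueError(f"unknown event type: {event_type}")


def run(events: List[Tuple[str, str]]) -> List[str]:
    """Feed a list of (event_type, transcript) pairs through a fresh tracker."""
    tracker = TurnTracker()
    for event_type, transcript in events:
        tracker.handle(event_type, transcript)
    return tracker.actions
```

The design choice shown here is speculative execution: starting the LLM call at turn.eager_end and only committing it at turn.end is one plausible way to shorten the gap between the last word and the first response, at the cost of occasionally discarding a draft.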

Momentum history

[Chart: momentum history; x-axis ticks 04.04. and 03.07.]

Features

Latency (ms)	Time to final transcript: 100 ms (0.1 s); sub-350 ms partial latency; turn.eager_end further reduces the gap between the last word and the first LLM response
Multilingualism (Dialects)	English only at launch; other languages require fallback to ink-whisper; multilingual support for Ink-2 explicitly announced as 'in progress'
On-Device Execution	VPC/on-premise deployment available for enterprise customers (mentioned as a decision criterion for Cartesia vs. alternatives)
Languages	English only (at launch); multilingual support announced as in development
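The language fallback described in the table can be sketched as a small routing function. The name "ink-whisper" comes from the table; the identifier "ink-2", the function name `pick_stt_model`, and the use of language tags such as "en-US" are assumptions for illustration and may not match Cartesia's actual model identifiers or API parameters.

```python
def pick_stt_model(language: str) -> str:
    """Pick an STT model name for a language tag such as 'en-US' or 'de'.

    Hypothetical helper: English goes to Ink-2, everything else falls
    back to ink-whisper, mirroring the launch-time language support.
    """
    # Normalize 'EN_gb' or ' en-US ' to the primary subtag 'en'.
    primary = language.strip().lower().replace("_", "-").split("-")[0]
    return "ink-2" if primary == "en" else "ink-whisper"
```

For example, `pick_stt_model("en-US")` selects the Ink-2 path, while `pick_stt_model("de")` selects the fallback.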

