Synthszr Charts: the big AI brands competing for the podium

Miso TTS

#16 in Text-to-Speech (TTS)

unknown · since 2026-06-03 · 2× · last seen Jun 29, 2026

Momentum

Miso TTS 8B is an open-weight text-to-speech model from Miso Labs with 8 billion parameters, released on June 3, 2026. It uses a hierarchical Transformer over residual vector quantization (RVQ) audio tokens, inspired by Sesame CSM: a 7.7B-parameter temporal backbone (Llama 3.2-style) plus a 300M-parameter audio decoder. Generation is conditioned on text and, optionally, on audio context (conversation history), which enables one-shot voice cloning. The model is currently English-only; weights are available on Hugging Face under a modified MIT license.

Momentum trend


Features

Real-Time Capability	~110 ms time-to-first-byte on H100 hardware (vendor figure); local inference on consumer GPUs is significantly slower
Model Size (Parameters)	~8B total (7.7B backbone + 300M audio decoder)
Price Tier	Open-weight / free to self-host (modified MIT license); API access announced but not yet available
Supported Languages	English only (v1)
Voice Cloning	One-shot voice cloning from ~10-second reference audio (optional, via audio context conditioning)
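The parameter counts in the table translate directly into a rough weight-memory footprint, which gives a sense of the hardware needed to self-host the model. This is a back-of-the-envelope sketch under stated assumptions: it counts weights only (no activations, KV cache or audio codec), and the precisions listed are hypothetical options; the source does not say which precision the released weights use or whether quantized variants exist.

```python
# Back-of-the-envelope weight memory for Miso TTS 8B, from the parameter
# counts in the table. Weights only: activations, KV cache and the audio
# codec are NOT included. The dtype options are illustrative assumptions.
BACKBONE_PARAMS = 7.7e9   # temporal backbone (Llama 3.2-style)
DECODER_PARAMS = 300e6    # audio decoder

BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Bytes needed to store n_params weights in the given dtype, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


total_params = BACKBONE_PARAMS + DECODER_PARAMS  # 7.7B + 0.3B = 8.0B

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>5}: {weight_memory_gib(total_params, dtype):5.1f} GiB")
```

At bf16 the weights alone come to roughly 15 GiB, before any runtime overhead, so actual GPU memory use during inference will be higher.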

