Synthszr Charts: the big AI brands competing for the podium

Miso One

#10 in Multimodal Models

miso · since 2026-06-03 · 9× · last 2026-06-29

Momentum

Miso One (official model name: MisoTTS 8B) is a text-to-speech model released by Miso Labs on June 3, 2026, featuring 8 billion parameters and open weights under a modified MIT license. The model is based on a hierarchical RVQ Transformer architecture (7.7B-parameter backbone + 300M-parameter audio decoder) and accepts both text and optional audio context as input to generate expressive, tone-conditioned English speech. Miso Labs claims a time-to-first-byte latency of 110 ms on H100-class hardware (hosted API); local inference on consumer GPUs is materially slower according to the GitHub repository. One-shot voice cloning is supported from approximately 10 seconds of reference audio; generated audio is watermarked by default via SilentCipher.

[Chart: momentum trend]

Properties

Context Window (Tokens)	Maximum sequence length: 2,048 tokens (Mimi audio tokenizer with 32 audio codebooks of 2,048 entries each; text vocabulary: 128,256 tokens)
Multimodal Inputs	Text + audio (optional audio context for one-shot voice cloning and voice continuation; output: Mimi audio codes or an audio file). No image or video input. Currently English only.
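These figures can be turned into rough audio durations with some simple arithmetic. The sketch below rests on two assumptions. First, it uses Mimi's published frame rate of 12.5 frames per second, which comes from Kyutai's Mimi release rather than from Miso Labs. Second, it assumes the 2,048-position limit counts one position per text token and one per audio frame, not one per individual code. The helper names are hypothetical.

```python
import math

# Values quoted in the Miso One properties table.
MAX_SEQ_LEN = 2048        # maximum sequence length (positions)
CODEBOOKS_PER_FRAME = 32  # Mimi audio codebooks per frame
CODEBOOK_SIZE = 2048      # entries per codebook

# Assumption: Mimi's published frame rate (Kyutai), not stated by Miso Labs.
FRAME_RATE_HZ = 12.5


def frames_for_seconds(seconds: float) -> int:
    """Mimi frames needed to cover `seconds` of audio (a partial frame counts)."""
    return math.ceil(seconds * FRAME_RATE_HZ)


def codes_for_frames(frames: int) -> int:
    """Total discrete codes: every frame carries one index per codebook."""
    return frames * CODEBOOKS_PER_FRAME


def token_bitrate_bps() -> float:
    """Bits per second carried by the code stream: codebooks x log2(size) x frame rate."""
    return CODEBOOKS_PER_FRAME * math.log2(CODEBOOK_SIZE) * FRAME_RATE_HZ


def max_audio_seconds(text_tokens: int, reference_seconds: float = 0.0) -> float:
    """Seconds of new audio that still fit in the window.

    Assumption: each text token and each audio frame (not each code)
    occupies one sequence position.
    """
    used = text_tokens + frames_for_seconds(reference_seconds)
    remaining = MAX_SEQ_LEN - used
    if remaining < 0:
        raise ValueError(f"prompt uses {used} positions, limit is {MAX_SEQ_LEN}")
    return remaining / FRAME_RATE_HZ


if __name__ == "__main__":
    ref_frames = frames_for_seconds(10)  # ~10 s clip for one-shot voice cloning
    print(ref_frames)                    # 125
    print(codes_for_frames(ref_frames))  # 4000
    print(token_bitrate_bps())           # 4400.0
    print(max_audio_seconds(0))          # 163.84
    print(max_audio_seconds(100, reference_seconds=10))  # 145.84
```

Under these assumptions, a 10-second cloning reference plus a 100-token text prompt leaves room for roughly two and a half minutes of generated speech in a single window. If the limit instead counted every individual code, the budget would be 32 times smaller.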

Other products in this category: Multimodal Models