

Stable Audio Open
#15 in AI Music Generation · stability-ai · since 2024-06-05 · 2× · most recently 30 June 2026
Momentum: 12
Stable Audio Open 1.0 is an open-weights text-to-audio model by Stability AI with approximately 1.21 billion parameters, built on a latent diffusion architecture with DiT components and T5-based text conditioning. It generates variable-length stereo audio of up to 47 seconds at 44.1 kHz. The model was trained exclusively on Creative Commons-licensed audio data (Freesound and Free Music Archive) and is primarily intended for research, sound design, and non-commercial use. Vocal or speech generation is explicitly not supported by the model.
Features
| Feature | Details |
| --- | --- |
| Max Music Duration (Seconds) | 47 (variable-length stereo audio at 44.1 kHz) |
| Supported Input Formats | English text prompts with optional timing conditioning (seconds_start, seconds_total); audio variations and style transfer from audio samples are also possible |
| Vocal/Singing Quality | Not supported: the model cannot generate realistic vocals or intelligible speech or singing |
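
As a rough illustration of the timing conditioning listed above, the following sketch builds a conditioning entry from a prompt plus seconds_start and seconds_total, and converts a duration into a per-channel sample count at 44.1 kHz. The helper names (`build_conditioning`, `frames_for`) and the validation rules are hypothetical and are not part of any Stability AI library. The list-of-dicts shape is an assumption about how such conditioning is commonly passed to the model, so check it against the official usage example before relying on it.

```python
from __future__ import annotations

SAMPLE_RATE_HZ = 44_100  # 44.1 kHz, as stated in the feature table
MAX_SECONDS = 47         # maximum clip length, as stated in the feature table


def build_conditioning(prompt: str, seconds_total: float,
                       seconds_start: float = 0.0) -> list[dict]:
    """Return a one-element conditioning list (hypothetical helper).

    The validation below is an assumption made for this sketch:
    the prompt must be non-empty, seconds_total must lie in (0, 47],
    and seconds_start must not be negative.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 0 < seconds_total <= MAX_SECONDS:
        raise ValueError(f"seconds_total must be in (0, {MAX_SECONDS}]")
    if seconds_start < 0:
        raise ValueError("seconds_start must not be negative")
    return [{
        "prompt": prompt,
        "seconds_start": seconds_start,
        "seconds_total": seconds_total,
    }]


def frames_for(seconds: float) -> int:
    """Number of audio samples per channel for a given duration."""
    return round(seconds * SAMPLE_RATE_HZ)


if __name__ == "__main__":
    conditioning = build_conditioning("rain on a tin roof", seconds_total=30)
    print(conditioning)
    print(frames_for(MAX_SECONDS))  # samples per channel for a full-length clip
```

A full-length 47-second clip works out to 47 × 44,100 = 2,072,700 samples per channel, and twice that many values for stereo output.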