Synthszr Charts: the big AI brands competing for the podium

DeepSeek R1 Distill Qwen

#15

deepseek · since 2025-01-20 · 3× · last: 29 June 2026

Momentum: 25

DeepSeek-R1-Distill-Qwen is a family of four open-weight dense reasoning models (1.5B, 7B, 14B, and 32B parameters) distilled from the much larger DeepSeek-R1. The base models are Qwen2.5 checkpoints (Qwen2.5-Math for the 1.5B and 7B variants), fine-tuned on 800,000 samples curated with DeepSeek-R1 using supervised fine-tuning only, with no reinforcement-learning stage. On the AIME 2024 and MATH-500 benchmarks the models outperform many larger open-source models; according to the official documentation, the 32B variant reaches 72.6% Pass@1 on AIME 2024. All weights are publicly available on Hugging Face under the MIT license; the underlying Qwen2.5 base models are licensed under Apache 2.0.
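Pass@1 scores like the 72.6% above are averages over several sampled answers per problem, not the result of a single greedy run. The sketch below assumes the definition used in the DeepSeek-R1 report: for each problem, sample k responses and take pass@1 = (1/k) Σ p_i, where p_i is 1 if the i-th response is correct, then average over all problems. The function name `pass_at_1` and the toy results are hypothetical and are not taken from any official evaluation harness.

```python
from typing import Sequence


def pass_at_1(results: Sequence[Sequence[bool]]) -> float:
    """Estimate Pass@1 from k sampled responses per problem.

    results[j][i] is True if the i-th sampled response to problem j is correct.
    Per problem: pass@1 = (1/k) * sum(p_i). The benchmark score is the mean
    of these per-problem values.
    """
    if not results:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in results:
        if not samples:
            raise ValueError("each problem needs at least one sampled response")
        # sum() over booleans counts the correct responses
        per_problem.append(sum(samples) / len(samples))
    return sum(per_problem) / len(per_problem)


# Hypothetical toy data: 3 problems, k = 4 sampled responses each.
toy = [
    [True, True, False, True],     # 3/4 correct -> 0.75
    [False, False, False, False],  # 0/4 correct -> 0.0
    [True, True, True, True],      # 4/4 correct -> 1.0
]
print(f"Pass@1 = {pass_at_1(toy):.1%}")  # Pass@1 = 58.3%
```

Averaging over k samples reduces the variance that a single sampled answer per problem would introduce, which matters on small test sets such as AIME 2024's 30 problems.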

Momentum history [chart]

Features

Context Window (Tokens): 128,000 tokens (all Qwen variants: 1.5B, 7B, 14B, 32B); maximum generation length 32,768 tokens
Cost Efficiency (USD per 1M tokens): R1-Distill-Qwen-32B: $0.30 input / $0.30 output per 1M tokens (third-party hosting, lowest available price); the smaller variants (e.g., 1.5B) are currently not listed by any commercial API provider
Parameter Size (Billions): model family with 4 sizes: 1.5B / 7B / 14B / 32B parameters (all based on Qwen2.5)
Reasoning Capability (AIME 2024, Pass@1): 1.5B: 28.9% | 7B: 55.5% | 14B: 69.7% | 32B: 72.6%
Availability Status: open weights; all four variants (1.5B, 7B, 14B, 32B) are publicly available on Hugging Face under the MIT license and are commercially usable; the 32B variant is also offered by API providers (e.g., Groq)
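The context-window and price figures above are enough for rough request planning. The sketch below copies those figures as constants (128,000-token window, 32,768-token maximum generation, $0.30 per 1M input and output tokens for the third-party-hosted 32B variant). It computes how many prompt tokens remain once the full generation length is reserved, and what a single request costs. The helper names are hypothetical, and third-party prices change often, so treat the constants as placeholders to update.

```python
# Figures copied from the feature list; third-party prices change often.
CONTEXT_WINDOW = 128_000   # tokens, as listed for all Qwen variants
MAX_GENERATION = 32_768    # tokens, maximum generation length
PRICE_IN_PER_M = 0.30      # USD per 1M input tokens (32B, third-party hosting)
PRICE_OUT_PER_M = 0.30     # USD per 1M output tokens (32B, third-party hosting)


def max_prompt_tokens(reserved_output: int = MAX_GENERATION) -> int:
    """Prompt budget left after reserving room for the model's output."""
    if not 0 <= reserved_output <= CONTEXT_WINDOW:
        raise ValueError("reserved_output must fit inside the context window")
    return CONTEXT_WINDOW - reserved_output


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the listed per-million-token prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M) / 1_000_000


print(max_prompt_tokens())                       # 95232
print(f"${request_cost_usd(2_000, 8_000):.4f}")  # $0.0030
```

Reasoning models tend to emit long chains of thought, so output tokens usually dominate the bill; reserving the full 32,768-token generation length is the conservative choice when sizing prompts.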

Evidence (3)
