Synthszr Charts: the major AI brands competing for the podium

DeepSeek R1 Distill Qwen

#15 in Reasoning models

deepseek · since 2025-01-20 · 3× · most recently 29 June 2026

Momentum: 25

DeepSeek-R1-Distill-Qwen is a family of four open-weight dense reasoning models (1.5B, 7B, 14B, and 32B parameters) distilled from the much larger DeepSeek-R1 model. Each one starts from a Qwen2.5 base model and is trained on 800,000 reasoning samples generated by DeepSeek-R1, using supervised fine-tuning only (no RL stage). The models outperform many larger open-source models on the AIME 2024 and MATH-500 benchmarks; according to the official documentation, the 32B variant reaches 72.6% Pass@1 on AIME 2024. All weights are publicly available on Hugging Face under the Apache 2.0 license.
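The Pass@1 figure quoted above is a per-problem success rate. The source does not say how it was computed; a common convention, assumed here, is to sample several answers per problem and average the fraction that are correct. AIME 2024 has 30 problems, so a score such as 72.6% does not correspond to a whole number of solved problems, which is consistent with that kind of averaging. The function name and the toy data below are hypothetical and are not taken from the DeepSeek documentation or from any benchmark run.

```python
from statistics import mean


def pass_at_1(samples_correct: list[list[bool]]) -> float:
    """Return Pass@1 as a percentage.

    Assumption: Pass@1 is the mean, over problems, of the fraction of
    sampled answers that are correct for that problem.
    """
    per_problem = [
        mean(1.0 if ok else 0.0 for ok in runs) for runs in samples_correct
    ]
    return 100.0 * mean(per_problem)


# Hypothetical toy data: 4 problems, 4 sampled answers each (not real results).
toy_results = [
    [True, True, True, True],      # always solved
    [True, False, True, False],    # solved half the time
    [False, False, False, False],  # never solved
    [False, True, False, True],    # solved half the time
]

score = pass_at_1(toy_results)
print(f"Pass@1 on toy data: {score:.1f}%")
```

With a single sample per problem the same function reduces to the plain share of problems solved.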

Momentum trend (chart; axis labels 04.04. and 03.07.)

Properties

Context Window (Tokens): 128,000 tokens for all Qwen variants (1.5B, 7B, 14B, 32B); maximum generation length 32,768 tokens
Cost Efficiency (USD/1M Tokens): R1-Distill-Qwen-32B at $0.30 input / $0.30 output per 1M tokens (third-party hosting, lowest available price); smaller variants (1.5B) are currently not listed with a commercial API provider
Parameter Size (Billions): model family with 4 sizes, 1.5B / 7B / 14B / 32B parameters (all based on Qwen2.5)
Reasoning Capability (AIME Score %): 1.5B: 28.9% Pass@1 | 7B: 55.5% Pass@1 | 14B: 69.7% Pass@1 | 32B: 72.6% Pass@1 (AIME 2024)
Availability Status: open weights; all four variants (1.5B, 7B, 14B, 32B) are publicly available on Hugging Face under the Apache 2.0 license and are commercially usable; the 32B variant is additionally available via API providers (e.g., Groq)
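To make the pricing and token limits above concrete, here is a minimal sketch of a per-request cost estimate for the 32B variant. It uses only the figures listed in the properties ($0.30 per 1M tokens for both input and output, a 128,000-token context window, and a 32,768-token generation limit). The function name, the constants, and the assumption that input and output tokens together must fit in the context window are illustrative and do not come from any provider's API or billing rules.

```python
PRICE_PER_MILLION_USD = 0.30     # listed price, same for input and output
CONTEXT_WINDOW_TOKENS = 128_000  # listed context window
MAX_GENERATION_TOKENS = 32_768   # listed maximum generation length


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_GENERATION_TOKENS:
        raise ValueError("output exceeds the listed generation limit")
    # Assumption: prompt and completion share the context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request exceeds the listed context window")
    total_tokens = input_tokens + output_tokens
    return total_tokens * PRICE_PER_MILLION_USD / 1_000_000


# A short prompt followed by a maximum-length reasoning trace.
example_cost = request_cost_usd(input_tokens=2_000, output_tokens=32_768)
print(f"Estimated cost: ${example_cost:.4f}")
```

Because reasoning models tend to produce long outputs, the generation length usually dominates the cost of a request at these prices.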
