Synthszr Charts: the big AI brands competing for the podium

Deepseek R1 Distill Qwen

#15 in Reasoning models

deepseek · since 2025-01-20 · 3× · most recently 2026-06-29

Momentum

DeepSeek-R1-Distill-Qwen is a family of four open-weight dense reasoning models (1.5B, 7B, 14B and 32B parameters) distilled from the large DeepSeek-R1 model. The base models are Qwen2.5 variants (Qwen2.5-Math-1.5B, Qwen2.5-Math-7B, Qwen2.5-14B and Qwen2.5-32B), fine-tuned on 800,000 samples curated with DeepSeek-R1 using supervised fine-tuning only (no RL stage). On the AIME 2024 and MATH-500 benchmarks the models outperform many larger open-source models; according to the official documentation, the 32B variant reaches 72.6% Pass@1 on AIME 2024. All weights are publicly available on Hugging Face under the MIT license; the Qwen2.5 base models were originally released under Apache 2.0.
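Pass@1 scores like the 72.6% above are not a single greedy answer per problem: the DeepSeek-R1 report describes sampling k responses per problem and averaging their correctness. The sketch below illustrates that averaging, assuming the answers have already been graded into booleans; the function name `pass_at_1` and the sample data are hypothetical and not part of any official evaluation harness.

```python
from typing import Sequence


def pass_at_1(results: Sequence[Sequence[bool]]) -> float:
    """Benchmark-level Pass@1 from graded samples.

    results[q][i] is True if the i-th sampled answer to problem q is correct.
    For each problem, Pass@1 is the fraction of correct samples; the
    benchmark score is the mean of those fractions across all problems.
    """
    if not results:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in results:
        if not samples:
            raise ValueError("each problem needs at least one sample")
        # True counts as 1 and False as 0 when summed.
        per_problem.append(sum(samples) / len(samples))
    return sum(per_problem) / len(per_problem)


# Hypothetical grading results: 3 problems, 4 samples each.
graded = [
    [True, True, False, True],     # 3/4 correct
    [False, False, False, False],  # 0/4 correct
    [True, True, True, True],      # 4/4 correct
]
print(f"Pass@1 = {pass_at_1(graded):.1%}")  # prints "Pass@1 = 58.3%"
```

Averaging over several samples makes the score less sensitive to the luck of one sampled reasoning trace than a single-sample evaluation would be.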

Momentum trend


Properties

Context Window (Tokens)	128,000 tokens (all Qwen variants: 1.5B, 7B, 14B, 32B); maximum generation length 32,768 tokens
Cost Efficiency (USD/1M Tokens)	R1-Distill-Qwen-32B: $0.30 input / $0.30 output per 1M tokens (third-party hosting, lowest available price); smaller variants (e.g., 1.5B) currently not listed by any commercial API provider
Parameter Size (Billions)	Model family with 4 sizes: 1.5B / 7B / 14B / 32B parameters (all based on Qwen2.5)
Reasoning Capability (AIME Score %)	1.5B: 28.9% Pass@1; 7B: 55.5% Pass@1; 14B: 69.7% Pass@1; 32B: 72.6% Pass@1 (AIME 2024)
Availability Status	Open weights: all four variants (1.5B, 7B, 14B, 32B) publicly available on Hugging Face under the MIT license (Qwen2.5 base models: Apache 2.0); commercial use permitted; 32B additionally available via API providers (e.g., Groq)
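To translate the per-1M-token price into a per-request figure, here is a minimal sketch assuming the $0.30 input / $0.30 output rate for the 32B variant listed above (a third-party price that may change) and ignoring provider-specific minimums, caching discounts or rate tiers. `estimate_cost_usd` is a hypothetical helper, not any provider's API.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 0.30,   # assumed USD per 1M input tokens
    output_price_per_m: float = 0.30,  # assumed USD per 1M output tokens
) -> float:
    """Estimate the cost of one request in USD from token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Prices are quoted per 1,000,000 tokens, so scale the counts accordingly.
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# A 2,000-token prompt whose reasoning trace runs to the 32,768-token generation limit.
cost = estimate_cost_usd(2_000, 32_768)
print(f"${cost:.4f}")  # prints "$0.0104"
```

Because reasoning models emit long chains of thought, the output side typically dominates the bill; even a request that hits the generation limit stays at roughly one US cent at this rate.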


