

DeepSeek-R1-Distill-Qwen
#15 · deepseek · since 2025-01-20 · 3× · most recently 29 June 2026
Momentum: 25
DeepSeek-R1-Distill-Qwen is a family of four open-weight dense reasoning models (1.5B, 7B, 14B, and 32B parameters) distilled from the much larger DeepSeek-R1 model. Each model starts from a Qwen2.5 base and is fine-tuned on roughly 800,000 samples generated with DeepSeek-R1, using supervised fine-tuning only (no RL stage). The models outperform many larger open-source models on AIME 2024 and MATH-500; the 32B variant reaches 72.6% Pass@1 on AIME 2024 according to the official documentation. All weights are publicly available on Hugging Face under the MIT license; the underlying Qwen2.5 base models are licensed under Apache 2.0.
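The Pass@1 figures quoted here are averages rather than single-shot results. The sketch below assumes Pass@1 is estimated by sampling several answers per problem and averaging the fraction that are correct; the function name `pass_at_1` and the toy grading data are hypothetical and are not taken from any official evaluation harness.

```python
from statistics import mean


def pass_at_1(graded_samples):
    """Estimate Pass@1 from graded samples.

    graded_samples: one list per problem, each holding a bool per sampled
    answer (True = correct). The estimate is the mean, over problems, of
    the fraction of correct samples for that problem.
    """
    per_problem = [
        mean(1.0 if ok else 0.0 for ok in samples)
        for samples in graded_samples
    ]
    return mean(per_problem)


# Toy data (invented): 3 problems, 4 sampled answers each.
toy_grades = [
    [True, True, False, True],     # 3 of 4 correct
    [False, False, False, False],  # 0 of 4 correct
    [True, True, True, True],      # 4 of 4 correct
]
print(f"Pass@1 estimate: {pass_at_1(toy_grades):.1%}")
```

Averaging over several samples per problem reduces the variance that a single sampled answer would introduce, which matters on small benchmarks such as AIME 2024 with only 30 problems.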
Momentum history
[Chart: momentum over time, axis labels 04.04. and 03.07.]
Features
| Feature | Value |
| --- | --- |
| Context Window (Tokens) | 128,000 tokens (all Qwen variants: 1.5B, 7B, 14B, 32B); maximum generation length 32,768 tokens |
| Cost Efficiency (USD/1M Tokens) | R1-Distill-Qwen-32B: $0.30 input / $0.30 output per 1M tokens (third-party hosting, lowest available price); the smaller variants (e.g., 1.5B) are currently not listed with a commercial API provider |
| Parameter Size (Billions) | Model family with 4 sizes: 1.5B / 7B / 14B / 32B parameters (all based on Qwen2.5) |
| Reasoning Capability (AIME Score %) | AIME 2024 Pass@1: 1.5B: 28.9%; 7B: 55.5%; 14B: 69.7%; 32B: 72.6% |
| Availability Status | Open weights: all four variants (1.5B, 7B, 14B, 32B) are publicly available on Hugging Face under the MIT license (Qwen2.5 base models: Apache 2.0) and are commercially usable; the 32B variant is additionally available via API providers (e.g., Groq) |
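The context-window and pricing rows can be combined into a quick pre-flight check for a request. The following is a minimal sketch using the figures from the table (128,000-token context, 32,768-token maximum generation, $0.30 per 1M tokens for the 32B variant). It assumes that prompt and generated tokens share the same context window; the function names `fits_budget` and `estimate_cost_usd` are hypothetical, and actual provider limits and prices may differ.

```python
# Figures taken from the feature table above (32B variant pricing).
CONTEXT_WINDOW = 128_000        # total tokens per request
MAX_GENERATION = 32_768         # maximum generated tokens
PRICE_PER_M_INPUT_USD = 0.30    # USD per 1M input tokens
PRICE_PER_M_OUTPUT_USD = 0.30   # USD per 1M output tokens


def fits_budget(prompt_tokens, max_new_tokens):
    """Return True if the request respects both limits.

    Assumption: prompt and generated tokens count against the same
    context window.
    """
    within_generation = max_new_tokens <= MAX_GENERATION
    within_context = prompt_tokens + max_new_tokens <= CONTEXT_WINDOW
    return within_generation and within_context


def estimate_cost_usd(prompt_tokens, output_tokens):
    """Estimate the cost of one request in USD at the listed prices."""
    input_cost = prompt_tokens * PRICE_PER_M_INPUT_USD
    output_cost = output_tokens * PRICE_PER_M_OUTPUT_USD
    return (input_cost + output_cost) / 1_000_000


# Example: a 4,000-token prompt with up to 16,000 generated tokens.
print(fits_budget(4_000, 16_000))
print(f"${estimate_cost_usd(4_000, 16_000):.4f}")
```

Because reasoning models emit long chains of thought, output tokens usually dominate the bill, so the generation cap is often the more relevant limit in practice.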