Language

Synthszr Charts — die großen AI-Marken im Wettkampf ums Podium

Deepseek R1 Distill Qwen

#15

deepseek · seit 2025-01-20 · 3× · zuletzt 29. Juni 2026

Momentum

DeepSeek-R1-Distill-Qwen is a family of four open-weight dense reasoning models (1.5B, 7B, 14B, 32B parameters) created via knowledge distillation from the large DeepSeek-R1 model. The base models are Qwen2.5 variants; fine-tuning used 800,000 reasoning samples synthesized by DeepSeek-R1 via supervised fine-tuning only (no RL stage). The models outperform many larger open-source models on AIME 2024 and MATH-500 benchmarks, with the 32B variant achieving 72.6% Pass@1 on AIME 2024 according to official documentation. All weights are publicly available on Hugging Face under the Apache 2.0 license.

Historique du momentum

04.04.03.07.

Fonctionnalités

Context Window (Tokens)	128,000 tokens (all Qwen variants: 1.5B, 7B, 14B, 32B); maximum generation length 32,768 tokens
Cost Efficiency (€/1M Tokens)	R1-Distill-Qwen-32B: $0.30 input / $0.30 output per 1M tokens (third-party hosting, lowest available price); smaller variants (1.5B) currently not listed with a commercial API provider
Parameter Size (Billions)	Model family with 4 sizes: 1.5B / 7B / 14B / 32B parameters (all based on Qwen2.5)
Reasoning Capability (AIME Score %)	1.5B: 28.9% Pass@1 \| 7B: 55.5% Pass@1 \| 14B: 69.7% Pass@1 \| 32B: 72.6% Pass@1 (AIME 2024)
Availability Status	Open weights – all four variants (1.5B, 7B, 14B, 32B) publicly available on Hugging Face; Apache 2.0 license, commercially usable; 32B additionally available via API providers (e.g., Groq)

Deepseek R1 Distill Qwen

Fonctionnalités

Preuves (3)

Subscribe free. Unsubscribe the second it sucks.