

DeepSeek V3
#3 in Reasoning models · deepseek · v3 · since 2024-12-26 · 42× · last seen 2026-07-01
Momentum: 79
DeepSeek V3 is an open-source language model by DeepSeek, released on December 26, 2024. It is based on a Mixture-of-Experts (MoE) architecture with 671 billion total parameters, activating only 37 billion per token. The model was pre-trained on 14.8 trillion tokens and employs Multi-head Latent Attention (MLA) and FP8 training. It achieves benchmark performance comparable to leading proprietary models, particularly in mathematics, coding, and multilingual tasks.
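To put the Mixture-of-Experts figures above in proportion, the following sketch computes what share of the model's parameters is active for each token. It uses only the two parameter counts quoted in the description; the function and constant names are illustrative and not taken from any DeepSeek code.

```python
# Parameter counts as quoted in the description above.
TOTAL_PARAMS = 671e9    # total parameters across all experts
ACTIVE_PARAMS = 37e9    # parameters activated per token


def active_fraction(total: float, active: float) -> float:
    """Return the share of parameters used for a single token."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active per token: {frac:.1%}")                 # Active per token: 5.5%
print(f"Total-to-active ratio: {1 / frac:.1f}x")       # Total-to-active ratio: 18.1x
```

In other words, roughly one parameter in eighteen takes part in processing any given token, so the per-token compute is much closer to that of a 37-billion-parameter dense model than to a 671-billion-parameter one.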
Momentum trend
[Chart: momentum over time; date axis from 04.04. to 03.07.]
Properties
| Key Benchmark (%) | MMLU: 88.5%; MATH-500: 90.2%; GPQA: 59.1%; Codeforces Percentile: 51.6%; SWE-Bench Verified: 42.0% |
| Context Window (Tokens) | 128,000 |
| License | MIT License (code repository); DeepSeek Model License for model weights – commercial use allowed |
| Multimodality | No native multimodality – text-only. DeepSeek announced multimodal support as a future feature. Separate multimodal models exist as the standalone Janus series. |
| Platform | DeepSeek API (platform.deepseek.com, OpenAI-compatible endpoint); self-hosting with weights from Hugging Face, served via SGLang, vLLM, TensorRT-LLM, or LMDeploy; AMD GPUs and Huawei Ascend NPUs are also supported |
| Price | Free (open weights, self-hosting); API access via platform.deepseek.com paid per token |
| Price per 1M Tokens | $0.27 / 1M input tokens (cache miss), $0.07 / 1M input tokens (cache hit), $1.10 / 1M output tokens (original launch pricing) |
| Release Date | December 26, 2024 |
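
The per-token prices in the table can be turned into a rough cost estimate for a given workload. The sketch below assumes the original launch pricing listed above still applies and that you know (or can guess) what fraction of your input tokens hit the cache. `Pricing`, `estimate_cost`, and `cache_hit_ratio` are hypothetical names introduced for illustration, not part of the DeepSeek API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1M tokens, copied from the table above (original launch pricing)."""
    input_cache_miss: float = 0.27
    input_cache_hit: float = 0.07
    output: float = 1.10


def estimate_cost(input_tokens: int, output_tokens: int,
                  cache_hit_ratio: float = 0.0,
                  pricing: Pricing = Pricing()) -> float:
    """Estimate the API cost in USD for one workload.

    cache_hit_ratio is an assumed fraction (0.0 to 1.0) of input tokens
    billed at the cheaper cache-hit rate.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens
    total = (miss_tokens * pricing.input_cache_miss
             + hit_tokens * pricing.input_cache_hit
             + output_tokens * pricing.output)
    return total / 1_000_000


# Example: 2M input tokens (half served from cache) and 500k output tokens.
cost = estimate_cost(2_000_000, 500_000, cache_hit_ratio=0.5)
print(f"${cost:.2f}")  # $0.89
```

The example breaks down as $0.27 for the uncached input, $0.07 for the cached input, and $0.55 for the output. Current prices may differ from the launch figures, so check platform.deepseek.com before budgeting.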