

DeepSeek-OCR
#22deepseek · since 2025-10-20 · 2× · most recently June 30, 2026
DeepSeek-OCR is an open-source vision-language model from DeepSeek AI, released on October 20, 2025. It is built around "Contexts Optical Compression" (COC): a document page is encoded as a small number of vision tokens instead of a long sequence of text tokens. The architecture pairs the DeepEncoder (380 M parameters, combining SAM-Base and CLIP-Large with a 16× convolutional compressor) with the DeepSeek-3B-MoE decoder (3 B total parameters, ~570 M active per token). Served with vLLM on a single NVIDIA A100-40G, it reaches approximately 2,500 tokens/second and more than 200,000 pages/day. The model weights (~6.7 GB in BF16) are freely available under the MIT license.
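
The effect of the 16× compressor can be illustrated with a little arithmetic. The sketch below is not taken from the DeepSeek-OCR code base: the function names are hypothetical, the 16-pixel patch size is an assumption about a ViT-style encoder, and the 2,560-token page is an invented example. It only shows how a patch grid shrinks to a vision-token count and how that count relates to the text length it replaces.

```python
def vision_tokens(image_side: int, patch_size: int = 16, compressor_factor: int = 16) -> int:
    """Vision tokens left after patchifying a square image and compressing.

    patch_size=16 is an assumption; compressor_factor=16 is the 16x
    convolutional compressor described in the text.
    """
    patches_per_side = image_side // patch_size
    patch_tokens = patches_per_side ** 2
    return patch_tokens // compressor_factor


def compression_ratio(text_tokens: int, vision_token_count: int) -> float:
    """Text tokens represented per vision token."""
    return text_tokens / vision_token_count


if __name__ == "__main__":
    tokens = vision_tokens(1024)           # 64 x 64 patches = 4096, then / 16
    print(tokens)                          # 256
    # Hypothetical page whose plain-text form would need 2,560 tokens.
    print(compression_ratio(2560, tokens))  # 10.0
```

Under these assumptions a 1024×1024 page costs 256 vision tokens, so a page holding about 2,560 text tokens sits at a 10× compression ratio.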
Features
| Feature | Value |
| --- | --- |
| Latency (ms) | 100–400 ms per page on A100 GPU (simple documents ~100 ms, complex documents with tables/charts ~400 ms) |
| Model Size (Parameter Count) | 3B-class model: DeepSeek-3B-MoE decoder with 3B total parameters (~570M active per token) plus the ~380M-parameter DeepEncoder; weight file ~6.7 GB in BF16 |
| Price Tier | Open source / free: MIT-licensed weights, free self-hosting with no API fees. Third-party API (e.g., DeepInfra): $0.03/M input tokens and $0.10/M output tokens. |
| Language Support (Count) | 100+ languages (trained on over 30M PDF pages in 100+ languages, incl. Latin, CJK, Cyrillic, and scientific scripts) |
| Processing Speed (x Realtime) | ~2,500 tokens/second for PDF processing on an NVIDIA A100-40G (via vLLM); equivalent to >200,000 pages/day on an A100 |
| Word Error Rate (%) | ~3% (approximate, implied by ~97% decoding precision at <10× compression per the arXiv paper and DigitalOcean docs; 96%+ OCR decoding accuracy at 9–10× compression on the Fox benchmark; accuracy drops to ~60% at 20× compression) |
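
The throughput and price figures in the table can be cross-checked with a short back-of-the-envelope calculation. This is a sketch only: the helper names are hypothetical, the pages/day figure is taken at its stated lower bound of 200,000, and the per-page token counts in the cost example (256 input, 1,080 output) are illustrative assumptions rather than measured values. The prices are the DeepInfra rates quoted above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pages_per_second(pages_per_day: float) -> float:
    """Average sustained page rate implied by a daily page count."""
    return pages_per_day / SECONDS_PER_DAY


def tokens_per_page(tokens_per_second: float, pages_per_day: float) -> float:
    """Average tokens per page implied by the two throughput figures."""
    return tokens_per_second * SECONDS_PER_DAY / pages_per_day


def api_cost_usd(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = 0.03,
                 output_price_per_m: float = 0.10) -> float:
    """Third-party API cost in USD at per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    print(pages_per_second(200_000))        # about 2.31 pages per second
    print(tokens_per_page(2_500, 200_000))  # 1080.0 tokens per page
    # Hypothetical job: 1,000,000 pages at 256 input and 1,080 output tokens each.
    pages = 1_000_000
    print(api_cost_usd(pages * 256, pages * 1_080))  # about 115.68 USD
```

Read this way, ~2,500 tokens/second and 200,000 pages/day are consistent with an average of roughly 1,080 tokens per page, and a hosted API run of the hypothetical million-page job would cost on the order of $116.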