

Qwen3-VL-8B
#25 · qwen · v3 · vl 8b · since 2025-10-15 · 4× · last seen June 29, 2026
Momentum: 15
Qwen3-VL-8B is an open-weight multimodal vision-language model from Alibaba Cloud's Qwen team with approximately 8.77 billion dense parameters. It was released on October 15, 2025 as part of the Qwen3-VL series under the Apache 2.0 license, permitting commercial use. The model processes text, images, and video within a native 256K-token context window, extendable to 1 million tokens. It is available both via cloud API providers and for local self-hosted deployment.
Features

| Feature | Details |
| --- | --- |
| Context Window (Tokens) | 256,000 tokens native (262,144 tokens per model card); expandable to approx. 1 million tokens; maximum output length: 32,768 tokens |
| Multimodal Inputs | Text, images, and videos; OCR in 32 languages; 2D/3D object grounding; GUI control (PC/mobile); code generation from images/videos (Draw.io, HTML, CSS, JS) |
| Price per Unit | $0.08 per 1M input tokens / $0.50 per 1M output tokens (via OpenRouter/Novita; Instruct variant) |
| Vision-Language Benchmark Score | Qwen3-VL-8B-Instruct: DocVQA (test) 96.1%, ScreenSpot 94.4%, OCRBench 89.6%, MMBench-V1.1 85.0%, AI2D 85.7%; Qwen3-VL-8B-Thinking: DocVQA 95.3%, ScreenSpot 93.6%, MMBench-V1.1 87.5%, MMLU-Redux 88.8% |
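
The pricing and context-window rows above lend themselves to a quick back-of-the-envelope check. The following is a minimal sketch, not an official calculator: the function names are hypothetical, the rates are the OpenRouter/Novita Instruct figures quoted in the table (they may differ by provider and over time), and it assumes that input and output tokens share the native 262,144-token window, which the table does not state explicitly.

```python
from decimal import Decimal

# Figures copied from the table above (Instruct variant via OpenRouter/Novita).
INPUT_USD_PER_M = Decimal("0.08")    # USD per 1M input tokens
OUTPUT_USD_PER_M = Decimal("0.50")   # USD per 1M output tokens
NATIVE_CONTEXT_TOKENS = 262_144      # native window per the model card
MAX_OUTPUT_TOKENS = 32_768           # maximum output length


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    total = (Decimal(input_tokens) * INPUT_USD_PER_M
             + Decimal(output_tokens) * OUTPUT_USD_PER_M)
    return total / Decimal(1_000_000)


def fits_native_window(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the native limits.

    Assumption: prompt and completion tokens count against the same window.
    """
    return (output_tokens <= MAX_OUTPUT_TOKENS
            and input_tokens + output_tokens <= NATIVE_CONTEXT_TOKENS)


if __name__ == "__main__":
    # Example: a 200,000-token document with a 2,000-token answer.
    print(estimate_cost_usd(200_000, 2_000))
    print(fits_native_window(200_000, 2_000))
```

Decimal arithmetic is used here so that small per-token rates do not pick up binary floating-point rounding noise when summed over many requests.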