Synthszr Charts: the major AI brands competing for the podium

Qwen3-VL-8B

#25 in Multimodal Models

qwen · v3 · vl 8b · since 2025-10-15 · 4× · last 29 June 2026

Momentum

Qwen3-VL-8B is an open-weight multimodal vision-language model from Alibaba Cloud's Qwen team with approximately 8.77 billion dense parameters. It was released on October 15, 2025 as part of the Qwen3-VL series under the Apache 2.0 license, permitting commercial use. The model processes text, images, and video within a native 256K-token context window, extendable to 1 million tokens. It is available both via cloud API providers and for local self-hosted deployment.

Momentum Trend (chart)

Features

Context Window (Tokens)	256K tokens native (262,144 tokens per model card); extendable to approx. 1 million tokens; maximum output length: 32,768 tokens
Multimodal Inputs	Text, images, and videos; OCR in 32 languages; 2D/3D object grounding; GUI control (PC/mobile); code generation from images/videos (Draw.io, HTML, CSS, JS)
Price per Unit	$0.08 per 1M input tokens / $0.50 per 1M output tokens (via OpenRouter/Novita; Instruct variant)
Vision-Language Benchmark Score	Qwen3-VL-8B-Instruct: DocVQA (test) 96.1%, ScreenSpot 94.4%, OCRBench 89.6%, MMBench-V1.1 85.0%, AI2D 85.7%; Qwen3-VL-8B-Thinking: DocVQA 95.3%, ScreenSpot 93.6%, MMBench-V1.1 87.5%, MMLU-Redux 88.8%
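
The pricing and context figures in the table above lend themselves to a quick back-of-the-envelope check. The following is a minimal sketch, not a provider API: the function names are hypothetical, the rates are the OpenRouter/Novita Instruct prices listed above, and it assumes that input and output tokens share the native context window, which the page does not state explicitly.

```python
# Figures copied from the Features table; all names here are illustrative.
INPUT_USD_PER_MILLION = 0.08     # price per 1M input tokens
OUTPUT_USD_PER_MILLION = 0.50    # price per 1M output tokens
NATIVE_CONTEXT_TOKENS = 262_144  # native context per model card
MAX_OUTPUT_TOKENS = 32_768       # maximum output length


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars at the listed rates."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


def fits_native_context(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the native limits.

    Assumption: input and output tokens count against the same window.
    """
    return (output_tokens <= MAX_OUTPUT_TOKENS
            and input_tokens + output_tokens <= NATIVE_CONTEXT_TOKENS)


if __name__ == "__main__":
    cost = estimate_cost_usd(200_000, 4_000)
    print(f"Estimated cost: ${cost:.4f}")
    print("Fits native context:", fits_native_context(200_000, 4_000))
```

At these rates, a request with 200,000 input tokens and 4,000 output tokens costs roughly $0.018 and stays within the native window. Actual provider billing (for example, how image or video inputs are tokenized) may differ.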

