Synthszr Charts: the big AI brands competing for the podium

Qwen3-VL-30B-A3B

#41 in Multimodal Models

qwen · v3 · vl 30b a3b · since 2025-10-04 · 2× · last 29 June 2026

Momentum

Qwen3-VL-30B-A3B is a multimodal vision-language model from Alibaba's Qwen team, built on a Mixture-of-Experts (MoE) architecture: 30.5B total parameters, of which only about 3.3B are activated per token. It processes text, images, and video within a unified context and natively supports a 256K-token context window (extensible to 1M tokens). It is released as an open-weight model under the Apache 2.0 license and can be deployed locally (for example, with 4-bit quantization on a machine with 32 GB of RAM) or accessed via cloud APIs.
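The local-deployment claim can be sanity-checked with simple arithmetic. The following is a minimal sketch that estimates weight storage only, using the parameter counts quoted above. It assumes every parameter is stored at the same bit width and ignores KV cache, activations, and runtime overhead, so real memory use will be higher. The function name `weight_gib` is illustrative.

```python
# Rough weight-memory estimate for the model described above.
# Assumption: all parameters are stored at the same bit width; KV cache,
# activations, and runtime overhead are NOT included.

TOTAL_PARAMS = 30.5e9   # total parameters (from the description)
ACTIVE_PARAMS = 3.3e9   # approximate activated parameters (from the description)


def weight_gib(params: float, bits_per_param: float) -> float:
    """Return the storage needed for `params` weights, in GiB."""
    return params * bits_per_param / 8 / 2**30


# Share of the weights that take part in a single forward step.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_gib(TOTAL_PARAMS, bits):.1f} GiB")
print(f"active fraction: {active_fraction:.1%}")
```

At 4 bits the weights alone come to roughly 14 GiB, which is consistent with the 32 GB figure once overhead is added. In a typical MoE setup all experts still have to be resident in memory, so the small active fraction mainly reduces compute per step rather than the memory footprint.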

Momentum trend

[Chart: momentum over time; x-axis ticks 04.04. and 03.07.]

Features

Context Window (Tokens)	Native 256K tokens (262,144 per API documentation); expandable to 1M tokens; max output 32,768 tokens
Multimodal Inputs	Text, images, and videos (including interleaved / multi-image multi-turn); plus OCR in 32 languages, 2D/3D spatial grounding, GUI screenshots
On-Device vs. Cloud	Both possible: open-weight (Apache 2.0), locally deployable via vLLM / SGLang / llama.cpp / Ollama (runs with 4-bit quantization on 32 GB RAM); cloud API via OpenRouter, DeepInfra, SiliconFlow, and others
Price per Unit	$0.13 per million input tokens / $0.52 per million output tokens (Instruct variant via OpenRouter)
Vision-Language Benchmark Scores	DocVQA (test): 95.0% | ScreenSpot: 94.7% | OCRBench: 90.3% | MMLU-Redux: 88.4% | MMBench-V1.1: 87.0% (Instruct variant)
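The listed price and token limits can be combined into a quick per-request cost estimate. This is a sketch under stated assumptions, not a billing reference: it assumes that input and output tokens share the 262,144-token native context window, and that image and video content is billed as ordinary input tokens. The function name `request_cost_usd` is hypothetical.

```python
# Per-request cost estimate from the table values above.
# Assumptions (not confirmed by the source): input and output share the
# native context window, and media is billed as ordinary input tokens.

INPUT_USD_PER_M = 0.13      # USD per million input tokens (Instruct, OpenRouter)
OUTPUT_USD_PER_M = 0.52     # USD per million output tokens
CONTEXT_TOKENS = 262_144    # native context window
MAX_OUTPUT_TOKENS = 32_768  # maximum output length


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost of one request in USD."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the listed maximum output length")
    if input_tokens + output_tokens > CONTEXT_TOKENS:
        raise ValueError("request exceeds the native context window")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example: a long document (200,000 tokens) with a short answer (2,000 tokens).
print(f"${request_cost_usd(200_000, 2_000):.5f}")
```

With these prices an output token costs four times as much as an input token, so long generations tend to dominate the bill faster than long prompts do.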
