Synthszr Charts: the major AI brands competing for the podium

Qwen3-VL-30B-A3B

#41 in Multimodal Models

qwen · v3 · vl 30b a3b · since 2025-10-04 · 2× · last 29 June 2026

Momentum: 10

Qwen3-VL-30B-A3B is a multimodal vision-language model from Alibaba's Qwen team built on a Mixture-of-Experts (MoE) architecture: 30.5B total parameters, of which only about 3.3B are activated per token. It processes text, images, and video within a unified context and natively supports a 256K-token context window (extensible to 1M tokens). Released as an open-weight model under the Apache 2.0 license, it can be deployed locally (for example, 4-bit quantized on a machine with 32 GB of RAM) or accessed via cloud APIs.
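As a rough sanity check on the 32 GB figure, weight memory at a given quantization level can be estimated from the parameter count alone. This is a back-of-the-envelope sketch, not a measurement: it assumes weights dominate memory and ignores the KV cache, activations, the vision encoder's runtime buffers, and quantization overhead such as scales, so real usage will be higher. The function name `weight_memory_gb` is hypothetical.

```python
TOTAL_PARAMS = 30.5e9   # total parameters (from the description above)
ACTIVE_PARAMS = 3.3e9   # approximate activated parameters (from the description above)


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), ignoring all overhead."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Compare common precisions; only weights are counted here.
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
    # Share of the parameters that is active in the MoE forward pass.
    print(f"active share: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Under these assumptions, 4-bit weights come to roughly 15 GB, which leaves headroom on a 32 GB machine for the context cache and the rest of the system. Note that all 30.5B parameters still have to be held in memory; the small active share mainly reduces compute per token, not storage.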

Momentum trend (chart; date axis from 04.04. to 03.07.)

Features

Context Window (Tokens): Native 256K tokens (262,144 per API documentation); expandable to 1M tokens; max output 32,768 tokens
Multimodal Inputs: Text, images, and videos (including interleaved / multi-image multi-turn); plus OCR in 32 languages, 2D/3D spatial grounding, GUI screenshots
On-Device vs. Cloud: Both possible: open-weight (Apache 2.0), locally deployable via vLLM / SGLang / llama.cpp / Ollama (runs with 4-bit quantization on 32 GB RAM); cloud API via OpenRouter, DeepInfra, SiliconFlow, and others
Price per Unit: $0.13 per million input tokens / $0.52 per million output tokens (Instruct variant via OpenRouter)
Vision-Language Benchmark Score (Instruct variant): DocVQA (test): 95.0% | ScreenSpot: 94.7% | OCRBench: 90.3% | MMLU-Redux: 88.4% | MMBench-V1.1: 87.0%
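To make the listed per-token prices concrete, the cost of a single request can be estimated as below. This is a minimal sketch using only the two prices quoted above; the function name `request_cost_usd` is hypothetical, and it assumes image and video inputs are billed as ordinary input tokens at the same rate, which may not match a given provider's billing.

```python
INPUT_USD_PER_M = 0.13   # USD per million input tokens (listed price)
OUTPUT_USD_PER_M = 0.52  # USD per million output tokens (listed price)


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request in USD at the listed rates."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Worst case within the listed limits: full native context in,
    # maximum output length out.
    cost = request_cost_usd(input_tokens=262_144, output_tokens=32_768)
    print(f"full-context request: ~${cost:.4f}")
```

At these rates, even a request that fills the native 262,144-token context and produces the maximum 32,768 output tokens costs on the order of five US cents.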

Sources (2)
