Synthszr Charts: the big AI brands competing for the podium

Qwen3-VL-32B

#39 in Multimodal Models

qwen · v3 · vl 32b · since 2025-10-21 · 2× · last seen Jun 29, 2026

Momentum: 10

Qwen3-VL-32B is a dense vision-language model with 33 billion parameters from Alibaba's Qwen series. It accepts text, images, and video as input. It natively supports a 256K-token context window (expandable to 1M), aligns text with video timestamps for video analysis, and can process hour-long video content. The model comes in two variants, Instruct and Thinking, and is released as an open-weight model under the Apache 2.0 license, suitable for both cloud API use and local self-hosting.

Momentum trend: chart omitted (date-axis ticks 04.04. and 03.07.)

Features

Context Window (Tokens): 256,000 tokens native (expandable to 1,000,000 tokens)
Multimodal Inputs: Text, images (single and multiple), and videos (over 1.5 hours); supports interleaved text-image-video inputs within the same context window
On-Device vs. Cloud: Both. Open-weight model (Apache 2.0) that can be self-hosted locally via vLLM or SGLang (21 GB in Ollama format); cloud API available via Alibaba Cloud, OpenRouter, and Together AI
Price per Unit: OpenRouter: $0.104 / 1M input tokens, $0.416 / 1M output tokens; Artificial Analysis (Alibaba API): $0.70 / 1M input tokens, $2.80 / 1M output tokens
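
To make the per-token prices easier to compare, here is a minimal Python sketch that estimates the cost of a single request at the rates listed above. The provider keys (`openrouter`, `alibaba_api`), the function name `request_cost`, and the example token counts are illustrative assumptions, not part of any provider's API, and actual billing may differ from the listed rates.

```python
# Prices in USD per 1M tokens, copied from the "Price per Unit" row above.
# The dictionary keys are hypothetical labels chosen for this sketch.
PRICES = {
    "openrouter": {"input": 0.104, "output": 0.416},
    "alibaba_api": {"input": 0.70, "output": 2.80},
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates."""
    rate = PRICES[provider]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Example workload (assumed): a long video prompt with a short answer.
    for provider in PRICES:
        cost = request_cost(provider, input_tokens=200_000, output_tokens=2_000)
        print(f"{provider}: ${cost:.4f}")
```

At both listed price points the output rate is four times the input rate, so for long-context inputs such as video the input side usually dominates the bill unless the response is very long.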
