Synthszr Charts: the big AI brands competing for the podium

Qwen3-VL-32B

#39 in Multimodal Models

qwen · v3 · vl 32b · since 2025-10-21 · 2× · last seen Jun 29, 2026

Momentum

Qwen3-VL-32B is a dense vision-language model (33 billion parameters) from Alibaba's Qwen series that accepts text, images, and video as input. It natively supports a 256K-token context window (expandable to 1M), aligns text with video timestamps for video analysis, and can process hour-long video content. The model comes in two variants, Instruct and Thinking, and is released as an open-weight model under the Apache 2.0 license, suitable for both cloud API use and local self-hosting.

[Chart: Momentum trend]

Features

Context Window (Tokens)	256,000 tokens native (expandable to 1,000,000 tokens)
Multimodal Inputs	Text, images (single and multiple), videos (more than 1.5 hours long); supports interleaved text-image-video inputs within the same context window
On-Device vs. Cloud	Both: open-weight model (Apache 2.0), local self-hosting via vLLM/SGLang possible (21 GB in Ollama format); cloud API available via Alibaba Cloud, OpenRouter, and Together AI
Price per Unit	OpenRouter: $0.104 / 1M input tokens, $0.416 / 1M output tokens; Artificial Analysis (Alibaba API): $0.70 / 1M input tokens, $2.80 / 1M output tokens
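
The per-token prices in the table translate into a per-request cost by simple arithmetic. The following is a minimal sketch in Python, assuming the listed prices are still current and that billing is linear in token count. The provider keys and the function name `request_cost` are hypothetical labels chosen for illustration, not part of any provider's API.

```python
# Prices in USD per 1M tokens, copied from the "Price per Unit" row above.
# The dictionary keys are illustrative labels, not official provider identifiers.
PRICES_PER_MILLION = {
    "openrouter": {"input": 0.104, "output": 0.416},
    "alibaba_api": {"input": 0.70, "output": 2.80},
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of one request.

    Assumes purely linear per-token billing with no minimum charge,
    caching discount, or tiered pricing.
    """
    price = PRICES_PER_MILLION[provider]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a long-context request that nearly fills the native 256K window.
    for provider in PRICES_PER_MILLION:
        cost = request_cost(provider, input_tokens=250_000, output_tokens=2_000)
        print(f"{provider}: ${cost:.4f}")
```

Under these assumptions, both price lists charge four times as much per output token as per input token, so the cost of long-context requests with short answers is dominated by the input side.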
