

Qwen3-VL-32B
#39 v Multimodální modelyqwen · v3 · vl 32b · od 2025-10-21 · 2× · naposledy 29. 6. 2026
10
Momentum
Qwen3-VL-32B is a dense vision-language model (33 billion parameters) from Alibaba's Qwen series that processes text, images, and video in a multimodal fashion. It natively supports a 256K-token context window (expandable to 1M), integrates text-timestamp alignment for video analysis, and can process hour-long video content. The model is offered in two variants—Instruct and Thinking—and is released as an open-weight model under the Apache 2.0 license, suitable for both cloud API use and local self-hosting.
Vývoj momenta
04.04.03.07.
Vlastnosti
| Context Window (Tokens) | 256,000 tokens native (expandable to 1,000,000 tokens) |
| Multimodal Inputs | Text, images (single and multiple), videos (up to >1.5 hours); supports interleaved text-image-video inputs within the same context window |
| On-Device vs. Cloud | Both: open-weight model (Apache 2.0), local self-hosting via vLLM/SGLang possible (21 GB in Ollama format); cloud API available via Alibaba Cloud, OpenRouter, and Together AI |
| Price per Unit | OpenRouter: $0.104 / 1M input tokens, $0.416 / 1M output tokens; Artificial Analysis (Alibaba API): $0.70 / 1M input tokens, $2.80 / 1M output tokens |