

GLM-4.5V
#36 v Multimodální modelyzhipu · v4.5v · od 2025-08-11 · 2× · naposledy 29. 6. 2026
10
Momentum
GLM-4.5V is a multimodal vision-language model by Zhipu AI (Z.ai) built on the GLM-4.5-Air architecture (106B total parameters, 12B active, MoE). Released on August 11, 2025 as open-source under the MIT license, it accepts image, video, and text inputs. It achieves state-of-the-art results on 42 public vision-language benchmarks among open-source models of comparable size, and features a switchable "Thinking Mode" for deep reasoning.
Vývoj momenta
04.04.03.07.
Vlastnosti
| Context Window (Tokens) | 65,536 token context window (OpenRouter); SiliconFlow lists 66K; max output 16,384 tokens |
| Multimodal Inputs | Text, images (native resolution/aspect ratio), videos; tool use; supported tasks: image Q&A, OCR, document parsing, GUI agents, visual grounding, video understanding, frontend coding |
| On-Device vs. Cloud | Cloud API (via Z.ai / bigmodel.cn, OpenRouter, Fireworks, Novita, and others); open-source (MIT), self-hostable with FP8/BF16 via Transformers, vLLM, SGLang |
| Price per Unit | $0.60 per 1M input tokens / $1.80 per 1M output tokens (via OpenRouter, TypingMind, developer.puter.com – as of Jun 2026) |
| Video Analysis Capability | Supports long-video segmentation and event detection (VideoMME, MMVU, LVBench); timestamp token encoding for temporal understanding; benchmarks: VideoMME, MMVU, MotionBench, MVBench, LVBench |