

Molmo2
#21 in Multimodal models · allen-institute · v2 · since 2025-12-16 · 4× · most recently 2026-06-29
Momentum: 20
Molmo 2 is a family of open vision-language models (VLMs) from the Allen Institute for AI (Ai2), released on December 16, 2025. The family has three variants (Molmo2-4B, Molmo2-8B, and Molmo2-O-7B) and extends the original Molmo with video analysis, multi-image reasoning, and spatiotemporal grounding (video pointing and tracking). All weights, training data, and evaluation tools are freely available under Apache 2.0. According to Ai2's technical report, the 8B model outperforms its predecessor Molmo 72B and beats proprietary models such as Gemini 3 Pro on video tracking tasks.
Momentum trend (chart; x-axis runs from 04.04. to 03.07.)
Properties
| Property | Value |
| --- | --- |
| Context Window (Tokens) | Molmo2-4B and Molmo2-8B: 36,864 tokens; Molmo2-O-7B: 65,536 tokens |
| Multimodal Inputs | Input: text, single image, multiple images, video; output: text; vision backbone: SigLIP 2 (for 4B/8B); LLM base: Qwen3 (4B/8B) or OLMo (O-7B) |
| On-Device vs. Cloud | Open-weights model that can be self-hosted locally; also available in the cloud via Ai2 Playground, with API access announced as coming soon; model weights freely available on Hugging Face and GitHub |
| Price per Unit | $0.00 per 1M input tokens / $0.00 per 1M output tokens (per Artificial Analysis; open-weights model, free to self-host under Apache 2.0) |
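
The context-window figures in the table can be turned into a simple pre-flight check when choosing a variant. The sketch below is illustrative only: the helper names `fits_in_context` and `models_that_fit` are hypothetical, and it assumes that the listed window covers both the prompt (including any image or video tokens) and the generated output, which the table does not state. Token counts are expected to come from whatever tokenizer or processor the caller uses.

```python
# Context window sizes in tokens, copied from the properties table.
CONTEXT_WINDOW = {
    "Molmo2-4B": 36_864,
    "Molmo2-8B": 36_864,
    "Molmo2-O-7B": 65_536,
}


def fits_in_context(model: str, prompt_tokens: int, max_new_tokens: int = 0) -> bool:
    """Return True if prompt plus requested output fits the model's window.

    Assumption: input and output tokens share one window.
    """
    return prompt_tokens + max_new_tokens <= CONTEXT_WINDOW[model]


def models_that_fit(prompt_tokens: int, max_new_tokens: int = 0) -> list[str]:
    """List the variants whose window can hold the given token budget."""
    return [
        name
        for name in CONTEXT_WINDOW
        if fits_in_context(name, prompt_tokens, max_new_tokens)
    ]


if __name__ == "__main__":
    # A long video prompt of 40,000 tokens only fits the O-7B variant.
    print(models_that_fit(40_000))
```

For example, a 40,000-token prompt exceeds the 36,864-token window of the 4B and 8B variants, so only Molmo2-O-7B remains under this assumption.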