Synthszr Charts — die großen AI-Marken im Wettkampf ums Podium

oMLX

#10 in Lokale LLM-Runtimes

omlx · siet 2026-02-13 · 2× · tolest 29. Juni 2026

Momentum

oMLX is a native macOS inference server for Apple Silicon (M1 or later), built on Apple's MLX framework. Its core feature is a two-tier KV cache (hot tier in RAM, cold tier on SSD in safetensors format) that persists cache blocks across server restarts. The server supports text LLMs, VLMs, OCR models, embeddings, and rerankers, and exposes both an OpenAI-compatible and an Anthropic-compatible REST API. It is managed via a native macOS menu bar app (not Electron) with a supplementary web admin dashboard.

Momentum-Verloop

04.04.03.07.

Features

API Type	OpenAI-compatible (/v1/chat/completions) + Anthropic-compatible (/v1/messages); FastAPI-based
Inference Backend	Apple MLX (mlx-lm / mlx-vlm); BatchGenerator for continuous batching; two-tier paged KV cache (RAM + SSD)
Maximum Model Size (GB RAM)	Minimum 16 GB RAM; 64 GB+ recommended; tested configurations up to 512 GB (Mac Studio M3 Ultra)
Platforms (OS Support)	macOS 15+ (Sequoia) on Apple Silicon (M1/M2/M3/M4) — no Windows, no Linux, no Intel Mac
Price Tier	Free, open source (Apache License 2.0)
UI Type	Native macOS menu bar app (SwiftUI/PyObjC, no Electron) + web admin dashboard (/admin) for model management, chat, benchmarks, and monitoring

oMLX

Features

Belege (2)

Mehr Produkten in disse Kategorie: Lokale LLM-Runtimes

Subscribe free. Unsubscribe the second it sucks.