Synthszr Charts — die großen AI-Marken im Wettkampf ums Podium

oMLX

#10 v Lokální LLM runtime

omlx · od 2026-02-13 · 2× · naposledy 29. 6. 2026

Momentum

oMLX is a native macOS inference server for Apple Silicon (M1 or later), built on Apple's MLX framework. Its core feature is a two-tier KV cache (hot tier in RAM, cold tier on SSD in safetensors format) that persists cache blocks across server restarts. The server supports text LLMs, VLMs, OCR models, embeddings, and rerankers, and exposes both an OpenAI-compatible and an Anthropic-compatible REST API. It is managed via a native macOS menu bar app (not Electron) with a supplementary web admin dashboard.

Vývoj momenta

04.04.03.07.

Vlastnosti

API Type	OpenAI-compatible (/v1/chat/completions) + Anthropic-compatible (/v1/messages); FastAPI-based
Inference Backend	Apple MLX (mlx-lm / mlx-vlm); BatchGenerator for continuous batching; two-tier paged KV cache (RAM + SSD)
Maximum Model Size (GB RAM)	Minimum 16 GB RAM; 64 GB+ recommended; tested configurations up to 512 GB (Mac Studio M3 Ultra)
Platforms (OS Support)	macOS 15+ (Sequoia) on Apple Silicon (M1/M2/M3/M4) — no Windows, no Linux, no Intel Mac
Price Tier	Free, open source (Apache License 2.0)
UI Type	Native macOS menu bar app (SwiftUI/PyObjC, no Electron) + web admin dashboard (/admin) for model management, chat, benchmarks, and monitoring

oMLX

Vlastnosti

Zdroje (2)

Další produkty v této kategorii: Lokální LLM runtime

Subscribe free. Unsubscribe the second it sucks.