

oMLX
#10 v Lokální LLM runtimeomlx · od 2026-02-13 · 2× · naposledy 29. 6. 2026
16
Momentum
oMLX is a native macOS inference server for Apple Silicon (M1 or later), built on Apple's MLX framework. Its core feature is a two-tier KV cache (hot tier in RAM, cold tier on SSD in safetensors format) that persists cache blocks across server restarts. The server supports text LLMs, VLMs, OCR models, embeddings, and rerankers, and exposes both an OpenAI-compatible and an Anthropic-compatible REST API. It is managed via a native macOS menu bar app (not Electron) with a supplementary web admin dashboard.
Vývoj momenta
04.04.03.07.
Vlastnosti
| API Type | OpenAI-compatible (/v1/chat/completions) + Anthropic-compatible (/v1/messages); FastAPI-based |
| Inference Backend | Apple MLX (mlx-lm / mlx-vlm); BatchGenerator for continuous batching; two-tier paged KV cache (RAM + SSD) |
| Maximum Model Size (GB RAM) | Minimum 16 GB RAM; 64 GB+ recommended; tested configurations up to 512 GB (Mac Studio M3 Ultra) |
| Platforms (OS Support) | macOS 15+ (Sequoia) on Apple Silicon (M1/M2/M3/M4) — no Windows, no Linux, no Intel Mac |
| Price Tier | Free, open source (Apache License 2.0) |
| UI Type | Native macOS menu bar app (SwiftUI/PyObjC, no Electron) + web admin dashboard (/admin) for model management, chat, benchmarks, and monitoring |