

ollama
#1 in Local LLM Runtimesollama · since 2023-07-08 · 101× · last seen Jul 02, 2026
100
Momentum
Ollama is an open-source, locally operated runtime environment for large language models (LLMs), available free of charge under the MIT license. Founded in 2023 by Michael Chiang and Jeffrey Morgan, it allows users to run models such as Llama, Mistral, Qwen, Gemma, or DeepSeek on their own hardware with just a few commands. The primary inference backend is llama.cpp; from version 0.19 (March 2026) onwards, Apple's MLX framework is additionally supported on Apple Silicon. Ollama provides an OpenAI-compatible REST API and supports macOS, Linux, and Windows without any cloud dependency.
Momentum trend
04.04.03.07.
Features
| Deployment (Self-Hosted/Cloud) | Primarily self-hosted (local, Docker, own server). Optional: Ollama Cloud (GA since Sept. 2025) – hosted inference service on NVIDIA data center hardware, OpenAI-compatible endpoint, no data logging. |
| Throughput/Latency | Min. config (CPU, 8 GB RAM): 3–8 t/s @ 7B. 16 GB VRAM (RTX/M-Series): 30–60 t/s @ 7B–14B. Apple M4 / RTX 4090: ~40 t/s @ 7B Q4. Cloud: low TTFT + high throughput, no SLA. |
| License | MIT License (core/CLI, github.com/ollama/ollama). GUI app (from 2025) separate, no published license. |
| Platform | macOS (≥14 Sonoma), Windows 10/11 (amd64, arm64), Linux (glibc 2.31+, amd64/arm64), Docker. GPU: NVIDIA CUDA (Compute 5.0+), AMD ROCm (Linux), Apple Metal / MLX (Apple Silicon). |
| Price | Local: free & unlimited. Cloud: Free ($0), Pro ($20/month or $200/year), Max ($100/month). Billed by GPU time, no token cap. |
| Protocol Compatibility | Own REST API (port 11434, NDJSON streaming). OpenAI Chat Completions API-compatible. Anthropic Messages API-compatible. Python and JavaScript/TypeScript libraries. Structured outputs (JSON schema). |
| Release Date | July 2023 (first public GitHub release); currently v0.30.10 (June 2026) |
| Supported Models/Providers | Llama, Gemma 4, Qwen, DeepSeek, Mistral, Phi, gpt-oss, Kimi, GLM, MiniMax, LLaVA, and many more – full library at ollama.com/library. Chat, code, vision, embeddings, reasoning. |