Synthszr Charts — die großen AI-Marken im Wettkampf ums Podium

llama.cpp

#2 in Lokale LLM-Runtimes

llama-cpp · seit 10. März 2023 (Erstveröffentlichung durch Georgi Gerganov) · 28× · zuletzt 30. Juni 2026

Momentum

llama.cpp ist eine quelloffene C/C++-Bibliothek für die lokale und Cloud-Inferenz großer Sprachmodelle, entwickelt von Georgi Gerganov. Sie läuft ohne externe Abhängigkeiten auf CPUs und GPUs verschiedenster Hersteller und nutzt das eigene GGUF-Dateiformat für quantisierte Modelle. Das Projekt bietet CLI-Tools sowie einen Server mit OpenAI-kompatibler API und bildet die technische Grundlage vieler bekannter lokaler LLM-Anwendungen wie Ollama und LM Studio. Es steht unter der MIT-Lizenz und wird kostenlos zum Download bereitgestellt.

Momentum-Verlauf

04.04.03.07.

Features

Deployment (Self-host/Cloud)	Self-hosted (lokal, Server, Docker) sowie Cloud-Deployment möglich, z.B. via Hugging Face Inference Endpoints
Durchsatz/Latenz	Stark hardwareabhängig; Beispiel: RTX 3060 12GB ca. 42 tok/s (8B, Q4), M1 MacBook ca. 30-50 tok/s (7B quantisiert)
Lizenz	MIT License
Plattform	Windows, Linux, macOS; läuft auf CPU, Apple Silicon (Metal), NVIDIA (CUDA), AMD (HIP), Intel/SYCL, Vulkan, RISC-V u.a.
Preis	Kostenlos, Open Source (keine Lizenzgebühren)
Protokoll-Kompatibilität	OpenAI-kompatible API-Endpunkte (z.B. v1/chat/completions), Grammar-basierte JSON-Ausgabe
Release-Datum	10. März 2023 (Initial-Release durch Georgi Gerganov)
Unterstützte Modelle/Provider	Llama, Mistral, Gemma, DeepSeek, gpt-oss, Phi, Qwen u.v.m. im GGUF-Format

llama.cpp

Features

Belege (28)

Weitere Produkte in dieser Kategorie: Lokale LLM-Runtimes

Subscribe free. Unsubscribe the second it sucks.