Language

Synthszr Charts — die großen AI-Marken im Wettkampf ums Podium

llama.cpp

llama-cpp · seit 10. März 2023 (Erstveröffentlichung durch Georgi Gerganov) · 28× · zuletzt 30. Juni 2026

Momentum

llama.cpp is an open-source C/C++ library for local and cloud inference of large language models, created by Georgi Gerganov. It runs without external dependencies on a wide range of CPUs and GPUs and uses its own GGUF file format for quantized models. The project provides CLI tools and a server with an OpenAI-compatible API, forming the technical backbone of many popular local LLM applications such as Ollama and LM Studio. It is released under the MIT license and is free to download.

Historique du momentum

04.04.03.07.

Fonctionnalités

Deployment (Self-Hosted/Cloud)	Self-hosted (local, server, Docker) as well as cloud deployment possible, e.g. via Hugging Face Inference Endpoints
Throughput/Latency	Highly hardware-dependent; example: RTX 3060 12GB approx. 42 tok/s (8B, Q4), M1 MacBook approx. 30-50 tok/s (7B quantized)
License	MIT License
Platform	Windows, Linux, macOS; runs on CPU, Apple Silicon (Metal), NVIDIA (CUDA), AMD (HIP), Intel/SYCL, Vulkan, RISC-V, among others
Price	Free, open source (no license fees)
Protocol Compatibility	OpenAI-compatible API endpoints (e.g. v1/chat/completions), grammar-based JSON output
Release Date	March 10, 2023 (initial release by Georgi Gerganov)
Supported Models/Providers	Llama, Mistral, Gemma, DeepSeek, gpt-oss, Phi, Qwen, and many more in GGUF format

llama.cpp

Fonctionnalités

Preuves (28)

Subscribe free. Unsubscribe the second it sucks.