Synthszr Charts — die großen AI-Marken im Wettkampf ums Podium
synthszr charts
llama-cpp

llama.cpp

#2

llama-cpp · seit 10. März 2023 (Erstveröffentlichung durch Georgi Gerganov) · 28× · zuletzt 30. Juni 2026

64
Momentum

llama.cpp is an open-source C/C++ library for local and cloud inference of large language models, created by Georgi Gerganov. It runs without external dependencies on a wide range of CPUs and GPUs and uses its own GGUF file format for quantized models. The project provides CLI tools and a server with an OpenAI-compatible API, forming the technical backbone of many popular local LLM applications such as Ollama and LM Studio. It is released under the MIT license and is free to download.

Historique du momentum
04.04.03.07.

Fonctionnalités

Deployment (Self-Hosted/Cloud)Self-hosted (local, server, Docker) as well as cloud deployment possible, e.g. via Hugging Face Inference Endpoints
Throughput/LatencyHighly hardware-dependent; example: RTX 3060 12GB approx. 42 tok/s (8B, Q4), M1 MacBook approx. 30-50 tok/s (7B quantized)
LicenseMIT License
PlatformWindows, Linux, macOS; runs on CPU, Apple Silicon (Metal), NVIDIA (CUDA), AMD (HIP), Intel/SYCL, Vulkan, RISC-V, among others
PriceFree, open source (no license fees)
Protocol CompatibilityOpenAI-compatible API endpoints (e.g. v1/chat/completions), grammar-based JSON output
Release DateMarch 10, 2023 (initial release by Georgi Gerganov)
Supported Models/ProvidersLlama, Mistral, Gemma, DeepSeek, gpt-oss, Phi, Qwen, and many more in GGUF format

Preuves (28)

Subscribe free. Unsubscribe the second it sucks.

High-signal news across AI, business, UX, and tech. Every morning.