Synthszr Charts — die großen AI-Marken im Wettkampf ums Podium

llama.cpp

#2 v Lokální LLM runtime

llama-cpp · od 10. März 2023 (Erstveröffentlichung durch Georgi Gerganov) · 28× · naposledy 30. 6. 2026

Momentum

llama.cpp is an open-source C/C++ library for local and cloud inference of large language models, created by Georgi Gerganov. It runs without external dependencies on a wide range of CPUs and GPUs and uses its own GGUF file format for quantized models. The project provides CLI tools and a server with an OpenAI-compatible API, forming the technical backbone of many popular local LLM applications such as Ollama and LM Studio. It is released under the MIT license and is free to download.

Vývoj momenta

04.04.03.07.

Vlastnosti

Deployment (Self-Hosted/Cloud)	Self-hosted (local, server, Docker) as well as cloud deployment possible, e.g. via Hugging Face Inference Endpoints
Throughput/Latency	Highly hardware-dependent; example: RTX 3060 12GB approx. 42 tok/s (8B, Q4), M1 MacBook approx. 30-50 tok/s (7B quantized)
License	MIT License
Platform	Windows, Linux, macOS; runs on CPU, Apple Silicon (Metal), NVIDIA (CUDA), AMD (HIP), Intel/SYCL, Vulkan, RISC-V, among others
Price	Free, open source (no license fees)
Protocol Compatibility	OpenAI-compatible API endpoints (e.g. v1/chat/completions), grammar-based JSON output
Release Date	March 10, 2023 (initial release by Georgi Gerganov)
Supported Models/Providers	Llama, Mistral, Gemma, DeepSeek, gpt-oss, Phi, Qwen, and many more in GGUF format

llama.cpp

Vlastnosti

Zdroje (28)

Další produkty v této kategorii: Lokální LLM runtime

Subscribe free. Unsubscribe the second it sucks.