Synthszr Charts — die großen AI-Marken im Wettkampf ums Podium
synthszr charts
llama-cpp

llama.cpp

#2 v Lokální LLM runtime

llama-cpp · od 10. März 2023 (Erstveröffentlichung durch Georgi Gerganov) · 28× · naposledy 30. 6. 2026

64
Momentum

llama.cpp is an open-source C/C++ library for local and cloud inference of large language models, created by Georgi Gerganov. It runs without external dependencies on a wide range of CPUs and GPUs and uses its own GGUF file format for quantized models. The project provides CLI tools and a server with an OpenAI-compatible API, forming the technical backbone of many popular local LLM applications such as Ollama and LM Studio. It is released under the MIT license and is free to download.

Vývoj momenta
04.04.03.07.

Vlastnosti

Deployment (Self-Hosted/Cloud)Self-hosted (local, server, Docker) as well as cloud deployment possible, e.g. via Hugging Face Inference Endpoints
Throughput/LatencyHighly hardware-dependent; example: RTX 3060 12GB approx. 42 tok/s (8B, Q4), M1 MacBook approx. 30-50 tok/s (7B quantized)
LicenseMIT License
PlatformWindows, Linux, macOS; runs on CPU, Apple Silicon (Metal), NVIDIA (CUDA), AMD (HIP), Intel/SYCL, Vulkan, RISC-V, among others
PriceFree, open source (no license fees)
Protocol CompatibilityOpenAI-compatible API endpoints (e.g. v1/chat/completions), grammar-based JSON output
Release DateMarch 10, 2023 (initial release by Georgi Gerganov)
Supported Models/ProvidersLlama, Mistral, Gemma, DeepSeek, gpt-oss, Phi, Qwen, and many more in GGUF format

Zdroje (28)

Další produkty v této kategorii: Lokální LLM runtime

Subscribe free. Unsubscribe the second it sucks.

High-signal news across AI, business, UX, and tech. Every morning.