

SGLang
#3 v LLM inference a servinglmsys · od Januar 2024 · 6× · naposledy 30. 6. 2026
25
Momentum
SGLang is an open-source, high-performance inference framework for large language models and multimodal models, hosted by LMSYS under a non-profit organization. The system combines a Python-embedded language for structured text generation with an optimized runtime and uses RadixAttention for efficient KV cache reuse. SGLang is deployed in production on over 400,000 GPUs worldwide and generates trillions of tokens daily.
Vývoj momenta
04.04.03.07.
Vlastnosti
| Agent Capabilities | Structured generation with primitives for generation, selection, and parallel control flows; tool integration possible |
| Base Model/Framework | Model-agnostic; supports Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, Mistral, and others; compatible with Hugging Face and OpenAI APIs |
| Code Execution & Sandboxing | No dedicated code execution/sandboxing features documented |
| Human-in-the-Loop | No dedicated human-in-the-loop functionality documented |
| Context Retention | RadixAttention for automatic KV cache reuse; hierarchical KV caching for long context windows; chunked prefill; prefix caching |
| Price Tier | Free (open-source under Apache License) |