

Bonsai
#19 in Small & Edge models · prism-ml · since 2026-03-31 · 2× · last seen 2026-06-29
Momentum: 10
Bonsai-4B is an open-source 1-bit language model by PrismML based on the Qwen3 architecture, designed for deployment on edge devices such as iPhones, Macs, and CUDA GPUs. Weights are stored at 1.125 bits per parameter (1 sign bit plus one FP16 scale per group of 128 weights). The model is distributed in GGUF (Q1_0) and MLX 1-bit formats under the Apache 2.0 license. PrismML released it alongside Bonsai-8B and Bonsai-1.7B on March 31, 2026, when the company emerged from stealth.
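The 1.125 bits-per-parameter figure follows from the storage scheme described above: 128 sign bits plus one 16-bit scale is 144 bits per group of 128 weights. The sketch below illustrates that arithmetic with a toy group quantizer in plain Python. The byte layout, the helper names (`quantize_group`, `dequantize_group`), and the choice of the mean absolute value as the scale are assumptions made for illustration; this is not the actual Q1_0 format or PrismML's quantization procedure.

```python
import struct

GROUP_SIZE = 128  # weights per scale group, as stated in the description
SCALE_BITS = 16   # one FP16 scale per group


def bits_per_weight(group_size=GROUP_SIZE, scale_bits=SCALE_BITS):
    """One sign bit per weight plus the scale bits amortized over the group."""
    return 1 + scale_bits / group_size


def quantize_group(weights):
    """Pack one group into an FP16 scale followed by 128 sign bits.

    Hypothetical layout: 2 bytes of scale, then 16 bytes of sign bits.
    The scale is the mean absolute weight, which is an assumed choice.
    """
    if len(weights) != GROUP_SIZE:
        raise ValueError("expected exactly one full group of weights")
    scale = sum(abs(w) for w in weights) / GROUP_SIZE
    bits = 0
    for i, w in enumerate(weights):
        if w >= 0:
            bits |= 1 << i  # bit set means +scale, bit clear means -scale
    return struct.pack("<e", scale) + bits.to_bytes(GROUP_SIZE // 8, "little")


def dequantize_group(blob):
    """Expand a packed group back into 128 values of +scale or -scale."""
    (scale,) = struct.unpack("<e", blob[:2])
    bits = int.from_bytes(blob[2:], "little")
    return [scale if (bits >> i) & 1 else -scale for i in range(GROUP_SIZE)]
```

Under this assumed layout each packed group occupies 18 bytes, so `len(blob) * 8 / GROUP_SIZE` gives the same 1.125 bits per weight as `bits_per_weight()`.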
Momentum trend (chart omitted; x-axis from 04.04. to 03.07.)
Properties
| Property | Value |
| --- | --- |
| Throughput (tokens/s) | Approx. 23 tokens/s on an M1 MacBook Air (independent test). For reference, Bonsai-8B is reported at 44 tokens/s on an iPhone 17 Pro Max and 80–100+ tokens/s on an RTX 5060 Ti. |
| Context Window | 32,768 tokens (32K) |
| Model Size (Parameters) | 4 billion parameters (architecture: Qwen3); GGUF file: 572 MB (Q1_0, 1.125 bpw) |
| Offline Capability | Fully offline-capable; runs locally on iPhone/iPad (via MLX Swift), Apple Silicon Macs (MLX), and CUDA and Metal GPUs (llama.cpp fork). No cloud access required. |
| Price Tier | Free / Open Source (Apache 2.0): commercial use, modification, and redistribution permitted under the license terms |
| Memory Footprint (GB) | ~0.5 GB (GGUF Q1_0 file on disk: 0.57 GB including tokenizer and metadata; the weights alone take slightly less). For comparison, the unpacked FP16 variant requires 8.1 GB of VRAM. |
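The size and memory figures in the table can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes a nominal count of exactly 4 billion parameters, which the listing does not confirm; the function name `weight_bytes` is illustrative only.

```python
def weight_bytes(n_params, bits_per_weight):
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8


N_PARAMS = 4e9  # nominal "4 billion"; the exact count is an assumption

one_bit_gb = weight_bytes(N_PARAMS, 1.125) / 1e9  # 1-bit weights plus FP16 group scales
fp16_gb = weight_bytes(N_PARAMS, 16) / 1e9        # unpacked FP16 weights
ratio = fp16_gb / one_bit_gb                      # compression factor, 16 / 1.125

print(f"1.125 bpw: {one_bit_gb:.4f} GB, FP16: {fp16_gb:.1f} GB, ratio: {ratio:.1f}x")
```

With these inputs the estimate comes to roughly 0.56 GB at 1.125 bpw and 8.0 GB at FP16, about a 14× reduction. The small gap to the listed 572 MB and 8.1 GB would be consistent with a parameter count slightly above 4 billion plus file metadata, though the listing does not break this down.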