Gemma 4 is an open model family from Google DeepMind that comes in several sizes (12B, 26B MoE, 31B). It stands out for its optimization for local deployment: different versions run on laptops with 12GB or 16GB of memory, and variants produced with quantization-aware training (QAT) reduce memory consumption by approximately 75% (e.g., Gemma 4 E2B fits in about 1GB). The model is multimodal and supports client-side inference.
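The memory figures above can be sanity-checked with simple arithmetic. The sketch below is only an illustration under stated assumptions, none of which come from the source: a 16-bit baseline, 4-bit quantized weights, and roughly 2 billion parameters for the E2B variant. The function names are hypothetical helpers, not part of any Gemma tooling.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


def reduction_percent(bits_before: int, bits_after: int) -> float:
    """Relative memory saving when moving from one weight width to another."""
    return 100 * (1 - bits_after / bits_before)


# Assumptions (not stated in the source): 16-bit baseline, 4-bit quantized
# weights, and about 2 billion parameters for the E2B variant.
ASSUMED_E2B_PARAMS = 2e9

baseline_gb = weight_memory_gb(ASSUMED_E2B_PARAMS, 16)
quantized_gb = weight_memory_gb(ASSUMED_E2B_PARAMS, 4)
saving = reduction_percent(16, 4)

print(f"16-bit: {baseline_gb:.1f} GB, 4-bit: {quantized_gb:.1f} GB, saving: {saving:.0f}%")
```

Under these assumptions the numbers line up with the roughly 75% reduction and the roughly 1GB footprint quoted above. The estimate covers weights only; activations, the KV cache, and runtime overhead add to the real footprint.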
Momentum trend (chart, 01.04. to 30.06.)
Features
Speed
The DiffusionGemma variant is 4× faster than the other Gemma 4 variants