

Qwen2.5-32B-Instruct
#51 in Open-source language models · alibaba · v2.5 · 32b instruct · since 2024-09-19 · 3× · most recently 30 June 2026
Momentum: 9
Qwen2.5-32B-Instruct is an instruction-tuned, open-weight language model from Alibaba Cloud (Qwen team) with 32.5 billion parameters, released in September 2024 under the Apache 2.0 license. It supports a context window of up to 128K tokens (131,072) and can generate up to 8,192 tokens per response. Pretrained on 18 trillion tokens, it improves significantly on Qwen2 in instruction following, coding, mathematics, and structured output such as JSON. The weights can be downloaded for self-hosting, and the model is also available through various API providers.
Momentum trend (chart; axis ticks 04.04. and 03.07.)
Properties
| Property | Details |
| --- | --- |
| Context Window | 131,072 tokens (128K) maximum input length; maximum output 8,192 tokens. The default config.json is set to 32,768 tokens; longer contexts can be enabled via rope_scaling. |
| Model Size (Parameters) | 32.5 billion parameters (32.5B); 64 transformer layers; architecture: dense decoder-only transformer with RoPE, SwiGLU, RMSNorm, and attention QKV bias |
| Price Tier | Open-weight / free to self-host (Apache 2.0). Via OpenRouter API: approx. $0.79/M input tokens and $0.40/M output tokens (as of March 2025). Pay-as-you-go, no subscription required. |
| Memory Requirement | approx. 65 GB VRAM at BF16/FP16 (16-bit, unquantized; weights alone, with KV cache and activations on top); approx. 33 GB at INT8; approx. 18 GB at INT4 quantization. Recommended GPU for FP16: A100 80GB. |
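
The Memory Requirement figures can be sanity-checked with simple arithmetic: parameter count times bytes per parameter. The sketch below is an illustration only. The helper name `weight_memory_gb` is hypothetical, it counts weights alone (no KV cache, activations, or quantization metadata), and it uses decimal gigabytes (1 GB = 10^9 bytes).

```python
def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


PARAMS = 32.5e9  # parameter count from the Model Size row

# Bit widths for the three precisions listed in the Memory Requirement row.
for name, bits in [("BF16/FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_memory_gb(PARAMS, bits):.2f} GB")
```

This gives 65 GB, 32.5 GB, and 16.25 GB for the weights. The table's INT8 and INT4 figures are slightly higher, which would be consistent with rounding and with the extra scale data that quantized formats usually carry, though the source does not say how they were derived.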
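
The Context Window row mentions that the shipped config.json covers 32,768 tokens and that `rope_scaling` extends it. A minimal sketch of that setting follows, assuming the YaRN-style entry described in the upstream Qwen2.5 model card (keys `factor`, `original_max_position_embeddings`, `type`). The helper `with_yarn_scaling` is a hypothetical name, and the `base` dictionary holds only the one field needed here, not the real config file.

```python
import json


def with_yarn_scaling(config: dict, target_context: int = 131_072) -> dict:
    """Return a copy of `config` with a YaRN rope_scaling entry for `target_context`."""
    native = config["max_position_embeddings"]
    updated = dict(config)  # shallow copy so the input dict is left unchanged
    updated["rope_scaling"] = {
        "factor": target_context / native,  # 131,072 / 32,768 = 4.0
        "original_max_position_embeddings": native,
        "type": "yarn",
    }
    return updated


# Stand-in for the shipped config.json (assumption: only this field matters here).
base = {"max_position_embeddings": 32_768}
print(json.dumps(with_yarn_scaling(base), indent=2))
```

Whether an inference framework honours this entry, and how it affects quality on short inputs, depends on the framework, so the setting is best enabled only when long inputs are actually needed.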