

Qwen 2.5 Coder 32B
#85 in Open-Source Language Models · alibaba · v2.5 · coder 32b · since 2024-11-12 · 2× · latest: June 30, 2026
Momentum: 2
Qwen 2.5 Coder 32B (also listed as Qwen2.5-Coder-32B-Instruct) is an open-source, code-specialized language model from Alibaba Cloud, built on the Qwen2.5 architecture and trained on more than 5.5 trillion tokens of code, natural language, and synthetic data. It supports 92 programming languages and reaches coding benchmark scores comparable to GPT-4o. Released under the Apache 2.0 license, it can run locally in quantized form on hardware with at least 32 GB of RAM, and it is available as an open-weight download and through multiple cloud APIs.
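
The local-hardware claim can be sanity-checked with simple arithmetic: weight memory is roughly the parameter count times the bytes stored per parameter. The sketch below uses the 32.8 billion parameters listed under Features. The bytes-per-parameter values and the function name `weight_memory_gb` are illustrative assumptions, and the result covers weights only, so real usage will be higher once the KV cache and runtime overhead are included.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Covers weights only; KV cache, activations and runtime overhead come on top.

PARAMS = 32.8e9  # parameter count quoted for Qwen 2.5 Coder 32B

# Assumed storage cost per parameter for each precision (illustrative values).
BYTES_PER_PARAM = {
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: float, precision: str) -> float:
    """Return the approximate size of the weights in decimal gigabytes."""
    return params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        size = weight_memory_gb(PARAMS, precision)
        print(f"{precision}: ~{size:.1f} GB of weights")
```

Under these assumptions the FP16 weights come to about 65.6 GB and the 4-bit weights to about 16.4 GB, which is consistent with a quantized copy fitting on a 32 GB machine while the unquantized model does not.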
Features
| Feature | Details |
| --- | --- |
| Benchmark Scores (MMLU and similar) | HumanEval: 92.7% pass@1; LiveCodeBench: 37.2% (vs. 29.2% for GPT-4o); Aider benchmark: 73.7% (rank 4); MMLU (Qwen2.5-32B base): 83.32 |
| Inference Speed | Local (Apple M2 Max, 64 GB, Q4_K_M): ~12–15 tokens/s; consumer test (MacBook Pro M2, 64 GB): ~10 tokens/s; A100 80 GB via vLLM: BF16 at full speed |
| Context Window | Up to 128,000 tokens (config.json default: 32,768 tokens; extendable to 128K via YaRN RoPE scaling) |
| Model Size (Parameters) | 32.8 billion parameters (dense transformer, no MoE) |
| Price Tier | Open-weight (free to self-host, Apache 2.0); API from $0.09 per 1M input and output tokens (via Lambda); also available on OpenRouter |
| Memory Requirement | FP16 inference: ~71 GB VRAM; INT4 quantization: ~18 GB VRAM; local on Apple Silicon (MLX): ~32 GB unified memory |
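
The context-window entry mentions extending the 32,768-token default to 128K through YaRN. As a minimal sketch, the change amounts to adding a `rope_scaling` entry to `config.json`. The key names below follow the layout described in the Qwen2.5 model cards as best I recall it and should be checked against the current card. The helper `enable_yarn` is hypothetical, and treating 128K as 131,072 tokens (128 × 1024) is an assumption.

```python
import json


def enable_yarn(config: dict, target_context: int = 131072) -> dict:
    """Return a copy of config with a YaRN rope_scaling entry added.

    The scaling factor is the ratio between the desired context length
    and the length the model was configured with by default.
    """
    original = config["max_position_embeddings"]
    updated = dict(config)  # shallow copy so the caller's dict is untouched
    updated["rope_scaling"] = {
        "type": "yarn",  # assumed key layout, verify against the model card
        "factor": target_context / original,
        "original_max_position_embeddings": original,
    }
    return updated


if __name__ == "__main__":
    base = {"max_position_embeddings": 32768}  # default quoted in the table
    print(json.dumps(enable_yarn(base), indent=2))
```

With the defaults above the factor works out to 4.0. Serving frameworks differ in how they read this setting, so whether it applies per request or to all inputs depends on the runtime in use.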