

Nemotron
#35 in Frontier LLMs · nvidia · since Nemotron 3 Nano: Dec 14, 2025; Nemotron 3 Super: Mar 11, 2026; Nemotron 3 Ultra: Jun 4, 2026 · 26× · last seen Jul 03, 2026
NVIDIA Nemotron is a family of open language models (Nemotron 3) in three sizes – Nano (31.6B/3.2B active), Super (120B/12B active), and Ultra (550B/55B active) – built on a hybrid Mamba-Transformer MoE architecture (LatentMoE, NVFP4), designed primarily for agentic AI workflows. Weights, training data, and recipes are fully open under the NVIDIA Nemotron Open Model License. Models are served via build.nvidia.com as NIM microservices and on Hugging Face, with support for vLLM, SGLang, Ollama, and llama.cpp on NVIDIA GPUs (Ampere, Hopper, Blackwell). Multimodality (text, image, video, audio) is provided by the Nemotron 3 Nano Omni variant.
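The total/active parameter pairs quoted above show how sparse the MoE routing is: only about a tenth of the weights take part in each forward pass. A minimal sketch of that arithmetic follows; the dictionary and the function name `active_fraction` are illustrative, and the figures are simply the ones stated in the description.

```python
# Total and active parameter counts in billions, as quoted in the description.
NEMOTRON_3_SIZES = {
    "Nano": (31.6, 3.2),
    "Super": (120.0, 12.0),
    "Ultra": (550.0, 55.0),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that are active per token."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


if __name__ == "__main__":
    for name, (total_b, active_b) in NEMOTRON_3_SIZES.items():
        share = active_fraction(total_b, active_b)
        print(f"{name}: {active_b}B of {total_b}B active ({share:.1%})")
```

The active count gives a rough sense of per-token compute, while the total count still determines how much memory the weights need.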
Features
| Feature | Details |
| --- | --- |
| Key Benchmark | Super: SWE-Bench Verified 60.47%; Ultra (550B): Artificial Analysis Intelligence Index 48 (highest US open-weight score); Nano: AIME 2025 (with tools) 99.2% |
| Context Window (Tokens) | Nano & Super: 1,000,000; Ultra (550B): 1,000,000; Nano Omni: 300,000 |
| License | NVIDIA Nemotron Open Model License (permissive: commercial use allowed, derivative works allowed, no attribution required) |
| Multimodality | Nemotron 3 Nano Omni: text, image, video, audio (input) → text (output); standard text models (Nano/Super/Ultra): text only |
| Platform | build.nvidia.com (NIM microservices), Hugging Face, OpenRouter, Perplexity; deployment via vLLM, SGLang, Ollama, llama.cpp; hardware: NVIDIA Ampere/Hopper/Blackwell GPUs |
| Price per 1M Tokens | Nano 30B: from $0.05 input / $0.20 output; Super 120B: from $0.10 input / $0.50 output; Ultra 253B (Llama-based, not the 550B Nemotron 3 Ultra): $2.00 input / $6.00 output (via NVIDIA NIM API); NIM hosted tier free (prototyping, ~40 RPM) |
| Release Date | Nano: December 14, 2025; Super: March 11, 2026 (GTC); Ultra: June 4, 2026 (Computex) |
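
The per-1M-token prices in the table above can be turned into a rough per-workload estimate. The sketch below is only illustrative: the keys in `PRICES_PER_1M` and the function `estimate_cost` are hypothetical names, the Nano and Super figures are the "from" prices (so results are lower bounds), and actual provider pricing may differ.

```python
# (input, output) price in USD per 1M tokens, copied from the table above.
# Keys are illustrative labels, not official model identifiers.
PRICES_PER_1M = {
    "nano-30b": (0.05, 0.20),    # "from" price: a lower bound
    "super-120b": (0.10, 0.50),  # "from" price: a lower bound
    "ultra-253b": (2.00, 6.00),  # listed via NVIDIA NIM API
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: 2M input tokens and 500k output tokens.
    for model in PRICES_PER_1M:
        cost = estimate_cost(model, 2_000_000, 500_000)
        print(f"{model}: ${cost:.2f}")
```

Because output tokens cost several times more than input tokens in every row, long generations dominate the bill well before long prompts do.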