Nemotron 3 Ultra
NVIDIA · May 2026
Why It Matters
NVIDIA's answer to GPT-5 and Claude: a 550B-parameter Mixture-of-Experts (MoE) model that uses only 55B active parameters per token, combining Mamba's linear-time sequence modeling with Transformer attention.
Description
NVIDIA's flagship reasoning model and the largest in the Nemotron 3 family. A 550B-parameter Mixture-of-Experts (MoE) model that activates only 55B parameters (about 10% of the total) per token, combining Mamba's linear-time sequence processing with Transformer attention for both efficiency and quality. Positioned as NVIDIA's answer to frontier models from OpenAI and Anthropic.
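The "55B active out of 550B" figure comes from sparse expert routing: a router scores every expert for each token, and only the highest-scoring few actually run. The toy layer below is a minimal sketch of generic top-k routing in plain Python. The expert count, `top_k`, dimensions, and the use of single linear maps as experts are illustrative assumptions, not Nemotron 3 Ultra's published configuration. Note that 55B / 550B is 10%, but in typical MoE designs that ratio is not simply `top_k / num_experts`, because attention, embedding, and other shared layers run for every token.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    # w is a list of rows (out_dim x in_dim); x has length in_dim.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class TopKMoE:
    """Toy MoE layer: a linear router sends each token to top_k of num_experts."""

    def __init__(self, d_model, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.router = rand_matrix(num_experts, d_model, rng)
        # Hypothetical simplification: each expert is one d_model x d_model linear map.
        self.experts = [rand_matrix(d_model, d_model, rng) for _ in range(num_experts)]

    def forward(self, x):
        logits = matvec(self.router, x)
        # Keep only the top_k highest-scoring experts for this token.
        chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        # Renormalize the gate weights over the chosen experts only.
        gates = softmax([logits[i] for i in chosen])
        out = [0.0] * len(x)
        for g, i in zip(gates, chosen):
            # Only the chosen experts' weights are used for this token.
            y = matvec(self.experts[i], x)
            out = [o + g * yi for o, yi in zip(out, y)]
        return out, chosen, gates


layer = TopKMoE(d_model=8, num_experts=16, top_k=2)
out, chosen, gates = layer.forward([0.5] * 8)
print("experts used:", chosen, "gate sum:", round(sum(gates), 6))
print("active fraction:", 55e9 / 550e9)
```

In this toy, raising `num_experts` while holding `top_k` fixed grows the total parameter count without growing the per-token expert compute, which is the trade-off the description points to.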
Family Tree
Related Research
Introduced selective state space models that process sequences in linear time (vs. quadratic for Transformers), with a data-dependent selection mechan…
Pioneered efficient model-parallelism techniques that enable training multi-billion-parameter Transformers across multiple GPUs.
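The selective state space entry above hinges on two ideas: a recurrence that visits each token once (so cost grows linearly with sequence length) and a step size and projections that depend on the current input. The sketch below shows both for a single channel in plain Python. It is a toy under stated assumptions: `a`, `w_b`, `w_c`, `w_delta`, and `b_delta` are hypothetical parameters, B is discretized with the common simplified `delta * B` rule, and a production implementation would project these quantities from the full multi-channel input and run the scan in a fused GPU kernel rather than a Python loop.

```python
import math


def softplus(z):
    # log(1 + e^z), guarded against overflow for large z.
    return z if z > 30.0 else math.log1p(math.exp(z))


def selective_scan(xs, a, w_b, w_c, w_delta, b_delta):
    """Single-channel selective SSM recurrence; O(len(xs) * len(a)) time.

    a holds the (negative) diagonal of the state matrix A; w_b, w_c, w_delta,
    and b_delta are hypothetical parameters that make B, C, and the step size
    depend on the current input.
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)  # input-dependent step size
        b = [wb * x for wb in w_b]               # input-dependent B_t
        c = [wc * x for wc in w_c]               # input-dependent C_t
        # Discretize: A_bar = exp(delta * A), B_bar ~= delta * B (simplified).
        h = [math.exp(delta * ai) * hi + delta * bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


params = dict(a=[-1.0, -0.5, -0.1], w_b=[0.3, -0.2, 0.1], w_c=[1.0, 0.5, -0.4],
              w_delta=0.8, b_delta=-0.2)
print(selective_scan([1.0, 0.0, -2.0, 0.5], **params))
```

Because each output `ys[t]` depends only on inputs up to position `t`, changing a later input leaves earlier outputs untouched; that causal, single-pass structure is what replaces attention's all-pairs comparison.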
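The model-parallelism entry above refers to spreading a single layer's weights across GPUs. One common form of this tensor parallelism shards a Transformer MLP's first weight matrix by columns and its second by rows, so each device computes a partial output independently and a single all-reduce (a sum) combines them. The simulation below runs entirely on a CPU: "devices" are just list shards, the all-reduce is a Python loop, and the matrix sizes and values are arbitrary assumptions chosen so the sharded and unsharded results can be compared.

```python
import math


def matmul(a, b):
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gelu(m):
    # Exact (erf-based) GeLU, applied elementwise.
    return [[0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in row] for row in m]


def split_cols(m, parts):
    w = len(m[0]) // parts
    return [[row[p * w:(p + 1) * w] for row in m] for p in range(parts)]


def split_rows(m, parts):
    h = len(m) // parts
    return [m[p * h:(p + 1) * h] for p in range(parts)]


def mlp_reference(x, a, b):
    # Unsharded MLP: GeLU(X A) B.
    return matmul(gelu(matmul(x, a)), b)


def mlp_tensor_parallel(x, a, b, num_devices):
    assert len(a[0]) % num_devices == 0, "hidden size must divide evenly"
    a_shards = split_cols(a, num_devices)  # device p holds a column block of A...
    b_shards = split_rows(b, num_devices)  # ...and the matching row block of B
    # Each device computes its partial output with no communication, because
    # GeLU is elementwise and acts on each column block independently.
    partials = [matmul(gelu(matmul(x, a_p)), b_p) for a_p, b_p in zip(a_shards, b_shards)]
    # Simulated all-reduce: sum the partial outputs across devices.
    out = partials[0]
    for p in partials[1:]:
        out = [[u + v for u, v in zip(ru, rv)] for ru, rv in zip(out, p)]
    return out


x = [[0.1 * (i + j) - 0.3 for j in range(4)] for i in range(2)]
a = [[0.05 * ((i * 8 + j) % 7) - 0.15 for j in range(8)] for i in range(4)]
b = [[0.04 * ((i * 4 + j) % 5) - 0.08 for j in range(4)] for i in range(8)]
ref = mlp_reference(x, a, b)
par = mlp_tensor_parallel(x, a, b, num_devices=4)
print("max abs difference:", max(abs(r - p) for rr, pr in zip(ref, par) for r, p in zip(rr, pr)))
```

The split works because the column blocks of `X A` never need to see each other before the nonlinearity, so communication happens only once, after the second matrix multiply.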