Nemotron 3 Ultra

NVIDIA · May 2026

Active · Open Weight · Hybrid Mamba-Transformer · Text
Parameters: 550B (55B active)
Context Window: 1M tokens

Why It Matters

NVIDIA's answer to GPT-5 and Claude: a 550B-parameter Mixture-of-Experts (MoE) model that activates only 55B parameters per token and combines Mamba's linear-time sequence modeling with Transformer attention.

Description

NVIDIA's flagship reasoning model and the largest in the Nemotron 3 family. This 550B-parameter Mixture-of-Experts model activates only 55B parameters per token and combines Mamba's linear-time sequence processing with Transformer attention to balance efficiency and output quality. It is positioned as NVIDIA's answer to frontier models from OpenAI and Anthropic.
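A quick back-of-envelope calculation shows what the 550B total / 55B active split means for per-token compute. It assumes the common approximation of about 2 FLOPs per active parameter per token for a forward pass and ignores attention-score computation and routing overhead, so the numbers are rough estimates, not measured figures for this model. The dense comparison is a hypothetical model of the same total size.

```python
# Back-of-envelope forward-pass compute for a Mixture-of-Experts model.
# Assumption: ~2 FLOPs (one multiply, one add) per active parameter per token;
# attention-score FLOPs and router overhead are ignored.

TOTAL_PARAMS = 550e9   # total parameters, from the model card
ACTIVE_PARAMS = 55e9   # parameters active per token, from the model card

def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for one token."""
    return 2.0 * active_params

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense model, same size

print(f"active fraction: {active_fraction:.0%}")        # 10%
print(f"MoE forward FLOPs/token:   {moe_flops:.2e}")    # 1.10e+11
print(f"dense forward FLOPs/token: {dense_flops:.2e}")  # 1.10e+12
```

The saving is in compute, not memory: all 550B parameters still have to be stored on the serving hardware, because any token may be routed to any expert.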

Key Innovations

MoE: Architecture where only a fraction of the model's parameters are active for each input token, allowing a very large total parameter count at a much lower per-token compute cost.
Reasoning: Structured, step-by-step problem solving, often using chain-of-thought or tree-of-thought approaches.
Long Context: Ability to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.
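The MoE idea above can be illustrated with a minimal top-k router: a learned projection scores every expert, only the k best-scoring experts run, and their outputs are mixed by softmax weights. The card does not describe Nemotron 3 Ultra's router, so this follows a common top-k softmax-gating pattern; the expert count, k, the toy tanh experts, and all function names (`top_k_moe`, `make_expert`) are hypothetical choices for illustration.

```python
import numpy as np

def top_k_moe(x, router_w, experts, k=2):
    """Send one token through the k highest-scoring experts and mix their outputs.

    x:        (d,) token representation
    router_w: (d, n_experts) router projection (hypothetical shapes)
    experts:  list of callables, each mapping a (d,) vector to a (d,) vector
    """
    logits = x @ router_w                     # one routing score per expert
    top = np.argsort(logits)[-k:][::-1]       # indices of the k best-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                      # softmax over the selected experts only
    # Only the k selected experts are evaluated; the rest cost nothing for this token.
    y = sum(g * experts[i](x) for g, i in zip(gates, top))
    return y, top

def make_expert(rng, d):
    """A toy expert: one random linear map followed by tanh (illustrative only)."""
    W = rng.standard_normal((d, d)) / np.sqrt(d)
    return lambda v: np.tanh(v @ W)

rng = np.random.default_rng(0)
d, n_experts = 8, 16
experts = [make_expert(rng, d) for _ in range(n_experts)]
router_w = rng.standard_normal((d, n_experts))
x = rng.standard_normal(d)

y, chosen = top_k_moe(x, router_w, experts, k=2)
print("experts used:", chosen.tolist(), "of", n_experts)
```

Here each token touches 2 of 16 experts. In a full model the active-parameter count also includes layers every token passes through (such as attention or Mamba layers), so the card's 55B/550B ratio is not simply k divided by the number of experts.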

Related Research (2)

Mamba · Architecture
2023 · Carnegie Mellon University / Princeton

Introduced selective state space models that process sequences in linear time (vs. quadratic for Transformers), with a data-dependent selection mechan…
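The core idea of a selective state space model, a linear recurrence whose parameters depend on the current input, can be sketched as a single loop over the sequence, which makes the linear-time cost visible. This is a simplified illustration, not Mamba's implementation: it assumes one input channel, a diagonal state matrix `A`, and hypothetical weights `W_delta`, `W_B`, `W_C` standing in for Mamba's learned projections, and it runs sequentially rather than with a fused, hardware-aware scan kernel.

```python
import numpy as np

def selective_scan(x, A, W_delta, W_B, W_C):
    """Minimal selective SSM over a 1-D input sequence (single channel).

    x:        (L,) input sequence
    A:        (N,) diagonal state matrix; negative entries keep the state stable
    W_delta:  scalar weight producing the step size Delta_t from x_t (hypothetical)
    W_B, W_C: (N,) weights producing input-dependent B_t and C_t (hypothetical)
    """
    h = np.zeros_like(A)
    ys = np.empty_like(x)
    for t, xt in enumerate(x):                  # one pass over the sequence: O(L)
        delta = np.log1p(np.exp(W_delta * xt))  # softplus keeps the step size positive
        B_t = W_B * xt                          # selection: B depends on the current input
        C_t = W_C * xt                          # selection: C depends on the current input
        A_bar = np.exp(delta * A)               # zero-order-hold discretization of A
        h = A_bar * h + delta * B_t * xt        # simplified (Euler-style) input term
        ys[t] = C_t @ h                         # read out the current state
    return ys

rng = np.random.default_rng(0)
N = 4
A = -np.arange(1, N + 1, dtype=float)
W_B, W_C = rng.standard_normal(N), rng.standard_normal(N)
x = rng.standard_normal(32)
y = selective_scan(x, A, 0.5, W_B, W_C)
```

Because the state `h` has a fixed size, each step costs the same regardless of how far into the sequence it is. In a hybrid Mamba-Transformer model, the attention layers still scale quadratically with sequence length; only the SSM layers follow this linear pattern.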

2019 · NVIDIA

Pioneered efficient model-parallelism techniques that enabled training multi-billion-parameter Transformers across multiple GPUs.
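The entry above does not name its specific technique. As an illustration of one common form of model parallelism, the sketch below splits a linear layer's weight matrix column-wise across simulated devices: each "device" multiplies the input by its own column block, and concatenating the partial outputs (an all-gather on real hardware) reproduces the full result. Everything runs on one machine, and `column_parallel_linear` is a hypothetical helper, not a library API.

```python
import numpy as np

def column_parallel_linear(x, W, n_devices):
    """Simulate a linear layer whose weight columns are sharded across devices.

    Each simulated device holds one column block of W and computes its slice of
    the output independently; the slices are then concatenated.
    """
    shards = np.array_split(W, n_devices, axis=1)      # one column block per "device"
    partial_outputs = [x @ shard for shard in shards]  # no communication needed here
    return np.concatenate(partial_outputs, axis=-1)    # stands in for an all-gather

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))   # 4 tokens, hidden size 16
W = rng.standard_normal((16, 64))  # full weight matrix, split across 4 "devices"
y = column_parallel_linear(x, W, n_devices=4)
```

Each device stores only a quarter of `W`, which is what lets a layer too large for one GPU's memory be spread across several.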