Nemotron 3 Ultra

NVIDIA · May 2026

Active · Open Weight · Hybrid Mamba-Transformer · Text

Parameters: 550B (55B active)

Context Window: 1M tokens

Why It Matters

NVIDIA's answer to GPT-5 and Claude: a 550B-parameter MoE model that keeps only 55B parameters active at a time, combining Mamba's linear-time sequence modeling with Transformer attention.

Description

NVIDIA's flagship reasoning model and the largest in the Nemotron 3 family. It is a 550B-parameter Mixture-of-Experts (MoE) model that activates only 55B parameters (10% of the total) for any given input, combining Mamba's linear-time sequence processing with Transformer attention for both efficiency and quality. It is positioned as NVIDIA's answer to frontier models from OpenAI and Anthropic.
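To make the figures above concrete, the sketch below computes the active-parameter share from the listed sizes and compares how a quadratic cost term (full self-attention) and a linear one (a state-space scan) grow with sequence length. The function names and the unit-cost model are illustrative assumptions, not published NVIDIA figures; real costs depend on the layer mix, hidden size, and kernels, none of which are stated here.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters used for a single input."""
    return active_params / total_params


def pairwise_interactions(seq_len: int) -> int:
    """Token-pair count for full self-attention: grows as n^2."""
    return seq_len * seq_len


def scan_steps(seq_len: int) -> int:
    """Step count for a linear-time recurrent scan: grows as n."""
    return seq_len


# Sizes taken from the listing: 550B total, 55B active.
frac = active_fraction(550e9, 55e9)
print(f"active fraction: {frac:.0%}")

# Assumed unit-cost comparison at a few context lengths, up to 1M tokens.
for n in (1_000, 100_000, 1_000_000):
    ratio = pairwise_interactions(n) // scan_steps(n)
    print(f"n={n:>9,}: quadratic/linear cost ratio = {ratio:,}x")
```

Under this simplified model the gap between the two terms grows in proportion to the context length, which is one reason hybrid designs are attractive at 1M-token contexts.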

Key Innovations

MoE

Mixture-of-Experts: an architecture where only a fraction of the model's parameters are active for each input, allowing massive scale at lower compute cost.
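A minimal sketch of how sparse expert routing can work, using top-k gating over softmax router scores. This is a generic illustration of the technique, not Nemotron 3 Ultra's actual router: the expert count, the value of `k`, and the helper names `route_top_k` and `moe_forward` are all hypothetical, since the listing does not state those details.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k):
    """Run only the selected experts and return their weighted sum."""
    chosen = route_top_k(router_scores, k)
    output = sum(w * experts[i](x) for i, w in chosen)
    return output, [i for i, _ in chosen]


# Hypothetical toy setup: 10 scalar "experts", 1 active per input (10%),
# chosen only to mirror the 55B-of-550B ratio; not the real configuration.
experts = [lambda x, s=s: s * x for s in range(1, 11)]
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in experts]

y, used = moe_forward(2.0, experts, scores, k=1)
print(f"experts used: {used} of {len(experts)}; output = {y}")
```

Only the selected experts are evaluated, so compute per input scales with `k` rather than with the total number of experts.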

Reasoning

Structured step-by-step problem solving, often using chain-of-thought or tree-of-thought approaches.

Long Context

Ability to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.

Family Tree

Built On

Nemotron 3 Super

Lineage

Megatron-Turing NLG → Nemotron-4 15B → Nemotron-4 340B → Nemotron 3 Nano → Nemotron 3 Super → Nemotron 3 Ultra

Related Research (2)

Mamba · Architecture

2023 · Carnegie Mellon University / Princeton

Introduced selective state space models that process sequences in linear time (vs. quadratic for Transformers), with a data-dependent selection mechan…

Megatron-LM · Scaling

2019 · NVIDIA

Pioneered efficient model parallelism techniques enabling training of multi-billion parameter Transformers across GPUs.

More from NVIDIA Nemotron

Megatron-Turing NLG · 2021-10 · 530B

Nemotron-4 15B · 2024-03 · 15B

Nemotron-4 340B · 2024-06 · 340B

Llama-3.1-Nemotron-70B · 2024-10 · 70B

NVLM 1.0 · 2024-10 · 72B

Nemotron 3 Nano · 2025-12 · 30B (3B active)

Nemotron 3 Super · 2026-03 · 120B (12B active)

Cosmos 1.0 · 2025-01 · size not listed
