Nemotron 3 Super
NVIDIA · March 2026
● active · Open Weight · hybrid mamba transformer · text
Parameters: 120B (12B active)
Context Window: 1M tokens
Description
The mid-range model in NVIDIA's Nemotron 3 family, designed for multi-agent applications where multiple AI models collaborate to solve complex tasks. Uses the same hybrid Mamba-Transformer MoE architecture as Nano but scaled up to 120B total parameters with 12B active, and supports a 1 million token context window — enough to process roughly 750,000 words at once.
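The figures in the description can be checked with a few lines of arithmetic. This is a minimal sketch: the parameter counts and context length come from the card, while the 0.75 words-per-token ratio is an assumed rule of thumb for English text, not a property of this model's tokenizer.

```python
# Figures taken from the card above.
TOTAL_PARAMS = 120e9        # 120B total parameters
ACTIVE_PARAMS = 12e9        # 12B active per token
CONTEXT_TOKENS = 1_000_000  # 1M-token context window

# Assumption: roughly 0.75 English words per token (rule of thumb only).
WORDS_PER_TOKEN = 0.75


def active_fraction(total: float, active: float) -> float:
    """Share of parameters used for any single token."""
    return active / total


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Rough word capacity of a context window of the given size."""
    return int(tokens * words_per_token)


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
    print(f"approx. words in context: {approx_words(CONTEXT_TOKENS):,}")
```

Under these assumptions, about one tenth of the parameters take part in each token, and the word estimate matches the roughly 750,000 words quoted above.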
Key Innovations
MoE: Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
Agentic: Models that can autonomously plan, execute multi-step tasks, use tools, and self-correct without human intervention.
Reasoning: Structured step-by-step problem solving, often using chain-of-thought or tree-of-thought approaches.
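The MoE idea listed above, that only a fraction of the parameters run for each input, can be sketched with generic top-k gating. The card does not say how Nemotron 3 Super routes tokens or how many experts it has, so everything here (`top_k_route`, `moe_layer`, scalar "experts", k=2) is a hypothetical illustration rather than the model's actual design.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts.

    Returns [(expert_index, weight)] with weights renormalized to sum to 1.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]


def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs.

    Experts that are not selected are never called, which is where the
    compute saving comes from.
    """
    routes = top_k_route(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in routes)


if __name__ == "__main__":
    # Toy experts: each just scales its input (hypothetical stand-ins for FFN blocks).
    experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
    logits = [0.0, 2.0, 1.0, -1.0]  # router scores for one token (made up)
    print(top_k_route(logits, k=2))
    print(moe_layer(1.0, experts, logits, k=2))
```

With four experts and k=2, only two expert functions run per input. In a real model the router logits would be computed from the token's hidden state instead of being passed in by hand.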
Family Tree
Related Research (2)
Mamba · Architecture
2023 · Carnegie Mellon University / Princeton
Introduced selective state space models that process sequences in linear time (vs. quadratic for Transformers), with a data-dependent selection mechan…
Megatron-LM · Scaling
2019 · NVIDIA
Pioneered efficient model parallelism techniques enabling training of multi-billion parameter Transformers across GPUs.
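The Mamba entry above mentions linear-time sequence processing with a data-dependent mechanism. The toy recurrence below illustrates only that general shape: a single pass over the sequence with a fixed-size state and a gate computed from the current input. It is not Mamba's actual parameterization; `selective_scan` and `w_gate` are hypothetical names chosen for this sketch.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def selective_scan(xs, w_gate: float = 1.0):
    """Toy scalar recurrence: h_t = a_t * h_{t-1} + (1 - a_t) * x_t.

    The gate a_t = sigmoid(w_gate * x_t) depends on the current input, so
    how much of the past state is kept varies per step. The loop visits
    each element once, so the cost grows linearly with sequence length and
    the state stays a single number.
    """
    h = 0.0
    outputs = []
    for x in xs:
        a = sigmoid(w_gate * x)      # input-dependent retention gate
        h = a * h + (1.0 - a) * x    # update the fixed-size state
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    print(selective_scan([1.0, -2.0, 0.5, 3.0]))
```

Self-attention, by contrast, compares every position with every other position, which is the source of the quadratic cost mentioned in the entry.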