Nemotron 3 Nano

NVIDIA · December 2025

Active · Open Weight · Hybrid Mamba Transformer · Text
Parameters: 30B (3B active)
Context Window: 1M tokens

Description

The first model in NVIDIA's Nemotron 3 family. It uses a hybrid architecture that combines Mamba (a sequence model that processes text in linear time, which makes it much faster on long sequences) with traditional Transformer attention, arranged as a Mixture-of-Experts. The model has 30B total parameters but activates only 3B at a time, making it efficient enough to run on edge devices.
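To make the linear-versus-quadratic contrast concrete, here is a minimal sketch in plain Python. It is not Mamba's actual selective mechanism: `linear_scan` is a toy fixed-coefficient recurrence, and the coefficients `a` and `b` and both function names are assumptions chosen for illustration. The point is only that a recurrent scan touches each token once with a constant-size state, while causal attention scores every token against all earlier tokens.

```python
def linear_scan(xs, a=0.9, b=0.1):
    """Toy state-space recurrence: h_t = a * h_{t-1} + b * x_t.

    One pass over the sequence with a single scalar state, so the work
    grows linearly with len(xs). Real Mamba layers make the coefficients
    depend on the input; here they are fixed (an illustrative assumption).
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(h)
    return ys


def causal_attention_pairs(n):
    """Number of (query, key) pairs causal attention scores for n tokens.

    Each token attends to itself and every earlier token, giving
    n * (n + 1) / 2 pairs, which grows quadratically with n.
    """
    return n * (n + 1) // 2


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, "scan steps:", n, "attention pairs:", causal_attention_pairs(n))
```

Multiplying the sequence length by 10 multiplies the scan's work by 10 but the attention pair count by roughly 100, which is why the gap matters most at long context lengths.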

Key Innovations

MoE: Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
Agentic: Models that can autonomously plan, execute multi-step tasks, use tools, and self-correct without human intervention.
Distillation: Training a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.
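As a rough illustration of the MoE idea above, the sketch below routes an input to the top-k experts by router score and calls only those experts. This is a generic top-k routing scheme in plain Python, not Nemotron 3 Nano's actual router: the expert count, the value of `k`, and the names `route_top_k` and `moe_forward` are all assumptions for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are the softmax probabilities of the selected experts,
    renormalized so they sum to 1.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k=2):
    """Weighted sum of the selected experts' outputs.

    Experts that are not selected are never called, which is where the
    compute saving comes from.
    """
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


if __name__ == "__main__":
    # Four hypothetical experts; only two run for this input.
    experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
    print(moe_forward(1.0, experts, router_scores=[0.0, 5.0, 0.0, 5.0], k=2))
```

In a real model the router scores come from a learned layer applied to each token, so different tokens activate different experts while the per-token compute stays close to that of a much smaller dense model.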

Family Tree

Related Research (2)

Mamba · Architecture
2023 · Carnegie Mellon University / Princeton

Introduced selective state space models that process sequences in linear time (vs. quadratic for Transformers), with a data-dependent selection mechan…

2019 · NVIDIA

Pioneered efficient model parallelism techniques enabling training of multi-billion parameter Transformers across GPUs.