Llama-3.1-Nemotron-70B
NVIDIA · October 2024
● active · Open Weight · decoder only · text
Parameters: 70B
Context Window: 128K tokens
Why It Matters
Showed that NVIDIA's post-training techniques could make an open model outperform GPT-4o on alignment benchmarks such as Arena Hard, AlpacaEval 2 LC, and MT-Bench.
Description
NVIDIA's enhanced version of Meta's Llama 3.1 70B, fine-tuned using a REINFORCE-style reward training approach (a reinforcement learning technique that optimizes the model by rewarding good responses). Demonstrated that advanced post-training could make an already-strong open model competitive with top proprietary models like GPT-4o.
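The idea behind REINFORCE-style reward training can be sketched on a toy problem. The example below is not NVIDIA's training pipeline: it assumes a policy that picks among three fixed candidate responses, and the reward values are made-up stand-ins for the scores a learned reward model would give. The function names (`reinforce_step`, `train`) are hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, rewards, lr=0.5, rng=random):
    """One REINFORCE update for a softmax policy over fixed candidate responses.

    logits  -- policy parameters, one per candidate response
    rewards -- illustrative reward scores, one per candidate response
    """
    probs = softmax(logits)
    # Sample one response from the current policy.
    action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    # Baseline: expected reward under the current policy (reduces variance).
    baseline = sum(p * r for p, r in zip(probs, rewards))
    advantage = rewards[action] - baseline
    # d log pi(action) / d logit_i = (1 if i == action else 0) - probs[i]
    return [
        logit + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]

def train(rewards, steps=2000, seed=0):
    """Run repeated REINFORCE updates and return the final response probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        logits = reinforce_step(logits, rewards, rng=rng)
    return softmax(logits)

# Assumed scores: the second response is the one the reward signal prefers.
final_probs = train([0.1, 0.9, 0.3])
```

After training, most of the probability mass should sit on the highest-reward response. In a real setting the policy is the language model itself and the update is applied to its weights, but the principle is the same: responses scoring above the baseline become more likely, and those scoring below it become less likely.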
Key Innovations
RLHF: Reinforcement Learning from Human Feedback. Trains models to align with human preferences by having humans rank outputs.
Instruction Tuning: Fine-tuning a model on instruction-response pairs so it follows user commands more reliably.
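Both ideas above come down to how training data is turned into a loss, which a small sketch can make concrete. The source does not say which loss or prompt template was used, so everything here is an assumption: the pairwise Bradley-Terry loss is one common way to learn from ranked outputs, the "### Instruction:" / "### Response:" template is purely illustrative, and whitespace splitting stands in for a real tokenizer.

```python
import math

def pairwise_preference_loss(score_chosen, score_rejected):
    """Bradley-Terry loss for one ranked pair: -log(sigmoid(chosen - rejected))."""
    margin = score_chosen - score_rejected
    # Equals log(1 + exp(-margin)), written to avoid overflow for large |margin|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

def build_sft_example(instruction, response):
    """Join an instruction-response pair and mark which tokens carry loss."""
    prompt_tokens = f"### Instruction: {instruction} ### Response:".split()
    response_tokens = response.split()
    tokens = prompt_tokens + response_tokens
    # 0 = prompt token (ignored by the loss), 1 = response token (trained on).
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask

# The loss is small when the human-preferred output already scores higher.
agree = pairwise_preference_loss(2.0, 0.0)
disagree = pairwise_preference_loss(0.0, 2.0)

tokens, loss_mask = build_sft_example("Say hi", "Hello there")
```

Masking the prompt tokens is a common convention so that the model is trained to produce the response rather than to reproduce the instruction. Some setups train on the full sequence instead.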