InstructGPT / text-davinci-002
OpenAI · January 2022
Why It Matters
Brought RLHF to large language models at scale, making them far more reliable at following human instructions. The technique became an industry standard and directly enabled ChatGPT.
Description
GPT-3 fine-tuned with reinforcement learning from human feedback (RLHF), a technique in which human evaluators compare and rank the model's outputs and the model learns to produce the responses humans prefer. This alignment step made GPT models markedly more helpful and less harmful, and better suited to real-world use.
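The preference-learning step described above is commonly expressed as a pairwise loss on a reward model: the model should score the response humans preferred higher than the one they rejected. The sketch below is a minimal illustration of that loss for a single comparison, not OpenAI's implementation; the function name `preference_loss` and the scalar inputs are assumptions made for the example.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss for one comparison (illustrative sketch).

    Computes -log(sigmoid(reward_chosen - reward_rejected)). The loss is
    small when the reward model scores the human-preferred response higher,
    and grows as the ordering is violated.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
    # for large positive or negative margins.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Hypothetical reward scores, chosen only to show the behavior.
    print(preference_loss(2.0, -1.0))   # correct ordering: low loss
    print(preference_loss(-1.0, 2.0))   # wrong ordering: high loss
```

In practice the two scores would come from a neural reward model evaluated on a prompt paired with each candidate response, and the loss would be averaged over many human comparisons.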
Notable Milestones
- First production model to use RLHF at scale
- Showed that a much smaller RLHF-trained model (1.3B parameters) could produce outputs that human raters preferred over those of the 100× larger 175B GPT-3 base model
- Established the alignment paradigm used by virtually all modern AI assistants
Key Innovations
Family Tree
Built On
Successors (1)
Related Research (3)
175B-parameter GPT. Pioneered few-shot and in-context learning, dramatically reducing the need for fine-tuning.
Pioneered the RLHF paradigm — training a reward model from human preferences, then using it to fine-tune policies via reinforcement learning.
Applied RLHF to GPT-3: supervised fine-tuning → reward modeling → PPO optimization. Made models safer, more helpful, and more aligned.
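In the final PPO stage of this pipeline, a common formulation does not optimize the raw reward-model score directly. It subtracts a penalty for drifting away from the supervised fine-tuned (SFT) model, which discourages the policy from exploiting flaws in the reward model. The sketch below shows that reward shaping for one sampled response; the function name, argument names, and the `beta` value are illustrative assumptions, not values taken from this entry.

```python
def penalized_reward(
    reward_score: float,
    logprob_policy: float,
    logprob_reference: float,
    beta: float = 0.02,  # illustrative penalty strength (assumption)
) -> float:
    """Reward-model score minus a KL-style penalty (illustrative sketch).

    logprob_policy is the log-probability of the response under the policy
    being trained; logprob_reference is its log-probability under the
    frozen SFT model. Their difference is a single-sample estimate of how
    far the policy has drifted from the SFT model on this response.
    """
    drift = logprob_policy - logprob_reference
    return reward_score - beta * drift


if __name__ == "__main__":
    # Hypothetical numbers: the policy assigns the response a higher
    # log-probability than the SFT model does, so the reward is reduced.
    print(penalized_reward(1.0, logprob_policy=-1.0, logprob_reference=-3.0))
```

The resulting scalar is what the reinforcement learning step would maximize. A larger `beta` keeps the tuned model closer to the SFT model, while a smaller one lets the reward model's preferences dominate.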