InstructGPT / text-davinci-002

OpenAI · January 2022

active · Closed · decoder only · text · API Available
Parameters: 175B
Context Window: 4K tokens

Why It Matters

Applied RLHF to large language models at scale, making them follow human instructions far more reliably. The technique became an industry standard and directly enabled ChatGPT.

Description

GPT-3 refined using RLHF (reinforcement learning from human feedback), a technique in which human evaluators compare the model's outputs and the model is trained to produce the responses they prefer. This was a key alignment step that made GPT models more helpful, less harmful, and better suited to real-world use.
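As a rough illustration of how human judgments become training data, the sketch below turns one best-to-worst ranking of model outputs into pairwise comparisons. The function name `ranking_to_pairs` and the sample strings are hypothetical, and this page does not describe the exact data format OpenAI used.

```python
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() preserves input order, so the first element of each
    # pair is always the one the labeler ranked higher.
    return list(combinations(ranked_outputs, 2))


# Hypothetical ranking of three responses to the same prompt.
pairs = ranking_to_pairs(["response A", "response B", "response C"])
```

A ranking of K outputs yields K·(K−1)/2 pairs, so a single labeling pass over a handful of responses produces several comparisons.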

Notable Milestones

  • First production model to use RLHF at scale
  • Showed that human labelers preferred outputs from a 1.3B-parameter InstructGPT model over those of the 175B-parameter GPT-3, a base model more than 100× larger
  • Established the alignment paradigm used by virtually all modern AI assistants

Key Innovations

RLHF
Reinforcement Learning from Human Feedback: training models to align with human preferences by having humans rank outputs.
Instruction Tuning
Fine-tuning a model on instruction-response pairs so it follows user commands more reliably.
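To make the instruction-tuning idea concrete, here is a minimal sketch of how one instruction-response pair might be prepared for supervised fine-tuning. The prompt template, the whitespace "tokenizer", and the choice to train only on response tokens are all assumptions for illustration. Masking the prompt is a common practice, but this page does not say how InstructGPT's data was formatted.

```python
def build_example(instruction, response):
    """Format one instruction-response pair and mark which tokens are trained on."""
    # Hypothetical prompt template; real templates vary by project.
    prompt_tokens = f"Instruction: {instruction} Response:".split()
    response_tokens = response.split()
    tokens = prompt_tokens + response_tokens
    # 0 = prompt token (ignored by the loss), 1 = response token (trained on).
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask


tokens, loss_mask = build_example("Translate 'bonjour' to English.", "Hello.")
```

With this mask the model is not penalized for failing to predict the instruction itself, only for failing to produce the desired response.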

Family Tree

Built On

Lineage

GPT-1 → GPT-2 → GPT-3 → InstructGPT / text-davinci-002

Successors (1)

Related Research (3)

GPT-3 · Transformer
2020 · OpenAI

175B-parameter GPT. Pioneered few-shot and in-context learning, dramatically reducing the need for fine-tuning.

2017 · OpenAI / DeepMind

Pioneered the RLHF paradigm — training a reward model from human preferences, then using it to fine-tune policies via reinforcement learning.
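A reward model of this kind is commonly trained with a pairwise loss that pushes the score of the preferred output above the score of the rejected one. The sketch below shows that loss for a single comparison. The function name and scalar inputs are illustrative, and the page itself does not spell out the loss function.

```python
import math


def preference_loss(reward_preferred, reward_rejected):
    """Pairwise loss: -log(sigmoid(r_preferred - r_rejected))."""
    margin = reward_preferred - reward_rejected
    # log1p(exp(-margin)) is algebraically equal to -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))
```

The loss shrinks as the reward model scores the human-preferred output further above the rejected one, and grows when the ordering is reversed.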

InstructGPT · Alignment
2022 · OpenAI

Applied RLHF to GPT-3: supervised fine-tuning → reward modeling → PPO optimization. Made models safer, more helpful, and more aligned.
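In the final PPO stage, RLHF setups typically do not optimize the raw reward-model score alone. They subtract a penalty for drifting away from the supervised fine-tuned model. The sketch below shows one such penalized reward. The name `penalized_reward` and the value of `beta` are assumptions chosen for illustration, not settings taken from this page.

```python
def penalized_reward(reward, logprob_policy, logprob_reference, beta=0.02):
    """Reward-model score minus a KL-style penalty for drifting from the reference model."""
    # logprob_policy: log-probability of the response under the model being trained.
    # logprob_reference: log-probability under the frozen supervised fine-tuned model.
    # beta is an illustrative coefficient, not a value from the source.
    return reward - beta * (logprob_policy - logprob_reference)
```

When the trained model assigns the same probability to a response as the reference model does, the penalty vanishes and the reward-model score passes through unchanged.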

External Links