GPT-2

OpenAI · February 2019

active · Open Source · decoder only · text
Parameters: 1.5B
Context Window: 1K tokens
Variants: 117M, 345M, 762M, 1.5B

Why It Matters

One of the first AI models to spark widespread public debate about the risks of powerful language generation. It demonstrated that scaling up a model could unlock capabilities the model was never explicitly trained for.

Description

GPT-2 scaled the GPT architecture up to 1.5 billion parameters and was trained on WebText, a corpus of web pages linked from Reddit. It surprised researchers by performing tasks it was never explicitly trained for, a property called zero-shot learning. OpenAI initially withheld the full model over concerns that it could be used to generate convincing misinformation.

Notable Milestones

  • First major AI model withheld for safety concerns
  • Inspired the open-source AI movement after eventual full release
  • Widely used in text generation research and education

Key Innovations

Autoregressive
Generates text one token at a time, each prediction based on all previous tokens. The foundation of modern language models.
Zero-Shot
Performing tasks without any examples: the model generalizes from its training alone.
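
The autoregressive loop described above can be sketched in a few lines. This is a minimal illustration, not GPT-2's actual code: `VOCAB`, `toy_logits`, and `generate` are hypothetical names, and the hand-written scoring rule stands in for the Transformer decoder that a real model would run at each step.

```python
import math
import random
from typing import Callable, List, Optional

# Hypothetical five-word vocabulary; index 0 is the end-of-sequence marker.
VOCAB = ["<eos>", "the", "cat", "sat", "down"]


def toy_logits(tokens: List[int]) -> List[float]:
    """Stand-in for a language model: one score per vocabulary entry.

    Assumption for illustration only: the score depends on the last token
    and on the sequence length. A real model attends over every token.
    """
    logits = [0.0] * len(VOCAB)
    # Favor the word that follows the last token in a fixed cycle.
    logits[(tokens[-1] % (len(VOCAB) - 1)) + 1] = 3.0
    # Make <eos> more likely as the sequence grows.
    logits[0] = 1.0 * len(tokens) - 0.5
    return logits


def softmax(logits: List[float]) -> List[float]:
    """Convert scores to probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(
    prompt: List[int],
    next_logits: Callable[[List[int]], List[float]],
    max_new_tokens: int = 8,
    eos_id: int = 0,
    greedy: bool = True,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Append one token at a time, feeding the whole history back in."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(next_logits(tokens))
        if greedy:
            # Pick the single most probable next token.
            next_id = max(range(len(probs)), key=probs.__getitem__)
        else:
            # Sample the next token in proportion to its probability.
            next_id = (rng or random).choices(
                range(len(probs)), weights=probs, k=1
            )[0]
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


if __name__ == "__main__":
    ids = generate([1], toy_logits)  # prompt is the single token "the"
    print(" ".join(VOCAB[i] for i in ids))
```

In this framing, zero-shot use changes only the prompt passed to `generate`: the task is expressed as text to be continued, and no weights are updated.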

Family Tree

Built On

Lineage

GPT-1 → GPT-2

Successors (1)

Related Research (2)

Transformer
2017 · Google Brain

Introduced the Transformer architecture using self-attention mechanisms, replacing RNNs entirely. Enabled parallel training and superior long-range de…

GPT-2 · Transformer
2019 · OpenAI

Scaled GPT to 1.5B parameters on WebText. Demonstrated emergent unsupervised multitask learning without task-specific fine-tuning.
