GPT-2
OpenAI · February 2019
Why It Matters
First AI model to spark widespread public debate about the risks of powerful language generation. Demonstrated that simply scaling up a model could unlock entirely new capabilities.
Description
Scaled up to 1.5 billion parameters and trained on web pages linked from Reddit. Surprised researchers by performing tasks it was never explicitly trained for — a property called zero-shot learning. OpenAI initially withheld the full model due to concerns it could be used to generate convincing misinformation.
Notable Milestones
- ▸First major AI model withheld for safety concerns
- ▸Inspired the open-source AI movement after eventual full release
- ▸Widely used in text generation research and education
Key Innovations
Related Research (2)
Introduced the Transformer architecture using self-attention mechanisms, replacing RNNs entirely. Enabled parallel training and superior long-range de…
Scaled GPT to 1.5B parameters on WebText. Demonstrated emergent unsupervised multitask learning without task-specific fine-tuning.