GPT-2

OpenAI · February 2019

● activeOpen Sourcedecoder onlytext

Parameters1.5B

Context Window1K tokens

Variants117M, 345M, 762M, 1.5B

Why It Matters

First AI model to spark widespread public debate about the risks of powerful language generation. Demonstrated that simply scaling up a model could unlock entirely new capabilities.

Description

Scaled up to 1.5 billion parameters and trained on web pages linked from Reddit. Surprised researchers by performing tasks it was never explicitly trained for — a property called zero-shot learning. OpenAI initially withheld the full model due to concerns it could be used to generate convincing misinformation.

Notable Milestones

▸First major AI model withheld for safety concerns
▸Inspired the open-source AI movement after eventual full release
▸Widely used in text generation research and education

Key Innovations

Autoregressive

AutoregressiveGenerates text one token at a time, each prediction based on all previous tokens. The foundation of modern language models.

Zero-Shot

Zero-ShotPerforming tasks without any examples — the model generalizes from its training alone.

Family Tree

Built On

GPT-1

Lineage

GPT-1→GPT-2

Successors (1)

GPT-3

Related Research (2)

TransformerTransformer

2017 · Google Brain

Introduced the Transformer architecture using self-attention mechanisms, replacing RNNs entirely. Enabled parallel training and superior long-range de…

GPT-2Transformer

2019 · OpenAI

Scaled GPT to 1.5B parameters on WebText. Demonstrated emergent unsupervised multitask learning without task-specific fine-tuning.

External Links

Research Paper

More from OpenAI GPT

GPT-12018-06 · 117M

GPT-32020-06 · 175B

Codex2021-08 · 12B

InstructGPT / text-davinci-0022022-01 · 175B

GPT-3.5 / ChatGPT2022-11 · 175B

GPT-42023-03 · ~1.7T (est. MoE)

GPT-4 Turbo2023-11 · ~1.7T (est. MoE)

GPT-4o2024-05 · —

GPT-4o Mini2024-07 · —

GPT-4.12025-04 · —

PreviousGPT-1

NextGPT-3