GPT-1

OpenAI · June 2018

Active · Open Source · Decoder-only · Text

Parameters: 117M

Context Window: 512 tokens

Why It Matters

Proved that generative pre-training — learning language by predicting the next word — could produce a single model useful for many different tasks, laying the foundation for all GPT models that followed.

Description

The first model to show that a Transformer (a type of neural network architecture) could learn general language skills from unsupervised next-token prediction on a large text corpus and then be fine-tuned for specific tasks. It has 117 million parameters and was trained on BooksCorpus, a collection of roughly 7,000 unpublished books: modest by today's standards, but groundbreaking at the time.
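As a rough sanity check on the 117M figure, the parameter count can be reproduced with simple arithmetic. The sketch below assumes hyperparameters that are not stated on this page: 12 layers, a hidden size of 768, a feed-forward size of 3072, and a vocabulary of 40,478 tokens (the commonly cited GPT-1 configuration). It also assumes the output layer shares weights with the token embedding. Treat it as an estimate under those assumptions, not an official breakdown.

```python
# Rough parameter count for a GPT-1-sized decoder-only Transformer.
# Every default below except `context` is an assumption (commonly cited
# GPT-1 hyperparameters), not a value taken from this page.
def count_parameters(vocab_size=40478, context=512, d_model=768,
                     n_layers=12, d_ff=3072):
    token_embedding = vocab_size * d_model
    position_embedding = context * d_model  # learned position embeddings
    # Query, key, value and output projections: four square matrices plus biases
    attention = 4 * (d_model * d_model + d_model)
    # Two-layer feed-forward network, with biases
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer norms per block, each with a scale and a bias vector
    layer_norms = 2 * 2 * d_model
    per_block = attention + feed_forward + layer_norms
    # Assumes the output projection reuses the token embedding (weight tying)
    return token_embedding + position_embedding + n_layers * per_block


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
# prints: 116,534,784 parameters (~117M)
```

Under these assumptions, about 85M parameters sit in the 12 Transformer blocks and about 31M in the token embedding table.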

Key Innovations

Autoregressive

Generates text one token at a time, each prediction based on all previous tokens. The foundation of modern language models.
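A minimal sketch of the autoregressive loop, in Python. The `toy_model` function is a hypothetical stand-in for a trained network: a real model returns a probability distribution over the vocabulary and usually samples from it, whereas this toy picks the next token deterministically so the loop is easy to follow. The `context_window` default of 512 mirrors the limit listed above.

```python
from typing import Callable, List, Sequence


def generate(next_token: Callable[[Sequence[int]], int],
             prompt: Sequence[int],
             max_new_tokens: int,
             context_window: int = 512) -> List[int]:
    """Append tokens one at a time, feeding each prediction back as input."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # The model only sees the most recent `context_window` tokens.
        context = tokens[-context_window:]
        tokens.append(next_token(context))
    return tokens


def toy_model(context: Sequence[int]) -> int:
    """Hypothetical stand-in for a trained model: uses every token in the context."""
    return sum(context) % 10


print(generate(toy_model, prompt=[1, 2], max_new_tokens=4))
# prints: [1, 2, 3, 6, 2, 4]
```

Each new token depends on everything generated so far, which is why generation is sequential even though training can process a whole sequence at once.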

Transformer

Neural network architecture that uses self-attention to process entire sequences in parallel during training. Largely replaced RNNs in language modeling and enabled massive scaling.
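The self-attention step in a decoder-only model can be sketched in plain Python as follows. This is a deliberately simplified, single-head version: it assumes the queries, keys and values are the input vectors themselves, whereas a real Transformer applies learned projection matrices and uses multiple heads. The causal mask, which stops a position from attending to later positions, is what makes the model usable for next-token prediction.

```python
import math
from typing import List


def causal_self_attention(x: List[List[float]]) -> List[List[float]]:
    """Single-head scaled dot-product self-attention with a causal mask.

    Simplification: queries, keys and values are the inputs themselves
    (no learned projections).
    """
    d = len(x[0])
    out = []
    for i, query in enumerate(x):
        # Causal mask: position i attends only to positions 0..i.
        scores = [sum(a * b for a, b in zip(query, x[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        weights = [e / norm for e in exps]
        # Output is the attention-weighted average of the visible vectors.
        out.append([sum(w * x[j][k] for j, w in enumerate(weights))
                    for k in range(d)])
    return out


sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in causal_self_attention(sequence):
    print([round(v, 3) for v in row])
```

The outer loop is written sequentially for readability, but each position's output depends only on the inputs, not on other outputs, so all positions can be computed in parallel as matrix operations. An RNN cannot be parallelized this way because each of its steps depends on the previous step's hidden state.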

Family Tree

Successors (1)

GPT-2

Related Research (2)

Transformer

2017 · Google Brain

Introduced the Transformer architecture using self-attention mechanisms, replacing RNNs entirely. Enabled parallel training and superior long-range de…

GPT-1 · Transformer

2018 · OpenAI

First decoder-only Transformer pre-trained generatively on BooksCorpus. Demonstrated transfer learning via supervised fine-tuning on downstream tasks.