GPT-1

OpenAI · June 2018

active · Open Source · decoder only · text
Parameters: 117M
Context Window: 512 tokens

Why It Matters

Proved that generative pre-training — learning language by predicting the next word — could produce a single model useful for many different tasks, laying the foundation for all GPT models that followed.
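The pre-training objective described above can be sketched as an average negative log-likelihood over next-token predictions. This is a minimal illustration, not the original training code: `next_token_loss` and `uniform_model` are hypothetical names, and the "model" here is a stand-in that guesses uniformly over a four-token vocabulary.

```python
import math

def next_token_loss(token_ids, predict_probs):
    """Average negative log-likelihood of each token given the tokens before it."""
    total = 0.0
    for i in range(1, len(token_ids)):
        # predict_probs returns a probability for every token in the vocabulary
        probs = predict_probs(token_ids[:i])
        total += -math.log(probs[token_ids[i]])
    return total / (len(token_ids) - 1)

# Hypothetical stand-in for a trained model: a uniform guess over 4 tokens.
def uniform_model(prefix, vocab_size=4):
    return [1.0 / vocab_size] * vocab_size

loss = next_token_loss([0, 2, 1, 3], uniform_model)
print(loss)
```

A model that has learned nothing scores log(4) per token on this toy vocabulary; pre-training pushes that number down by assigning more probability to the token that actually comes next.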

Description

The first model to show that a Transformer (a type of neural network architecture) could learn general language skills by simply reading vast amounts of text, then be fine-tuned for specific tasks. Trained on BooksCorpus with 117 million parameters: modest by today's standards, but groundbreaking at the time.
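As a rough sanity check on the 117 million figure, the parameter count of a decoder-only Transformer can be estimated from its shape. The hyperparameters below (12 layers, 768-dimensional states, a 3072-dimensional feed-forward layer, a vocabulary of 40,478 entries) are not stated on this page; they are assumptions taken from commonly cited descriptions of GPT-1. `estimate_params` is a hypothetical helper, and only the 512-token context window comes from the card above.

```python
def estimate_params(n_layers=12, d_model=768, d_ff=3072,
                    vocab_size=40478, context=512):
    """Back-of-envelope parameter count for a decoder-only Transformer.

    All defaults except `context` are assumed values, not taken from this page.
    """
    token_emb = vocab_size * d_model          # token embedding table
    pos_emb = context * d_model               # learned position embeddings
    # Query, key, value, and output projections, each with a bias vector
    attn = 4 * (d_model * d_model + d_model)
    # Two-layer position-wise feed-forward network with biases
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer norms per block, each with a scale and a shift vector
    layer_norms = 2 * 2 * d_model
    per_layer = attn + ffn + layer_norms
    return token_emb + pos_emb + n_layers * per_layer

total = estimate_params()
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate lands within about one percent of 117M, with roughly a quarter of the parameters sitting in the token embedding table.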

Key Innovations

Autoregressive
Generates text one token at a time, each prediction based on all previous tokens. The foundation of modern language models.
Transformer
Neural network architecture using self-attention to process entire sequences in parallel. Replaced RNNs and enabled massive scaling.
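The two ideas above can be sketched together in plain Python. This is an illustrative toy, not GPT-1's implementation: it uses a single attention head with no learned projections, and `causal_self_attention`, `generate`, and `toy_scores` are hypothetical names. The causal mask is what lets a decoder-only model process a whole sequence at once while still predicting each token only from earlier ones.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Position t attends only to positions 0..t, never to later ones.
    """
    d = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        # Scores are computed only for positions up to and including t
        scores = [sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        out = [sum(w * values[s][j] for s, w in enumerate(weights))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

def generate(prompt, next_token_scores, max_new_tokens, context_window=512):
    """Greedy autoregressive decoding: append the top-scoring token each step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        context = tokens[-context_window:]   # keep only the most recent tokens
        scores = next_token_scores(context)
        tokens.append(max(range(len(scores)), key=scores.__getitem__))
    return tokens

# Hypothetical toy scorer standing in for a model: prefers (last token + 1) mod 5.
def toy_scores(context, vocab_size=5):
    target = (context[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = causal_self_attention(x, x, x)   # inputs reused as Q, K, V for simplicity
sequence = generate([0], toy_scores, max_new_tokens=6)
```

In a real model the scorer would be the Transformer itself, and sampling strategies other than greedy selection are common; the loop structure stays the same.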

Family Tree

Successors (1)

Related Research (2)

Transformer
2017 · Google Brain

Introduced the Transformer architecture using self-attention mechanisms, replacing RNNs entirely. Enabled parallel training and superior long-range de…

GPT-1 · Transformer
2018 · OpenAI

First decoder-only Transformer pretrained generatively on BooksCorpus. Demonstrated transfer learning via supervised fine-tuning on downstream tasks.

Enabled By

Tesla V100 · NVIDIA · May 2017
125 TFLOPS FP16 Tensor
