GPT-1
OpenAI · June 2018
Why It Matters
Showed that generative pre-training (learning language by predicting the next token in a text) could produce a single model that adapts to many different tasks with minimal task-specific changes, laying the foundation for every GPT model that followed.
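To make the next-token objective concrete, here is a minimal Python sketch. It is not GPT-1's method: a hypothetical count-based bigram model stands in for the Transformer, and the toy corpus is invented for illustration. The quantity it computes, the average negative log-probability of each token given what came before, is the same kind of language-modeling loss that generative pre-training minimizes. GPT-1 conditions on a long window of previous tokens rather than just one.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus for illustration only.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()


def train_bigram(tokens):
    """Count-based stand-in for a language model: how often each token follows another."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_prob(counts, prev, nxt, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(nxt | prev), so unseen pairs get nonzero probability."""
    row = counts[prev]
    return (row[nxt] + alpha) / (sum(row.values()) + alpha * vocab_size)


def lm_loss(counts, tokens, vocab_size):
    """Average negative log-likelihood of each token given its predecessor."""
    nll = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        nll -= math.log(next_token_prob(counts, prev, nxt, vocab_size))
    return nll / (len(tokens) - 1)


counts = train_bigram(corpus)
vocab_size = len(set(corpus))
fluent = "the cat sat on the rug .".split()
shuffled = "rug the on sat cat the .".split()
print(lm_loss(counts, fluent, vocab_size))
print(lm_loss(counts, shuffled, vocab_size))
```

The fluent sentence gets a lower loss than the shuffled one, because the model has learned which tokens tend to follow which. Pre-training pushes a far larger model toward the same behavior across a whole corpus.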
Description
The first model to show that a Transformer (a type of neural network architecture) could learn general language skills by pre-training on large amounts of unlabeled text and then be fine-tuned for specific tasks. It has 117 million parameters and was trained on BookCorpus, a collection of over 7,000 unpublished books. That is modest by today's standards, but it was groundbreaking at the time.
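As a back-of-the-envelope check on the 117 million figure, the sketch below counts the weights in a decoder-only Transformer with GPT-1's published shape (12 layers, 768-dimensional states, 3,072-dimensional feed-forward layers, a 512-token context). The vocabulary size of 40,478 is an assumption taken from common re-implementations, and the layout (fused Q/K/V projection, two LayerNorms per block, output layer tied to the input embeddings) is a simplified reconstruction rather than OpenAI's code.

```python
def gpt1_param_estimate(n_layer=12, d_model=768, d_ff=3072, n_ctx=512, vocab=40_478):
    """Rough parameter count for a GPT-1-sized decoder-only Transformer.

    vocab=40_478 is an assumed vocabulary size, not a figure from the source.
    """
    token_emb = vocab * d_model  # token embeddings, assumed reused as the output layer (counted once)
    pos_emb = n_ctx * d_model    # learned position embeddings

    attn = d_model * 3 * d_model + 3 * d_model  # fused Q, K, V projection + bias
    attn += d_model * d_model + d_model         # attention output projection + bias
    ffn = d_model * d_ff + d_ff + d_ff * d_model + d_model  # two-layer MLP + biases
    norms = 2 * 2 * d_model                     # two LayerNorms per block (gain + bias each)

    per_block = attn + ffn + norms
    return token_emb + pos_emb + n_layer * per_block


print(f"{gpt1_param_estimate() / 1e6:.1f}M")  # → 116.5M
```

The estimate lands at about 116.5 million, consistent with the rounded 117 million usually quoted. Roughly a quarter of it sits in the token embedding table.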
Family Tree
Related Research
Introduced the Transformer architecture using self-attention mechanisms, replacing RNNs entirely. Enabled parallel training and superior long-range de…
First decoder-only Transformer pre-trained generatively on BookCorpus. Demonstrated transfer learning via supervised fine-tuning, with early signs of zero-shot task behavior.
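"Decoder-only" means every self-attention layer is masked so that each position can attend only to itself and earlier positions, which is what lets the model be trained to predict the next token. The sketch below shows single-head scaled dot-product attention with that causal mask in plain Python. As a simplifying assumption, the inputs are used directly as queries, keys, and values; a real model such as GPT-1 applies learned projections and runs several heads in parallel (12 in GPT-1).

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of equal-length vectors, one per position.
    Position i attends only to positions 0..i, so its output never
    depends on tokens that come after it.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Score only positions j <= i; skipping the future is equivalent
        # to setting those scores to -inf before the softmax.
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) for j in range(i + 1)]
        weights = softmax(scores)
        # Each output is a weighted average of the visible value vectors.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(x, x, x)
print(out[0])  # → [1.0, 0.0]  (position 0 can only see itself)
```

Because of the mask, replacing the last vector in `x` changes only the last output; the outputs for earlier positions stay exactly the same.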