Chinchilla

Google DeepMind · March 2022

Active · Closed · Decoder-only · Text
Parameters: 70B
Context Window: 2K tokens

Why It Matters

Changed how large language models are sized and trained. Showed that a 70B model trained on more data outperforms a 280B model trained on less at a similar compute budget; the resulting 'Chinchilla scaling laws' shifted the field toward smaller models trained on more tokens.

Description

A 70B-parameter model from Google DeepMind that reset expectations for how a training budget should be split between model size and training data. Its key insight: earlier large models were undertrained, meaning too large for the amount of data they were trained on. By training a model four times smaller on significantly more data, Chinchilla outperformed the 280B Gopher model, establishing the 'Chinchilla scaling laws' that changed how the industry sizes models and datasets.
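
The size-versus-data trade-off can be checked with a back-of-the-envelope calculation. The sketch below assumes the common approximation that training compute is about 6 × parameters × tokens, and uses the commonly reported training set sizes (roughly 300B tokens for Gopher, 1.4T for Chinchilla). Neither the approximation nor the token counts appear on this page, and the function names are illustrative.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute using the C ~ 6 * N * D rule of thumb."""
    return 6.0 * params * tokens


# Assumed figures (not from this page): reported parameter and token counts.
MODELS = {
    "Gopher": {"params": 280e9, "tokens": 300e9},
    "Chinchilla": {"params": 70e9, "tokens": 1.4e12},
}


def summarize(models: dict) -> dict:
    """Return approximate FLOPs and tokens-per-parameter for each model."""
    rows = {}
    for name, m in models.items():
        rows[name] = {
            "flops": training_flops(m["params"], m["tokens"]),
            "tokens_per_param": m["tokens"] / m["params"],
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize(MODELS).items():
        print(f"{name}: {row['flops']:.2e} FLOPs, "
              f"{row['tokens_per_param']:.1f} tokens per parameter")
```

Under these assumptions the two budgets land within the same order of magnitude, while Chinchilla sees about 20 tokens per parameter against roughly 1 for Gopher.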

Key Innovations

Scaling Laws
Mathematical relationships showing how model performance improves predictably with more data, compute, and parameters.
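
One concrete form such a relationship can take is a loss that falls as a power law in both parameters and tokens. The sketch below uses the parametric form L(N, D) = E + A / N^alpha + B / D^beta with the constants reported for the Chinchilla paper's parametric fit. The form and constants are not given on this page, so treat them as illustrative assumptions rather than a reference implementation; the function name is hypothetical.

```python
# Assumed constants: the parametric fit reported in the Chinchilla paper.
E = 1.69       # irreducible loss term
A = 406.4      # coefficient on the parameter-count term
B = 410.7      # coefficient on the token-count term
ALPHA = 0.34   # exponent for parameters
BETA = 0.28    # exponent for training tokens


def predicted_loss(params: float, tokens: float) -> float:
    """Predicted loss L(N, D) = E + A / N**ALPHA + B / D**BETA."""
    return E + A / params**ALPHA + B / tokens**BETA


if __name__ == "__main__":
    # Assumed token counts: about 300B for Gopher, 1.4T for Chinchilla.
    print("Gopher-like    :", round(predicted_loss(280e9, 300e9), 3))
    print("Chinchilla-like:", round(predicted_loss(70e9, 1.4e12), 3))
```

With these constants the smaller, longer-trained configuration gets the lower predicted loss, and adding either parameters or tokens always reduces the prediction, with diminishing returns.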

Related Research (1)

Chinchilla · Scaling
2022 · DeepMind

Challenged the scaling laws of Kaplan et al. by showing that, for compute-optimal training, the amount of training data should scale in equal proportion with parameter count. The 70B Chinchilla outperformed the 280B Gopher.
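
Equal scaling means that when the compute budget grows, parameters and tokens each grow with its square root. The sketch below is a minimal illustration under two assumptions that are not stated on this page: compute is approximated as 6 × parameters × tokens, and the compute-optimal ratio is taken to be about 20 tokens per parameter. The function name and the default ratio are hypothetical choices for illustration.

```python
import math

TOKENS_PER_PARAM = 20.0  # assumed rule-of-thumb ratio, not an exact constant


def compute_optimal(flops: float, tokens_per_param: float = TOKENS_PER_PARAM):
    """Split a FLOP budget into (params, tokens) assuming C ~ 6 * N * D.

    With D = r * N, the budget is C = 6 * r * N**2, so N = sqrt(C / (6 * r)).
    """
    params = math.sqrt(flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


if __name__ == "__main__":
    for budget in (1e21, 4e21, 5.88e23):
        n, d = compute_optimal(budget)
        print(f"C={budget:.2e}: params={n:.2e}, tokens={d:.2e}")
```

Quadrupling the budget doubles both the parameter count and the token count, which is the "scale equally" behavior described above. A budget of about 5.88e23 FLOPs gives roughly 70B parameters and 1.4T tokens under these assumptions.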
