LLaMA
Meta · February 2023
Why It Matters
Broadened access to LLM research by releasing capable pretrained model weights to the research community. Spawned an entire ecosystem of open-source AI development, including Alpaca, Vicuna, and hundreds of community fine-tunes.
Description
Meta's openly released family of foundation language models, available in sizes from 7 billion to 65 billion parameters. Despite being smaller than many competitors, it matched or outperformed much larger models like GPT-3 on many benchmarks by training on far more tokens of publicly available data, showing that how a model is trained can matter as much as its sheer size.
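The size-versus-data point above can be made concrete with rough arithmetic. The sketch below compares tokens seen per parameter and approximate training compute. The parameter and token counts are the figures reported in the GPT-3 and LLaMA papers (they are not stated in the text above), the `6 * N * D` FLOPs estimate is a common rule of thumb rather than an exact cost, and the function names are illustrative.

```python
def tokens_per_parameter(n_params: float, n_tokens: float) -> float:
    """How many training tokens the model saw per parameter."""
    return n_tokens / n_params


def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb estimate: about 6 FLOPs per parameter per training token."""
    return 6.0 * n_params * n_tokens


# (parameters, training tokens) as reported in the respective papers.
models = {
    "GPT-3 175B": (175e9, 300e9),
    "LLaMA 13B": (13e9, 1.0e12),
    "LLaMA 65B": (65e9, 1.4e12),
}

for name, (n_params, n_tokens) in models.items():
    ratio = tokens_per_parameter(n_params, n_tokens)
    flops = approx_training_flops(n_params, n_tokens)
    print(f"{name}: {ratio:.1f} tokens/param, ~{flops:.2e} training FLOPs")
```

Under these assumptions, the smaller LLaMA models see many times more tokens per parameter than GPT-3 did, which is the trade-off the description refers to.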
Notable Milestones
- Sparked the open-source LLM movement
- Basis for Stanford Alpaca and UC Berkeley Vicuna
- Showed that smaller, well-trained models can beat larger ones
Key Innovations
Related Research (6)
Introduced the Transformer architecture using self-attention mechanisms, replacing RNNs entirely. Enabled parallel training and superior long-range de…
Found that model performance follows power laws in compute, parameters, and data. Provided the mathematical framework for scaling decisions.
Challenged Kaplan's scaling laws by showing that training data should scale in equal proportion with parameters. The 70B Chinchilla outperformed the 280B Gopher.
Showed that smaller models trained on significantly more data (following Chinchilla scaling laws) could match or exceed the performance of much larger…
Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…
Showed that SwiGLU activation (Swish + Gated Linear Unit) significantly improves Transformer FFN quality with minimal compute overhead.
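As a rough illustration of the SwiGLU feed-forward block mentioned in the last entry, here is a minimal pure-Python sketch for a single input vector. It assumes the bias-free form `(Swish(x W_gate) * (x W_up)) W_down` with Swish at beta = 1. The names `swiglu_ffn`, `w_gate`, `w_up`, and `w_down` are illustrative and not taken from any particular library, and a real implementation would use batched tensor operations.

```python
import math


def swish(x: float) -> float:
    """Swish with beta = 1 (also known as SiLU): x * sigmoid(x), computed stably."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def matvec(vec, mat):
    """Row vector (length n) times matrix (n rows, m columns) -> list of length m."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Gated feed-forward block: the Swish-activated gate scales the 'up' projection."""
    gate = [swish(g) for g in matvec(x, w_gate)]   # gating branch
    up = matvec(x, w_up)                           # linear branch
    hidden = [g * u for g, u in zip(gate, up)]     # element-wise product
    return matvec(hidden, w_down)                  # project back to model width


# Tiny example: model width 2, hidden width 3 (weights chosen arbitrarily).
x = [1.0, -2.0]
w_gate = [[0.5, -1.0, 0.0], [1.0, 0.5, -0.5]]
w_up = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
w_down = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(swiglu_ffn(x, w_gate, w_up, w_down))
```

Compared with a plain two-matrix FFN, this form uses three weight matrices, so implementations typically shrink the hidden width to keep the parameter count comparable.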