LLaMA 3

Meta · April 2024

Active · Open Weight · Decoder only · Text · API Available
Parameters: 8B / 70B
Context Window: 8K tokens
Variants: 8B, 70B

Why It Matters

Narrowed the quality gap between open and closed models, showing that openly available models could rival the best proprietary systems on many benchmarks.

Description

A major leap in open-model quality, available in 8B and 70B sizes. It was trained on 15 trillion tokens of text data, roughly 7 times more than LLaMA 2, which substantially improved its ability to reason, write code, and follow instructions. It approached GPT-4-level performance on many tasks.

Notable Milestones

  • Approached GPT-4-class performance as an open model
  • Trained on 15T tokens, roughly 7x more data than LLaMA 2
  • Widely deployed via Hugging Face and cloud providers

Key Innovations

Open Weight
Model weights are publicly released, but training data and code may not be. This enables fine-tuning but not full reproduction.

Family Tree

Built On

Lineage

LLaMA → LLaMA 2 → LLaMA 3

Related Research (4)

LLaMA · Scaling
2023 · Meta AI

Showed that smaller models trained on significantly more data (following Chinchilla scaling laws) could match or exceed the performance of much larger…

RoPE · Architecture
2021 · Zhuiyi Technology

Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…
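To make the rotation idea concrete, here is a minimal sketch of rotary position embeddings in plain Python. It assumes the adjacent-pair convention and a frequency base of 10000; the names `rope` and `dot` are illustrative, and some implementations pair dimensions differently or use vectorized tensor code.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each adjacent pair of `vec` by an angle that grows with `position`.

    Assumption: pair k (elements 2k and 2k+1) uses frequency base**(-2k/dim).
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # i equals 2k, so base ** (-i / dim) is base ** (-2k / dim).
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2-D rotation of the pair (x, y) by theta.
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))
```

Because each pair is rotated by an angle proportional to position, the dot product of a rotated query and key depends only on their relative offset: `dot(rope(q, 5), rope(k, 3))` should match `dot(rope(q, 12), rope(k, 10))` up to floating-point error.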

GQA · Architecture
2023 · Google Research

Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…
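The KV cache saving from grouped-query attention can be estimated with simple arithmetic. The sketch below is a rough sizing helper, not a measurement; the function name `kv_cache_bytes` and the configuration values (32 layers, head dimension 128, 16-bit values, an 8K-token sequence) are illustrative assumptions and are not taken from this page.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Approximate KV cache size: one key and one value vector
    per layer, per KV head, per token."""
    keys_and_values = 2
    return (keys_and_values * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


# Hypothetical configuration, chosen only for illustration.
LAYERS, HEAD_DIM, SEQ_LEN = 32, 128, 8192

GIB = 2 ** 30
multi_head = kv_cache_bytes(LAYERS, 32, HEAD_DIM, SEQ_LEN) / GIB    # one KV head per query head
grouped_query = kv_cache_bytes(LAYERS, 8, HEAD_DIM, SEQ_LEN) / GIB  # 4 query heads share a KV head
multi_query = kv_cache_bytes(LAYERS, 1, HEAD_DIM, SEQ_LEN) / GIB    # all query heads share one KV head
```

Under these assumed settings the cache is 4 GiB with 32 KV heads, 1 GiB with 8 KV heads, and 0.125 GiB with a single KV head, which is the middle-ground trade-off the entry describes.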

SwiGLU · Architecture
2020 · Google

Showed that SwiGLU activation (Swish + Gated Linear Unit) significantly improves Transformer FFN quality with minimal compute overhead.
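As a rough illustration of the gated feed-forward layer described above, the following plain-Python sketch computes a bias-free SwiGLU block on a single vector. The names `swiglu_ffn`, `w_gate`, `w_up`, and `w_down` are hypothetical, and real implementations use batched tensor operations rather than lists.

```python
import math


def silu(z):
    """Swish with beta = 1 (also called SiLU): z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Bias-free SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]   # Swish-activated gate branch
    up = matvec(w_up, x)                          # linear branch
    hidden = [g * u for g, u in zip(gate, up)]    # elementwise gating
    return matvec(w_down, hidden)                 # project back to model width
```

Compared with a plain two-matrix FFN, the gate adds a third weight matrix, so the hidden width is often reduced to keep the parameter count comparable.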
