Gemma 2

Google DeepMind · June 2024

Active · Open Source · Decoder-only · Text
Parameters: 2B / 9B / 27B
Context Window: 8K tokens
Variants: 2B, 9B, 27B

Why It Matters

Set a new standard for small-model efficiency. The 27B variant competed with models 2-3x its size, and the distilled 2B and 9B variants showed that learning from a larger teacher model was a viable strategy for building capable small models.

Description

Second-generation open models with significantly improved performance per parameter. The 2B and 9B models were trained with knowledge distillation, a technique where a smaller model learns to mimic the behavior of a much larger one, to punch above their weight class. The 27B model outperformed many larger competitors on the LMSYS Chatbot Arena leaderboard. Available in 2B, 9B, and 27B sizes.

Notable Milestones

  • 27B model ranked highly on LMSYS Chatbot Arena despite its small size
  • Included ShieldGemma safety classifiers and Gemma Scope interpretability tools

Benchmark Scores

MMLU (Massive Multitask Language Understanding, 57 subjects): 75.2%
HumanEval (code generation pass@1, Python problems): 51.8%
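
For readers unfamiliar with the pass@1 metric, the sketch below shows the unbiased pass@k estimator commonly used with HumanEval-style benchmarks: generate n samples per problem, count the c samples that pass the unit tests, and estimate the chance that at least one of k drawn samples passes. The function name and the example numbers are illustrative assumptions, not figures from the Gemma 2 evaluation.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated (hypothetical value chosen by the evaluator)
    c: number of those samples that passed all unit tests
    k: number of samples the metric is allowed to draw
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that all k drawn samples come from the (n - c) failures.
    # math.comb returns 0 when n - c < k, so the estimate becomes 1.0.
    all_fail = math.comb(n - c, k) / math.comb(n, k)
    return 1.0 - all_fail

# Illustrative numbers only: 3 of 10 samples pass.
single_draw = pass_at_k(n=10, c=3, k=1)  # reduces to c / n when k == 1
five_draws = pass_at_k(n=10, c=3, k=5)
```

A benchmark score is then the mean of this value over all problems. With k = 1 the estimator is just the fraction of passing samples.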

Key Innovations

Open Weight: Model weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.
Distillation: Training a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.
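
To make the student/teacher idea concrete, here is a minimal sketch of a per-token distillation objective: the KL divergence from the teacher's next-token distribution to the student's. This is a generic formulation under stated assumptions, not the Gemma 2 training code. The function names, the temperature parameter, and the toy logits are all hypothetical.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def distillation_kl(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) for one token position.

    Minimizing this pushes the student's distribution toward the
    teacher's full distribution instead of a single one-hot target.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    return sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )

# Toy vocabulary of three tokens (illustrative values only).
teacher = [1.0, 2.0, 3.0]
matched = distillation_kl([1.0, 2.0, 3.0], teacher)     # student agrees
mismatched = distillation_kl([3.0, 2.0, 1.0], teacher)  # student disagrees
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. In practice it would be averaged over every token position in a batch.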

Family Tree

Lineage

Gemma → Gemma 2

Related Research (1)

2023 · Google Research

Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…
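
The KV cache saving from grouped-query attention can be sketched with simple arithmetic: the cache stores one key and one value vector per KV head, per layer, per token, so sharing each KV head across a group of query heads shrinks it proportionally. The configuration below is a hypothetical example, not the published Gemma 2 architecture, and the function names are assumptions made for illustration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor of 2 covers the separate key and value tensors.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

def kv_head_for_query_head(query_head, n_query_heads, n_kv_heads):
    """Map a query head to the KV head its group shares."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("query heads must divide evenly into KV groups")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size

# Hypothetical model: 32 layers, 16 query heads, head_dim 128, 8K context.
shape = dict(n_layers=32, head_dim=128, seq_len=8192)
multi_head = kv_cache_bytes(n_kv_heads=16, **shape)    # one KV head per query head
grouped_query = kv_cache_bytes(n_kv_heads=8, **shape)  # two query heads per KV head
multi_query = kv_cache_bytes(n_kv_heads=1, **shape)    # all query heads share one

# Which KV head each of the 16 query heads reads from under grouping.
mapping = [kv_head_for_query_head(q, 16, 8) for q in range(16)]
```

Multi-head and multi-query attention are the two extremes of the same formula, with as many KV heads as query heads or exactly one. Grouped-query attention picks a value in between.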