Gemma 2
Google DeepMind · June 2024
● activeOpen Sourcedecoder onlytext
Parameters2B / 9B / 27B
Context Window8K tokens
Variants2B, 9B, 27B
Why It Matters
Set a new standard for small model efficiency. The 27B variant competed with models 2-3x its size, proving that distillation from larger Gemini models was a viable strategy for building capable small models.
Description
Second-generation open models with significantly improved performance per parameter. Used knowledge distillation — a technique where a smaller model learns to mimic the behavior of a much larger one — to punch above its weight class. The 27B model outperformed many larger competitors on the LMSYS Chatbot Arena leaderboard. Available in 2B, 9B, and 27B sizes.
Notable Milestones
- ▸27B model ranked highly on LMSYS Chatbot Arena despite its small size
- ▸Included ShieldGemma safety classifiers and Gemma Scope interpretability tools
Benchmark Scores
MMLUMassive Multitask Language Understanding — 57 subjects
75.2%HumanEvalCode generation pass@1 — Python problems
51.8%Key Innovations
Open Weight
Open WeightModel weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.
Distillation
DistillationTraining a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.
Related Research (1)
Grouped-Query AttentionArchitecture
2023 · Google Research
Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…