Gemma 2

Google DeepMind · June 2024

● Active · Open Source · Decoder-only · Text

Parameters: 2B / 9B / 27B

Context Window: 8K tokens

Variants: 2B, 9B, 27B

Why It Matters

Set a new standard for small-model efficiency. The 27B variant competed with models 2-3x its size, and the 2B and 9B variants, trained with knowledge distillation from a larger teacher, showed that distillation was a viable strategy for building capable small models.

Description

Second-generation open models with significantly improved performance per parameter. The 2B and 9B models were trained with knowledge distillation, a technique where a smaller model learns to mimic the behavior of a much larger one, to punch above their weight class. The 27B model outperformed many larger competitors on the LMSYS Chatbot Arena leaderboard. Available in 2B, 9B, and 27B sizes.

Notable Milestones

▸ 27B model ranked highly on LMSYS Chatbot Arena despite its small size
▸ Included ShieldGemma safety classifiers and Gemma Scope interpretability tools

Benchmark Scores

MMLU (Massive Multitask Language Understanding, 57 subjects): 75.2%

HumanEval (code generation pass@1, Python problems): 51.8%
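For readers unfamiliar with the pass@1 metric, the sketch below shows the commonly used unbiased pass@k estimator: given n sampled solutions per problem, of which c pass the unit tests, it estimates the chance that at least one of k samples is correct. The function names and the sample counts are illustrative assumptions; the source does not say how many samples were drawn to obtain the score above.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k for one problem.

    n: total samples generated, c: samples that passed the tests,
    k: number of samples the metric allows.
    """
    if n - c < k:
        # Fewer than k failing samples: any k-subset contains a pass.
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem results: (samples drawn, samples passing).
example_results = [(10, 3), (10, 0), (10, 10)]
score = benchmark_pass_at_k(example_results, k=1)
```

With k=1 the estimator reduces to the fraction of passing samples per problem, averaged over problems.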

Key Innovations

Open Weight

Model weights are publicly released, but training data and code may not be. This enables fine-tuning but not full reproduction.

Distillation

Training a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.
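As a rough illustration of what "mimic" means in practice, the sketch below computes a soft-target distillation loss for a single token position: the student is penalized by the KL divergence between the teacher's next-token distribution and its own. This is one common textbook formulation, not necessarily the exact objective used for Gemma 2; the function names, the temperature parameter, and the toy logits are all assumptions made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, with optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) over the vocabulary at one token position."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )


# Toy four-token vocabulary (hypothetical logits).
teacher = [4.0, 1.0, 0.5, -2.0]
close_student = [3.8, 1.1, 0.4, -1.9]
far_student = [-2.0, 0.5, 1.0, 4.0]

close_loss = distillation_loss(close_student, teacher)
far_loss = distillation_loss(far_student, teacher)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, so minimizing it over many positions pulls the student toward the teacher's behavior.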

Family Tree

Built On

Gemma

Lineage

Gemma → Gemma 2

Successors (3)

Gemma 3 · Gemma 2 Abliterated · ShieldGemma

Related Research (1)

Grouped-Query Attention · Architecture

2023 · Google Research

Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…
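The KV cache saving follows from simple arithmetic: the cache holds one key vector and one value vector per layer, per key/value head, per token, so sharing each key/value head across a group of query heads shrinks it proportionally. The sketch below works through that calculation for a single sequence. Every configuration value in it is hypothetical and chosen for round numbers; none is taken from Gemma 2's actual architecture.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate KV cache size in bytes for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes 16-bit storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model configuration (not Gemma 2's real values).
LAYERS, QUERY_HEADS, HEAD_DIM, SEQ_LEN = 32, 32, 128, 8192

# Multi-head attention: one KV head per query head.
mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# Grouped-query attention: 8 KV heads, each shared by 4 query heads.
gqa = kv_cache_bytes(LAYERS, 8, HEAD_DIM, SEQ_LEN)
# Multi-query attention: a single KV head shared by all query heads.
mqa = kv_cache_bytes(LAYERS, 1, HEAD_DIM, SEQ_LEN)

for name, size in [("MHA", mha), ("GQA", gqa), ("MQA", mqa)]:
    print(f"{name}: {size / 2**20:.0f} MiB")
```

Under these assumed values the cache shrinks from 4096 MiB with multi-head attention to 1024 MiB with eight key/value heads, and to 128 MiB with a single shared head.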

External Links

Research Paper · Announcement

More from Google Gemma

Gemma · 2024-02 · 2B / 7B

Gemma 3 · 2025-03 · 1B / 4B / 12B / 27B

Gemma 4 · 2026-04 · 31B (A4B / E4B / E2B)

CodeGemma · 2024-04 · 2B / 7B
