Gemma
Google DeepMind · February 2024
● activeOpen Sourcedecoder onlytext
Parameters2B / 7B
Context Window8K tokens
Variants2B, 7B
Why It Matters
Google's first serious open-weight competitor to Meta's Llama. Demonstrated that a major lab could release small, efficient models that outperformed many larger open alternatives.
Description
Google's first open-weight model family, built using the same research and technology behind the proprietary Gemini models but released for anyone to download, modify, and deploy. Available in 2B and 7B parameter sizes — small enough to run on a laptop or single GPU. Designed to make frontier AI research accessible to the broader developer community.
Notable Milestones
- ▸Widely adopted on Hugging Face for fine-tuning and research
- ▸Ran efficiently on consumer hardware including laptops
Key Innovations
Open Weight
Open WeightModel weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.
Distillation
DistillationTraining a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.
Related Research (1)
Grouped-Query AttentionArchitecture
2023 · Google Research
Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…