Gemini 1.5 Pro
Google DeepMind · February 2024
Active · Closed · Mixture of experts · Multimodal · API available
Context Window: 2M tokens
Variants: Pro, Flash
Why It Matters
Raised the context window to 1M tokens (later 2M), making it practical to analyze a full codebase or an hour-long video in a single prompt. Showed that a mixture-of-experts model could handle ultra-long contexts at practical cost.
Description
Extended the context window to 1 million tokens, enough to process entire codebases, hour-long videos, or several novels in a single prompt. Uses a mixture-of-experts architecture, in which only a fraction of the model's parameters is active for each token, to keep this much input affordable to process. The window was later expanded to 2 million tokens.
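As a rough illustration of what "fits in a single prompt" means, the sketch below estimates whether a set of files stays within a 1M-token window. It assumes about 4 characters per token, a common rule of thumb for English text and code rather than the behavior of Gemini's tokenizer; the function names are hypothetical and exact counts would have to come from the model's own token-counting tools.

```python
# Rough heuristic only: real token counts depend on the model's tokenizer.
CHARS_PER_TOKEN = 4  # assumed average, not a Gemini-specific figure


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` from its character length."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(texts: list[str], context_window: int = 1_000_000) -> bool:
    """Return True if the combined estimate stays within the context window."""
    return sum(estimate_tokens(t) for t in texts) <= context_window
```

Under this assumption, a 1M-token window corresponds to roughly 4 MB of plain text, and the 2M-token window to roughly 8 MB.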
Notable Milestones
- First model to process 1 million tokens of context
- Can analyze hour-long videos and full codebases in one prompt
- Flash variant became one of the most cost-effective frontier models
Benchmark Scores
MMLU (Massive Multitask Language Understanding, 57 subjects): 85.9%
MATH (competition-level problems): 86.5%
GPQA (graduate-level science QA): 59.1%
Key Innovations
Long Context
Ability to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.
Multimodal
Processing multiple types of input (text, images, audio, video) in a single model.
MoE
Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
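To make the "only a fraction of the parameters are active" idea concrete, here is a minimal sketch of generic top-k mixture-of-experts routing. It is not Gemini 1.5 Pro's routing, whose details are not public; the router weights, the `make_scaling_expert` helper, and the choice of `top_k=2` are all illustrative assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_scaling_expert(factor):
    """Hypothetical toy expert: multiplies every input value by `factor`."""
    return lambda x: [factor * v for v in x]


def moe_forward(x, router_weights, experts, top_k=2):
    """Run only the top_k highest-scoring experts and mix their outputs.

    x: input vector; router_weights: one weight vector per expert;
    experts: callables mapping a vector to a vector of the same length.
    """
    # The router scores every expert with a dot product, then normalizes.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    gates = softmax(logits)
    # Keep the top_k experts; the rest are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)
        weight = gates[i] / norm  # renormalize over the selected experts
        out = [o + weight * y_j for o, y_j in zip(out, y)]
    return out, chosen


experts = [make_scaling_expert(f) for f in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[3.0, 0.0], [2.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
output, active = moe_forward([1.0, 0.0], router_weights, experts)
```

In this toy setup, four experts exist but only two run for the given input, which is the source of the compute saving the definition describes.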
Related Research (1)
Gemini · Scaling
2023 · Google DeepMind
Introduced the Gemini family, trained to be natively multimodal from the ground up, and reported state-of-the-art results on 30 of 32 benchmarks.
Enabled By
TPU v5e / v5p
Google · August 2023
v5e: 197 TFLOPS bfloat16 / v5p: 459 TFLOPS bfloat16 (peak, per chip)