LLM Tree of Life

Gemini 2.0 Flash

Google · December 2024

Active · Closed · Decoder-only · Multimodal · API available

Why It Matters

First production model to combine real-time multimodal I/O with autonomous agent capabilities, pointing toward AI systems that can see, hear, speak, and act.

Description

A natively multimodal model that accepts text, images, audio, and video as input and can generate text, images, and audio in real time. The first Gemini model with built-in tool use (the ability to call external APIs and services) and agentic capabilities (the ability to autonomously plan and execute multi-step tasks). Also includes steerable text-to-speech: voice generation where you can control the tone and style.

Key Innovations

Multimodal

Processing multiple types of input (text, images, audio, video) in a single model.

Tool Use

Ability to call external tools, APIs, and functions, enabling web browsing, code execution, and real-world actions.
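As a rough illustration of what tool use involves on the application side, the sketch below parses a tool call and routes it to a local function. It does not use the Gemini API: the JSON shape, the `TOOLS` registry, and the `get_weather` and `add` functions are all hypothetical stand-ins chosen for this example.

```python
import json

# Hypothetical tools; a real application would wrap actual APIs or services.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def add(a: float, b: float) -> float:
    return a + b

# Registry mapping tool names to callables (illustrative, not a Gemini structure).
TOOLS = {"get_weather": get_weather, "add": add}

def dispatch(tool_call_json: str):
    """Parse a tool call (assumed JSON shape) and run the matching local function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))

# Simulated model output requesting a tool call.
result = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
print(result)  # 5
```

In practice the tool result would be sent back to the model so it can continue its response; that round trip is omitted here.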

Agentic

Models that can autonomously plan, execute multi-step tasks, use tools, and self-correct without human intervention.
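The plan, execute, and self-correct cycle can be sketched as a simple loop. This is a deliberately simplified illustration, not how Gemini works internally: the plan is a fixed list of hypothetical step functions (`flaky`, `summarize`) rather than one produced by a model, and "self-correction" is reduced to retrying a failed step.

```python
from typing import Callable, List

def run_agent(steps: List[Callable[[], str]], max_retries: int = 2) -> List[str]:
    """Run planned steps in order, retrying a failing step up to max_retries times."""
    log: List[str] = []
    for i, step in enumerate(steps):
        for attempt in range(max_retries + 1):
            try:
                log.append(f"step {i}: {step()}")
                break  # step succeeded, move on to the next one
            except Exception as exc:
                log.append(f"step {i} failed (attempt {attempt + 1}): {exc}")
        else:
            # All attempts failed: stop executing the rest of the plan.
            log.append(f"step {i}: gave up")
            break
    return log

# Hypothetical steps: the first fails once before succeeding.
attempts = {"n": 0}

def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise RuntimeError("transient error")
    return "fetched"

def summarize() -> str:
    return "summarized"

for line in run_agent([flaky, summarize]):
    print(line)
```

A real agent would typically ask the model to revise the plan after a failure instead of repeating the same step unchanged.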

Family Tree

Built On

Lineage

PaLM → PaLM 2 → Gemini 1.0 → Gemini 1.5 Pro → Gemini 2.0 → Gemini 2.0 Flash

More from Google Gemini

Gemini 1.0 · 2023-12

Gemini 1.5 Pro · 2024-02

Gemini 2.0 · 2024-12

Gemini 2.5 Pro · 2025-03

Gemini 3.1 Pro · 2026-02 · ~1T (MoE)

Gemini 3.5 Flash · 2026-05

Gemini 3.5 Pro · 2026-05

Imagen 2 · 2023-12

Imagen 3 · 2024-06

Veo 2 · 2024-12
