Gemini 2.0 Flash

Google · December 2024

activeCloseddecoder onlymultimodalAPI Available

Why It Matters

First production model to combine real-time multimodal I/O with autonomous agent capabilities, pointing toward AI systems that can see, hear, speak, and act.

Description

A natively multimodal model that can process and generate text, images, audio, and video in real-time. The first Gemini model with built-in tool use (the ability to call external APIs and services) and agentic capabilities (the ability to autonomously plan and execute multi-step tasks). Also includes steerable text-to-speech — voice generation where you can control the tone and style.

Key Innovations

Multimodal
MultimodalProcessing multiple types of input (text, images, audio, video) in a single model.
Tool Use
Tool UseAbility to call external tools, APIs, and functions — enabling web browsing, code execution, and real-world actions.
Agentic
AgenticModels that can autonomously plan, execute multi-step tasks, use tools, and self-correct without human intervention.

Family Tree

Built On

Lineage

PaLMPaLM 2Gemini 1.0Gemini 1.5 ProGemini 2.0Gemini 2.0 Flash

External Links