Gemini 3.5 Flash
Google DeepMind · May 2026
Active · Closed · Mixture of experts · Multimodal · API available
Context Window: 1M tokens
Description
Google's speed-optimized model, designed for high-volume, low-latency applications. It serves as the default model in the Gemini app, balancing strong capabilities with rapid response times and low cost per query.
Key Innovations
Multimodal
Processing multiple types of input (text, images, audio, video) in a single model.
Distillation
Training a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.
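
To make the student/teacher idea concrete, here is a minimal sketch of one common textbook form of a distillation objective: the KL divergence between temperature-softened teacher and student output distributions. It is an illustration only. The function names, the temperature value, and the example logits are assumptions, and nothing here describes how this particular model was actually trained.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared.

    Hypothetical helper for illustration: the student is penalized when its
    softened predictions diverge from the teacher's.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )
    return kl * temperature ** 2


# Made-up logits over three classes.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
```

In this sketch, a student whose logits track the teacher's gets a smaller loss than one that ranks the classes differently, which is the signal a training loop would minimize. Practical setups often combine a term like this with an ordinary loss on ground-truth labels.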