LLM Tree of Life

Imagen 3

Google DeepMind · June 2024

● active · Closed · diffusion · image

Description

Google's highest-quality text-to-image model at the time of its release. It produces photorealistic images with finer detail, richer lighting, and significantly fewer visual artifacts (distortions or errors) than its predecessor, Imagen 2, especially in complex scenes with multiple objects.

Key Innovations

Diffusion

Generates outputs by gradually denoising random noise into coherent images or audio. The backbone of Stable Diffusion and DALL·E 2.
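The denoising loop described above can be sketched in a few lines. This is a toy illustration only, not Imagen 3's implementation (which is closed): it uses a deterministic DDIM-style update on a short list of numbers, and the function names (`make_alpha_bars`, `oracle_noise_predictor`, `denoise`) and the linear noise schedule are assumptions chosen for the example. The noise predictor is an oracle that already knows the clean target; in a real model that role is played by a trained neural network conditioned on the text prompt.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def oracle_noise_predictor(x_t, alpha_bar, target):
    """Hypothetical stand-in for a trained network.

    It recovers the noise exactly because this toy knows the clean target;
    a real diffusion model has to learn this prediction from data.
    """
    scale = math.sqrt(1.0 - alpha_bar)
    return [(x - math.sqrt(alpha_bar) * c) / scale for x, c in zip(x_t, target)]


def denoise(target, num_steps=50, seed=0):
    """Start from Gaussian noise and step backwards to a clean sample."""
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise at the last step
    for t in range(num_steps - 1, -1, -1):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = oracle_noise_predictor(x, a_t, target)
        # Estimate the clean sample implied by the predicted noise.
        x0_hat = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
                  for xi, e in zip(x, eps)]
        # Re-noise the estimate to the (lower) noise level of the previous step.
        x = [math.sqrt(a_prev) * c + math.sqrt(1.0 - a_prev) * e
             for c, e in zip(x0_hat, eps)]
    return x


if __name__ == "__main__":
    pixels = [0.2, 0.8, 0.5]  # a three-"pixel" image
    print(denoise(pixels))
```

Because the oracle is exact, the loop lands on the target almost immediately. With a learned predictor each step only improves the estimate slightly, which is why practical samplers run many steps.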

Text-to-Image

Generating images from text descriptions: the technology behind DALL·E, Midjourney, and Stable Diffusion.

Family Tree

Built On

Lineage

Imagen 2 → Imagen 3

Related Research (1)

Imagen · Diffusion

2022 · Google Brain

Demonstrated that large frozen text encoders (T5-XXL) with cascaded diffusion models produce photorealistic images, outperforming DALL·E 2.


More from Google Gemini

Gemini 1.0 · 2023-12 · —

Gemini 1.5 Pro · 2024-02 · —

Gemini 2.0 · 2024-12 · —

Gemini 2.5 Pro · 2025-03 · —

Gemini 3.1 Pro · 2026-02 · ~1T (MoE)

Gemini 3.5 Flash · 2026-05 · —

Gemini 3.5 Pro · 2026-05 · —

Imagen 2 · 2023-12 · —

Gemini 2.0 Flash · 2024-12 · —

Veo 2 · 2024-12 · —
