DALL·E

OpenAI · January 2021

Discontinued · Closed · Decoder-only · Image

Parameters: 12B

Why It Matters

Proved that a single neural network could learn the relationship between text and images well enough to create novel images from descriptions, launching the text-to-image revolution.

Description

OpenAI's first text-to-image model. It combines two techniques: a variational autoencoder, which learns to compress images into a compact code and reconstruct them, and a transformer (the same architecture behind GPT), which generates images from text descriptions. At 12 billion parameters, it showed that language-modeling techniques could be adapted to create visual art.
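As a rough illustration of how the two components fit together, the sketch below builds the single token stream that the transformer models: caption tokens first, then the image tokens produced by the autoencoder. The sizes (256 text tokens, a 32 x 32 grid of image tokens, vocabularies of 16,384 and 8,192) follow the figures reported in the DALL·E paper. The function name, the padding id, and the choice to shift image ids past the text vocabulary are assumptions made for this example, and the token ids are placeholders rather than real encoder output.

```python
# Sizes as reported in the DALL·E paper; all names here are illustrative.
TEXT_VOCAB = 16384      # BPE text vocabulary size
IMAGE_VOCAB = 8192      # codebook size of the discrete autoencoder
TEXT_LEN = 256          # maximum caption length in tokens
IMAGE_GRID = 32         # image is compressed to a 32 x 32 grid of tokens
IMAGE_LEN = IMAGE_GRID * IMAGE_GRID

PAD_ID = 0  # hypothetical padding id for short captions (an assumption)


def build_sequence(text_tokens, image_tokens):
    """Concatenate caption and image tokens into one stream.

    Image ids are shifted past the text vocabulary so both kinds of
    token could share one embedding table (an assumption of this sketch).
    """
    if len(text_tokens) > TEXT_LEN:
        raise ValueError("caption too long")
    if len(image_tokens) != IMAGE_LEN:
        raise ValueError("expected a full grid of image tokens")
    if any(not 0 <= t < TEXT_VOCAB for t in text_tokens):
        raise ValueError("text token out of range")
    if any(not 0 <= t < IMAGE_VOCAB for t in image_tokens):
        raise ValueError("image token out of range")

    padded = list(text_tokens) + [PAD_ID] * (TEXT_LEN - len(text_tokens))
    shifted = [TEXT_VOCAB + t for t in image_tokens]
    return padded + shifted
```

Under these assumptions every training example becomes a fixed-length sequence of 256 + 1,024 = 1,280 tokens, which is what lets an ordinary language-model objective cover both text and image.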

Notable Milestones

▸ First demonstration of creative AI image generation from text
▸ Inspired a wave of text-to-image research across the industry

Key Innovations

Text-to-Image

Generating images from text descriptions. This is the technology behind DALL·E, Midjourney, and Stable Diffusion.

Autoregressive

Generates a sequence one token at a time, each prediction based on all previous tokens. The foundation of modern language models; in DALL·E the sequence holds image tokens as well as text tokens.
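The token-by-token loop can be sketched in a few lines. This is a minimal illustration, not DALL·E's implementation: `toy_model` is a hypothetical stand-in for a trained network, and the vocabulary size and seed are arbitrary.

```python
import random


def sample_next(probs, rng):
    """Draw one token id from a list of probabilities."""
    r = rng.random()
    cumulative = 0.0
    for token, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return token
    return len(probs) - 1  # guard against floating-point rounding


def toy_model(prefix, vocab_size):
    """Hypothetical stand-in for a trained network.

    Returns a next-token distribution that depends on the whole prefix.
    """
    favoured = (sum(prefix) + 1) % vocab_size
    probs = [0.1 / (vocab_size - 1)] * vocab_size
    probs[favoured] = 0.9
    return probs


def generate(prompt, steps, vocab_size=8, seed=0):
    """Extend `prompt` by `steps` tokens, one prediction at a time."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        probs = toy_model(tokens, vocab_size)   # condition on everything so far
        tokens.append(sample_next(probs, rng))  # commit one token, then repeat
    return tokens
```

Calling `generate([1, 2], steps=5)` returns the two prompt tokens followed by five sampled ones. Each new token is fed back in as context for the next prediction, which is why generation cost grows with sequence length.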

Transformer

Neural network architecture using self-attention to process entire sequences in parallel. Replaced RNNs and enabled massive scaling.
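To make self-attention concrete, here is a minimal pure-Python sketch of scaled dot-product attention. It is deliberately simplified: queries, keys, and values are the input vectors themselves, whereas a real transformer applies learned projections and multiple heads. The optional `causal` flag is an addition for this example, showing the masking a decoder-only model uses so that a position cannot attend to later ones.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, causal=False):
    """Scaled dot-product self-attention over a list of equal-length vectors.

    Simplification: no learned projections; q = k = v = x.
    Returns (outputs, attention_weights).
    """
    d = len(x[0])
    outputs, weights = [], []
    for i, q in enumerate(x):
        scores = []
        for j, k in enumerate(x):
            if causal and j > i:
                scores.append(float("-inf"))  # hide future positions
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d))
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of all (visible) input vectors.
        outputs.append(
            [sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)]
        )
    return outputs, weights
```

Every position's scores depend only on the inputs, not on other positions' outputs, so on suitable hardware all rows can be computed at once. That is the parallelism the definition above refers to.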

Family Tree

Successors (1)

DALL·E 2

Related Research (2)

DALL·E · Diffusion

2021 · OpenAI

Demonstrated that a single model could generate diverse, creative images from arbitrary text descriptions, combining language understanding with image…

CLIP · Transformer

2021 · OpenAI

Trained a model to understand both images and text by learning which image-text pairs go together from 400 million internet examples. This created a s…

External Links

Research Paper · Announcement

More from Image Generation

DALL·E 2 · 2022-04 · 3.5B

DALL·E 3 · 2023-10 · —

Flux.1 · 2024-08 · 12B

GPT-Image-1 · 2025-03 · —

Runway Gen-3 Alpha · 2024-06 · —

Pika · 2023-06 · —

Kling · 2024-06 · —

Luma Dream Machine · 2024-06 · —
