DALL·E

OpenAI · January 2021

Discontinued · Closed · Decoder-only · Image
Parameters: 12B

Why It Matters

Showed that a single neural network could learn the relationship between text and images well enough to generate novel images from written descriptions, helping launch the current wave of text-to-image models.

Description

OpenAI's first text-to-image model. It combines two techniques: a discrete variational autoencoder, which learns to compress images into a grid of tokens and reconstruct them, and a transformer (the same architecture behind GPT), which generates those image tokens from a text description. At 12 billion parameters, it showed that language-model techniques could be adapted to generate images.
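
As a rough sketch of how the two components fit together, the snippet below lays out the single token stream the transformer models: caption tokens first, then image tokens produced by the autoencoder. The sizes (256 text tokens, a 32x32 grid of image tokens, an 8192-entry image codebook) are taken from the DALL·E paper rather than from this page, and the function name is hypothetical.

```python
# Illustrative only: layout of one DALL·E-style text+image sequence.
# Sizes are assumptions taken from the DALL·E paper, not from this page.
TEXT_TOKENS = 256   # maximum BPE text tokens per caption
IMAGE_GRID = 32     # the autoencoder compresses a 256x256 image to a 32x32 grid
IMAGE_VOCAB = 8192  # number of discrete codes each image token can take


def sequence_layout(text_tokens=TEXT_TOKENS, grid=IMAGE_GRID):
    """Return (text_len, image_len, total_len) for one combined sequence."""
    image_tokens = grid * grid
    return text_tokens, image_tokens, text_tokens + image_tokens


text_len, image_len, total_len = sequence_layout()
print(text_len, image_len, total_len)  # 256 1024 1280
```

Under these assumptions the transformer sees each example as one sequence of 1280 tokens, which is what lets a language-model architecture treat image generation as next-token prediction.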

Notable Milestones

  • An early, widely publicized demonstration of creative AI image generation from text
  • Inspired a wave of text-to-image research across the industry

Key Innovations

Text-to-Image
Generating images from text descriptions: the technology behind DALL·E, Midjourney, and Stable Diffusion.
Autoregressive
Generates a sequence one token at a time, each prediction based on all previous tokens. The foundation of modern language models.
Transformer
Neural network architecture using self-attention to process entire sequences in parallel. Replaced RNNs and enabled massive scaling.
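
To make the autoregressive idea concrete, here is a minimal sketch of a sampling loop. The function toy_next_token_probs is a hypothetical stand-in for a trained transformer (it is not DALL·E's model or any library's API); the point is only that each new token is drawn from a distribution conditioned on every token generated so far.

```python
import random


def toy_next_token_probs(prefix, vocab_size):
    """Hypothetical stand-in for a transformer: returns a probability
    distribution over the next token given all previous tokens."""
    # Favor the token that follows the last one (cyclically), so the
    # dependence on the prefix is visible in the samples.
    favored = (prefix[-1] + 1) % vocab_size if prefix else 0
    weights = [1.0] * vocab_size
    weights[favored] = 5.0
    total = sum(weights)
    return [w / total for w in weights]


def sample_autoregressively(prompt, num_new_tokens, vocab_size, seed=0):
    """Extend `prompt` one token at a time, conditioning on the full prefix."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(num_new_tokens):
        probs = toy_next_token_probs(tokens, vocab_size)
        next_token = rng.choices(range(vocab_size), weights=probs, k=1)[0]
        tokens.append(next_token)  # the new token becomes part of the context
    return tokens


generated = sample_autoregressively(prompt=[3, 1, 4], num_new_tokens=8, vocab_size=10)
print(generated)
```

In a text-to-image setting the prompt would hold the caption tokens and the sampled tokens would be image codes, later decoded back into pixels.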

Family Tree

Successors (1)

Related Research (2)

DALL·E · Diffusion
2021 · OpenAI

Demonstrated that a single model could generate diverse, creative images from arbitrary text descriptions, combining language understanding with image…

CLIP · Transformer
2021 · OpenAI

Trained a model to understand both images and text by learning which image-text pairs go together from 400 million internet examples. This created a s…