Whisper

OpenAI · September 2022

● activeOpen Weightencoder decoderaudio

Parameters1.5B

Variantstiny, base, small, medium, large, large-v3

Why It Matters

Democratized speech recognition by releasing a model that rivals expensive commercial transcription services, enabling thousands of applications from meeting transcription to accessibility tools.

Description

Open-source speech recognition model trained on 680,000 hours of multilingual audio from the internet. Approaches human-level accuracy at transcribing speech across dozens of languages, different accents, and even noisy environments. Available in sizes ranging from tiny (39M parameters) to large (1.5B), making it usable on everything from phones to servers.

Notable Milestones

▸Powers transcription in countless apps and services
▸Supports 99 languages with automatic language detection
▸Widely used for podcast and video transcription

Key Innovations

Speech Recognition

Speech RecognitionConverting spoken audio into text (automatic speech recognition / ASR).

Open Weight

Open WeightModel weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.

Transformer

TransformerNeural network architecture using self-attention to process entire sequences in parallel. Replaced RNNs and enabled massive scaling.

Related Research (1)

WhisperTransformer

2022 · OpenAI

Trained a speech recognition model on 680,000 hours of multilingual audio from the internet, achieving near-human accuracy across 97 languages without…

External Links

Research Paper Announcement

More from OpenAI GPT

InstructGPT / text-davinci-0022022-01 · 175B

GPT-3.5 / ChatGPT2022-11 · 175B

GPT-42023-03 · ~1.7T (est. MoE)

GPT-4 Turbo2023-11 · ~1.7T (est. MoE)

GPT-4o2024-05 · —

GPT-4o Mini2024-07 · —

PreviousInstructGPT / text-davinci-002

NextGPT-3.5 / ChatGPT