Whisper
OpenAI · September 2022
● activeOpen Weightencoder decoderaudio
Parameters1.5B
Variantstiny, base, small, medium, large, large-v3
Why It Matters
Democratized speech recognition by releasing a model that rivals expensive commercial transcription services, enabling thousands of applications from meeting transcription to accessibility tools.
Description
Open-source speech recognition model trained on 680,000 hours of multilingual audio from the internet. Approaches human-level accuracy at transcribing speech across dozens of languages, different accents, and even noisy environments. Available in sizes ranging from tiny (39M parameters) to large (1.5B), making it usable on everything from phones to servers.
Notable Milestones
- ▸Powers transcription in countless apps and services
- ▸Supports 99 languages with automatic language detection
- ▸Widely used for podcast and video transcription
Key Innovations
Speech Recognition
Speech RecognitionConverting spoken audio into text (automatic speech recognition / ASR).
Open Weight
Open WeightModel weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.
Transformer
TransformerNeural network architecture using self-attention to process entire sequences in parallel. Replaced RNNs and enabled massive scaling.
Related Research (1)
WhisperTransformer
2022 · OpenAI
Trained a speech recognition model on 680,000 hours of multilingual audio from the internet, achieving near-human accuracy across 97 languages without…