o3

OpenAI · April 2025

Active · Closed · Decoder-only · Text · API Available
Context Window: 200K tokens
Variants: o3, o3-mini, o3-pro

Why It Matters

Set new records on virtually every major reasoning benchmark. The o3-pro variant demonstrated that scaling test-time compute could achieve near-human expert performance on the most challenging scientific and mathematical problems.

Description

OpenAI's most powerful reasoning model, significantly surpassing o1 on math, coding, and science benchmarks. The Pro variant uses even more compute per query for the hardest problems. Represents the state of the art in AI reasoning at the time of release.

Notable Milestones

  • Achieved a new high score on ARC-AGI, a benchmark designed to test general reasoning
  • Outperformed PhD-level experts on graduate science exams
  • Set state-of-the-art on competitive math olympiad problems

Benchmark Scores

GPQA (Graduate-level science QA): 87.7%
AIME (AMC/AIME math competition): 96.7%
SWE-bench (Real-world software engineering): 71.7%

Key Innovations

Reasoning: Structured step-by-step problem solving, often using chain-of-thought or tree-of-thought approaches.
Test-Time Compute: Using extra computation during inference (not training) to improve answer quality by thinking longer on harder problems.
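
One simple way to see why extra inference-time computation can help is majority voting over repeated samples. The sketch below is only an illustration of that general idea, not a description of how o3 works internally: `noisy_solver` is a hypothetical stand-in for a single sampled model answer, and its 60% accuracy is an assumed figure chosen for the demo.

```python
import random
from collections import Counter
from typing import Callable

Sampler = Callable[[str, random.Random], str]


def noisy_solver(question: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one sampled answer from a model.

    Assumption for this demo: it returns the correct answer "42" about
    60% of the time and one of three wrong answers otherwise.
    """
    if rng.random() < 0.6:
        return "42"
    return rng.choice(["41", "43", "44"])


def majority_vote(question: str, sampler: Sampler, n_samples: int, seed: int = 0) -> str:
    """Draw n_samples answers and return the most frequent one."""
    rng = random.Random(seed)
    answers = [sampler(question, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples: int, trials: int = 500) -> float:
    """Fraction of trials in which the voted answer equals "42"."""
    correct = sum(
        majority_vote("q", noisy_solver, n_samples, seed=t) == "42"
        for t in range(trials)
    )
    return correct / trials


if __name__ == "__main__":
    # More samples per question means more inference compute per answer.
    for n in (1, 5, 15):
        print(f"samples={n:2d}  accuracy={accuracy(n):.3f}")
```

Running the script prints accuracy for 1, 5, and 15 samples per question; under the assumed error model, spending more samples per question makes the voted answer more reliable than any single sample.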

Family Tree

Related Research (1)

2022 · Google

Showed that prompting models to "think step-by-step" unlocks arithmetic, logic, and commonsense reasoning in large models like PaLM.
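
As a rough illustration of step-by-step prompting, the sketch below assembles a prompt from worked examples and ends it with a step-by-step cue. The function name `build_step_by_step_prompt`, the `Q:`/`A:` layout, and the example problem are assumptions made for this illustration, not a format taken from the research entry above.

```python
from typing import List, Tuple

# (question, worked reasoning, final answer)
WorkedExample = Tuple[str, str, str]


def build_step_by_step_prompt(question: str, examples: List[WorkedExample]) -> str:
    """Assemble a prompt that shows worked reasoning before asking a new question."""
    parts = []
    for q, reasoning, answer in examples:
        # Each demonstration spells out the intermediate steps, then the answer.
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    # The final question ends with a cue that invites step-by-step reasoning.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = [(
        "Tom has 3 apples and buys 2 more. How many apples does he have?",
        "Tom starts with 3 apples. Buying 2 more gives 3 + 2 = 5.",
        "5",
    )]
    print(build_step_by_step_prompt("A box holds 4 rows of 6 eggs. How many eggs?", demo))
```

The resulting string would be sent to a model as-is; the demonstrations and the closing cue are the only things that distinguish it from a plain question.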
