Phi-2

Microsoft Research · December 2023

● active · Open Source · decoder only · text

Parameters: 2.7B

Context Window: 2K tokens

Why It Matters

Microsoft's demonstration that small models trained on high-quality data can match or outperform models up to 25x their size. It challenged the assumption that bigger always means better.

Description

A 2.7-billion-parameter model that matched or outperformed models 5-10x its size on reasoning and language benchmarks. It follows the same philosophy as Phi-1: carefully selected, high-quality training data instead of brute-force scale. The result showed that data quality can stand in for a large share of the parameter count.