Phi-1
Microsoft Research · June 2023
● activeOpen Sourcedecoder onlycode
Parameters1.3B
Context Window2K tokens
Why It Matters
Launched Microsoft's 'small but mighty' research program, demonstrating that training data quality could be more important than model size — a finding that influenced the entire field.
Description
A tiny 1.3 billion parameter model focused on code generation, trained exclusively on 'textbook-quality' data — carefully curated examples designed to teach programming concepts clearly. Despite being over 100x smaller than GPT-4, it performed surprisingly well on coding benchmarks, proving that what a model learns from matters as much as its size.
Notable Milestones
- ▸Proved textbook-quality data approach for code generation
- ▸Inspired the 'small language model' movement
Key Innovations
Distillation
DistillationTraining a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.