DeepSeek V1

DeepSeek · January 2024

Active · Open Weight · Decoder-only · Text
Parameters: 67B
Context Window: 4K tokens
Variants: 7B, 67B

Why It Matters

Introduced DeepSeek as a serious open-source AI contender from China, demonstrating that frontier-quality models could come from outside Silicon Valley.

Description

The debut model from DeepSeek, a Chinese AI lab backed by the quantitative trading firm High-Flyer. It is available in 7B and 67B sizes and uses a dense decoder-only transformer design. Trained on 2 trillion tokens of data and strong at coding and math tasks, it signaled a serious new competitor in the open-source AI space.
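As a rough illustration of what the two sizes mean in practice, the sketch below estimates the memory needed just to store the weights of a dense model at a few common precisions. The helper name `weight_memory_gb` and the precision table are assumptions made for this example, not anything published by DeepSeek, and the parameter counts are the nominal 7B and 67B figures rather than exact counts.

```python
# Rough weight-storage estimate for a dense model: parameters x bytes per parameter.
# Parameter counts are the nominal sizes from the card (7B, 67B);
# the precision table below is an illustrative assumption.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, params in {"7B": 7e9, "67B": 67e9}.items():
        for precision, nbytes in BYTES_PER_PARAM.items():
            size = weight_memory_gb(params, nbytes)
            print(f"{name} @ {precision}: ~{size:.1f} GB")
```

This counts weights only. Real inference or fine-tuning needs additional memory for activations, the key-value cache, and (when training) gradients and optimizer state, so treat the numbers as a lower bound.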

Key Innovations

Open Weight: Model weights are publicly released, but training data and code may not be. Enables fine-tuning but not full reproduction.
Code Gen: Ability to write, debug, and understand programming code across multiple languages.

Family Tree