DeepSeek V1
DeepSeek · January 2024
● active · Open Weight · decoder only · text
Parameters: 67B
Context Window: 4K tokens
Variants: 7B, 67B
Why It Matters
Established DeepSeek as a serious open-weight AI contender from China, demonstrating that frontier-quality models could come from outside Silicon Valley.
Description
DeepSeek V1 is the debut model from DeepSeek, a Chinese AI lab backed by the quantitative trading firm High-Flyer. It is available in 7B and 67B sizes, both using a dense decoder-only transformer design. Trained on 2 trillion tokens, it is strong at coding and math tasks and signaled a serious new competitor in the open-weight AI space.
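Because both variants are dense, every parameter is loaded at inference time, so the memory needed for the weights follows directly from the parameter count. The sketch below is a rough back-of-the-envelope estimate, not a figure from DeepSeek: it assumes the nominal counts of 7B and 67B parameters, uses standard bytes-per-parameter values for each precision, and ignores activations and the KV cache. The function name `weight_memory_gib` is hypothetical.

```python
# Nominal parameter counts taken from the variant names (assumption:
# the exact counts differ slightly from these round figures).
VARIANTS = {"7B": 7e9, "67B": 67e9}

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Return memory for the weights alone, in GiB.

    Excludes activations, KV cache, and framework overhead.
    """
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    for name, num_params in VARIANTS.items():
        for precision, nbytes in BYTES_PER_PARAM.items():
            gib = weight_memory_gib(num_params, nbytes)
            print(f"{name} @ {precision}: {gib:.1f} GiB")
```

Under these assumptions the 7B variant needs about 13 GiB for weights in 16-bit precision, while the 67B variant needs roughly 125 GiB, which is why the larger model typically has to be split across several accelerators or quantized.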
Key Innovations
Open Weight
Model weights are publicly released, but training data and code may not be. This enables fine-tuning but not full reproduction.
Code Gen
Ability to write, debug, and understand programming code across multiple languages.