StarCoder2

BigCode / Hugging Face · February 2024

Active · Open Source · Decoder-only · Code
Parameters: 3B - 15B
Context Window: 16K tokens
Variants: 3B, 7B, 15B

Description

StarCoder2 is the successor to StarCoder, trained on The Stack v2, a larger and more diverse code dataset than its predecessor's that spans over 600 programming languages. Available in 3B, 7B, and 15B sizes, it improved on StarCoder across coding benchmarks while maintaining the same commitment to transparent, ethically sourced training data.

Key Innovations

Code Gen: Ability to write, debug, and understand programming code across multiple languages.
Open Weight: Model weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.

Family Tree

Built On

Lineage

StarCoder → StarCoder2

External Links