OLMo

Allen Institute for AI · February 2024

active · Open Source · decoder-only · text
Parameters: 7B
Context Window: 2K tokens

Why It Matters

The most transparently open LLM released at the time. AI2 published not just the weights but the entire training pipeline: data (Dolma), code, training logs, and evaluation. It set a new standard for open science in AI.

Description

The most transparently open large language model released at the time. The Allen Institute for AI (AI2) published not just the model weights but the entire training pipeline: the Dolma training dataset, all training code, training logs, and the evaluation framework. This level of openness gave researchers an unprecedented ability to study, reproduce, and build on a competitive modern language model.

Notable Milestones

  • Full training pipeline released (data, code, logs, evaluations)
  • Dolma became a reference point for open pretraining data

Key Innovations

Open Weight
Model weights are publicly released, but training data and code may not be. This enables fine-tuning but not full reproduction.
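The gap between "open weight" and a fully open release like OLMo can be sketched as a simple checklist of released artifacts. This is a hypothetical illustration, not an official AI2 or industry classification: the `Release` fields, the helper names, and the rule that reproduction counts only when data and code are released alongside the weights are all assumptions chosen to mirror the distinction described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    # Hypothetical checklist of artifacts a model release may include.
    weights: bool
    training_data: bool
    training_code: bool
    training_logs: bool = False
    eval_code: bool = False


def can_fine_tune(r: Release) -> bool:
    # Fine-tuning only needs released weights as a starting point.
    return r.weights


def can_reproduce(r: Release) -> bool:
    # Assumption for this sketch: retraining from scratch needs the data and
    # the code, and the original weights are needed to compare the result.
    return r.weights and r.training_data and r.training_code


def openness_label(r: Release) -> str:
    # Coarse label derived from the two capabilities above.
    if can_reproduce(r):
        return "fully open"
    if can_fine_tune(r):
        return "open weight"
    return "closed"


# OLMo as described in this entry: weights, data, code, logs, and evaluation.
olmo = Release(True, True, True, True, True)
# A typical open-weight release: weights only.
typical = Release(weights=True, training_data=False, training_code=False)

print(openness_label(olmo))     # fully open
print(openness_label(typical))  # open weight
```

Logs and evaluation code do not change the label in this sketch; they matter for studying a training run rather than for being able to repeat it.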

Family Tree

Successors (2)
