OLMo

Allen Institute for AI · February 2024

● Active · Open Source · Decoder-only · Text

Parameters: 7B

Context Window: 2K tokens

Why It Matters

One of the most transparently open LLMs at the time of its release. Allen AI published not just the weights but the entire training pipeline: the training data (Dolma), code, training logs, and evaluation. It set a new standard for open science in AI.

Description

Among the most transparently open large language models released to date, OLMo shipped with far more than its weights. Allen AI also published the entire training pipeline: the Dolma training dataset, all training code, training logs, and the evaluation framework. This level of openness gave researchers an unusual ability to study, reproduce, and build on a competitive 7B-scale language model.

Notable Milestones

▸ Full training pipeline released (data, code, logs, evaluations)
▸ Dolma became a reference dataset for open pretraining research

Key Innovations

Open Weight: model weights are publicly released, but training data and code may not be. This enables fine-tuning but not full reproduction. OLMo goes beyond this category by also releasing its data and code.

Family Tree

Successors (2)

OLMo 2 · Molmo

External Links

Research Paper

More from Allen AI

OLMo 2 (2024-11) · 7B-13B

Molmo (2024-09) · 7B-72B

Tülu 3 (2024-11) · 8B-70B
