DBRX

Databricks · March 2024

active · Open Source · mixture of experts · text
Parameters: 132B (36B active)
Context Window: 32K tokens

Why It Matters

An enterprise-grade open MoE model from Databricks that showed data companies could build competitive LLMs by applying their proprietary data pipeline expertise.

Description

An enterprise-grade open-source Mixture-of-Experts (MoE) model from Databricks, with 132 billion total parameters of which only 36 billion are active for any given input, because an MoE model routes each input to a small subset of 'expert' sub-networks instead of running all of them. Built on Databricks' proprietary data pipeline expertise, it outperformed LLaMA 2 70B and Mixtral on many benchmarks while being more efficient to run.
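To make the "only some experts are active" idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a toy illustration, not DBRX's actual implementation: the expert count, the top-k value, the scalar "experts", and the function names `route` and `moe_forward` are all assumptions chosen for the example.

```python
import math
import random

NUM_EXPERTS = 16  # illustrative value, not taken from the text above
TOP_K = 4         # illustrative value, not taken from the text above


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_scores, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize the scores of the chosen experts so their weights sum to 1.
    weights = softmax([router_scores[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, router_scores, top_k=TOP_K):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in route(router_scores, top_k))


# Toy experts: each one just scales its input by a different constant.
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

# Hypothetical router scores; a real router computes these from the input.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route(scores)
output = moe_forward(2.0, experts, scores)

print(f"experts used: {sorted(i for i, _ in selected)} of {NUM_EXPERTS}")
print(f"layer output: {output:.3f}")
print(f"active share of parameters (36B of 132B): {36 / 132:.0%}")
```

The unselected experts are never evaluated, which is why the compute per input tracks the active parameter count and not the total.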

Key Innovations

MoE
Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
Open Weight
Model weights are publicly released, but the training data and code may not be. This enables fine-tuning but not full reproduction.

External Links