MusicGen
Meta · June 2023
● activeOpen Weightdecoder onlyaudio
Parameters3.3B
Variantssmall, medium, large
Why It Matters
Showed that the same transformer architecture powering chatbots could also compose music, opening the door to AI-generated soundtracks and compositions.
Description
Meta's music generation model that creates high-quality stereo music from text descriptions (like 'an upbeat jazz tune with piano and saxophone'). Uses the same autoregressive approach as language models — generating music one audio token at a time — but in a single stage rather than requiring multiple processing steps. Can also transform an existing melody into a different style.
Key Innovations
Text-to-Audio
Text-to-AudioGenerating speech, music, or sound effects from text descriptions.
Autoregressive
AutoregressiveGenerates text one token at a time, each prediction based on all previous tokens. The foundation of modern language models.
Open Weight
Open WeightModel weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.