LLaMA 3.2
Meta · September 2024
Why It Matters
Brought multimodal capabilities to the open LLaMA ecosystem for the first time and introduced small 1B and 3B models optimized for on-device deployment on smartphones and other edge devices.
Description
The first LLaMA release with models that can understand images alongside text. Includes lightweight 1B and 3B text-only models designed to run on phones and edge devices, plus 11B and 90B multimodal models that can analyze photos, charts, and documents. Marked the LLaMA family's entry into the vision-language model space.
Notable Milestones
- First open LLaMA models with image understanding
- 1B/3B models designed to run on mobile phones
- Enabled on-device AI without cloud connectivity
Key Innovations
Related Research (2)
Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…
Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…
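To make the KV cache saving from grouped-query attention concrete, here is a rough back-of-the-envelope sketch. The helper `kv_cache_bytes` and every configuration value (layer count, head counts, head dimension, sequence length, 2-byte elements) are illustrative assumptions, not figures taken from this page or from any specific model card.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV cache size for one sequence (hypothetical helper).

    The leading factor of 2 accounts for storing both keys and values
    in every layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Assumed, illustrative configuration (not from the source text).
N_LAYERS = 16
N_QUERY_HEADS = 32
HEAD_DIM = 64
SEQ_LEN = 8192

# Multi-head attention: one KV head per query head.
mha = kv_cache_bytes(N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# Grouped-query attention: query heads share 8 KV heads (assumed group count).
gqa = kv_cache_bytes(N_LAYERS, 8, HEAD_DIM, SEQ_LEN)
# Multi-query attention: all query heads share a single KV head.
mqa = kv_cache_bytes(N_LAYERS, 1, HEAD_DIM, SEQ_LEN)

for name, size in [("MHA", mha), ("GQA", gqa), ("MQA", mqa)]:
    print(f"{name}: {size / 2**20:.0f} MiB")
```

Under these assumptions the cache shrinks in proportion to the number of KV heads, which is why a grouped layout sits between the multi-head and multi-query extremes and is attractive for memory-constrained devices.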