LLaMA 3.2

Meta · September 2024

● active · Open Weight · decoder only · multimodal · API Available

Parameters: 1B / 3B / 11B / 90B

Context Window: 128K tokens

Variants: 1B, 3B, 11B-Vision, 90B-Vision

Why It Matters

Brought multimodal capabilities to the open-weight LLaMA family for the first time, and introduced small 1B and 3B models optimized for on-device deployment on smartphones and other edge devices.

Description

The first LLaMA models that can understand images alongside text. The release includes lightweight 1B and 3B text-only models designed to run on phones and edge devices, plus 11B and 90B multimodal models that can analyze photos, charts, and documents. It marked the LLaMA family's entry into the vision-language model space.

Notable Milestones

▸ First open LLaMA models with image understanding
▸ 1B/3B models designed to run on mobile phones
▸ Enabled on-device AI without cloud connectivity

Key Innovations

Multimodal

Processing multiple types of input (text, images, audio, video) in a single model. In LLaMA 3.2, the 11B and 90B vision variants handle images and text.

Open Weight

Model weights are publicly released, but the training data and code may not be. This enables fine-tuning but not full reproduction.

Related Research (2)

RoPE · Architecture

2021 · Zhuiyi Technology

Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…
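To make the rotation idea concrete, here is a minimal pure-Python sketch of rotary position embeddings. It is an illustration only: the helper names `rope` and `dot`, the interleaved pairing of dimensions, and the base of 10000 (the value from the original RoPE formulation) are assumptions, and a given model may use a different base or pairing layout.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle pos * base**(-2i/d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector dimension must be even")
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)  # per-pair rotation angle
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])  # 2D rotation of the pair
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query/key vectors, chosen only for the demonstration.
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.1]

# Both position pairs have the same offset (3), so the scores should match.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 105), rope(k, 102))
```

Because each pair is rotated by an angle proportional to its position, the dot product between a rotated query and a rotated key depends only on the difference between their positions, which is why `score_a` and `score_b` agree up to floating-point error.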

Grouped-Query Attention · Architecture

2023 · Google Research

Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…
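The KV cache saving can be sketched with simple arithmetic. The function below is a rough estimate, assuming the cache stores one key and one value vector per layer, per KV head, per token; the configuration values are hypothetical and are not the published LLaMA 3.2 settings.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch=1):
    """Estimate KV cache size; the leading 2 covers keys plus values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value * batch

# Hypothetical model shape with 32 query heads and 16-bit cache entries.
cfg = dict(n_layers=32, head_dim=128, seq_len=8192)

mha = kv_cache_bytes(n_kv_heads=32, **cfg)  # multi-head: one KV head per query head
gqa = kv_cache_bytes(n_kv_heads=8, **cfg)   # grouped-query: 4 query heads share a KV head
mqa = kv_cache_bytes(n_kv_heads=1, **cfg)   # multi-query: all query heads share one KV head

for name, size in [("MHA", mha), ("GQA", gqa), ("MQA", mqa)]:
    print(f"{name}: {size / 1024**3:.3f} GiB")
```

Under these assumptions the cache shrinks in proportion to the number of KV heads, so grouping 32 query heads into 8 KV groups cuts the cache to a quarter of the multi-head size.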