LLaMA 3.2

Meta · September 2024

Active · Open Weight · Decoder-only · Multimodal · API Available
Parameters: 1B / 3B / 11B / 90B
Context Window: 128K tokens
Variants: 1B, 3B, 11B-Vision, 90B-Vision

Why It Matters

Brought image understanding to the open-weight LLaMA family for the first time and introduced small 1B and 3B models optimized for on-device deployment on smartphones.

Description

The first LLaMA models that can understand images alongside text. The release includes lightweight 1B and 3B text-only models designed to run on phones and edge devices, plus 11B and 90B multimodal models that can analyze photos, charts, and documents. It marked Meta's entry into the vision-language model space.

Notable Milestones

  • First open LLaMA models with image understanding
  • 1B/3B models designed to run on mobile phones
  • Enabled on-device AI without cloud connectivity

Key Innovations

Multimodal: Processing multiple types of input (text, images, audio, video) in a single model.
Open Weight: Model weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.

Family Tree

Built On

Lineage

LLaMA → LLaMA 2 → LLaMA 3 → LLaMA 3.1 → LLaMA 3.2

Related Research (2)

RoPE · Architecture
2021 · Zhuiyi Technology

Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…
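The rotation idea can be illustrated with a minimal sketch. It assumes the adjacent-pair formulation with a frequency base of 10000; the function names `rope` and `dot` and the sample vectors are illustrative and not taken from any LLaMA codebase. It shows that the query-key dot product after rotation depends only on the offset between the two positions, not on their absolute values.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle pos * base**(-i/d).

    Illustrative sketch only: assumes an even dimension d and the
    adjacent-pair layout; real implementations may pair dimensions differently.
    """
    d = len(vec)
    assert d % 2 == 0, "dimension must be even"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)   # lower pairs rotate faster
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])  # 2-D rotation
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query and key vectors.
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Same relative offset (3) at two different absolute positions.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
print(abs(score_near - score_far) < 1e-9)
```

Because the attention score depends only on relative position, the same learned pattern applies anywhere in the sequence.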

GQA
2023 · Google Research

Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…
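The KV cache saving can be made concrete with a rough calculation. The sketch below assumes a hypothetical configuration (32 layers, 32 query heads, head dimension 128, 8192 cached tokens, 2 bytes per value); these figures are illustrative and are not stated in this card. The helper names `kv_cache_bytes` and `kv_head_for` are likewise made up for the example.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate KV cache size for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """Index of the shared KV head that a given query head attends with."""
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size

# Hypothetical configuration, for illustration only.
cfg = dict(n_layers=32, head_dim=128, seq_len=8192)

mha = kv_cache_bytes(n_kv_heads=32, **cfg)  # multi-head: one KV head per query head
gqa = kv_cache_bytes(n_kv_heads=8, **cfg)   # grouped-query: 4 query heads share a KV head
mqa = kv_cache_bytes(n_kv_heads=1, **cfg)   # multi-query: all query heads share one KV head

for name, size in [("MHA", mha), ("GQA", gqa), ("MQA", mqa)]:
    print(f"{name}: {size / 1024**3:.3f} GiB")
```

Under these assumptions the cache shrinks in proportion to the number of KV heads, so grouping 32 query heads into 8 KV heads cuts it by a factor of four.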
