PaLM-E

Google · March 2023

● active · Closed · multimodal

Parameters: 562B

Why It Matters

Demonstrated that scaling up embodied language models enables transfer of knowledge across different robot embodiments and tasks, showing positive transfer from web-scale language and vision data.

Description

PaLM-E is Google's 562-billion-parameter embodied multimodal model. It combines PaLM's language understanding with visual and sensor inputs for robotic planning. The largest vision-language model at the time of release, it can interpret scenes and generate plans for robots to execute.
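As a rough illustration of what combining language with visual and sensor inputs can look like, the sketch below interleaves text-token embeddings with projected observation vectors into a single input sequence. It is a toy under stated assumptions, not PaLM-E's actual implementation: the names `embed_token`, `project_observation`, and `build_multimodal_sequence`, the embedding width, and the hash-based embeddings are all hypothetical stand-ins for learned components.

```python
import hashlib

EMBED_DIM = 4  # hypothetical embedding width, kept small for readability


def embed_token(token):
    """Map a text token to a deterministic toy embedding (stand-in for a learned table)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:EMBED_DIM]]


def project_observation(features):
    """Project a raw observation vector to EMBED_DIM (stand-in for a learned encoder).

    Toy projection: output slot i is the mean of the features whose index is
    congruent to i modulo EMBED_DIM; slots that receive no features are 0.0.
    """
    sums = [0.0] * EMBED_DIM
    counts = [0] * EMBED_DIM
    for index, value in enumerate(features):
        sums[index % EMBED_DIM] += value
        counts[index % EMBED_DIM] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def build_multimodal_sequence(segments):
    """Turn interleaved ("text", str) and ("obs", list of floats) segments into one sequence."""
    sequence = []
    for kind, payload in segments:
        if kind == "text":
            # One embedding per whitespace-separated token.
            sequence.extend(embed_token(tok) for tok in payload.split())
        elif kind == "obs":
            # One embedding per observation, placed where it occurs in the prompt.
            sequence.append(project_observation(payload))
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return sequence


# Hypothetical prompt: text, then an observation, then more text.
segments = [
    ("text", "Given the scene"),
    ("obs", [0.2, 0.4, 0.6, 0.8, 1.0, 0.0]),  # e.g. flattened image or sensor features
    ("text", "plan the next step"),
]
sequence = build_multimodal_sequence(segments)
print(len(sequence), "vectors of width", EMBED_DIM)
```

The design point is that observations and words end up as vectors of the same width in one ordered sequence, so a language model can attend over both without a separate input path.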

Key Innovations

Multimodal: Processing multiple types of input (text, images, audio, video) in a single model.

robotics

embodied

Family Tree

Built On

PaLM

Lineage

PaLM → PaLM-E

External Links

Research Paper

More from Robotics / Embodied

RT-2 · 2023-07
