PaLM-E

Google · March 2023

active · Closed · multimodal
Parameters: 562B

Why It Matters

Demonstrated that scaling up embodied language models enables knowledge transfer across robot embodiments and tasks, with positive transfer from web-scale language and vision data.

Description

PaLM-E is Google's 562-billion-parameter embodied multimodal model, combining PaLM's language understanding with visual and sensor inputs for robotic planning. It was the largest vision-language model reported at the time of release and can interpret scenes and generate plans for robots to execute.

Key Innovations

Multimodal: Processing multiple types of input (text, images, audio, video) in a single model.
robotics
embodied

Family Tree

Built On

Lineage

PaLM → PaLM-E
