NVLM 1.0
NVIDIA · October 2024
● active · Open Weight · decoder only · multimodal
Parameters: 72B
Context Window: 4K tokens
Description
A family of multimodal language models from NVIDIA that can process both text and images. Notably, adding vision capabilities improved the model's text performance rather than degrading it. This is a rare result, since most multimodal models sacrifice some text quality when learning to handle images.
Key Innovations
Multimodal: Processing multiple types of input (text, images, audio, video) in a single model.