BAGEL-Unified-Multimodal-Pretraining

LingBot-VLA

UniDiffuser

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Vision-Language Interpreter for Robot Task Planning

Pixtral 12B

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

OMG-LLaVA

BLIP

GLIP
