Scene-LLM
3D-LLM

3D-LLM

Recent work has explored aligning images and videos with LLMs, producing a new generation of multi-modal LLMs that can understand and reason about 2D images.
However, models that can analyze 3D physical space, which involves richer concepts such as spatial relationships, affordances, physics, and interaction, are still lacking.

PointLLM
LERF: Language Embedded Radiance Fields
OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics
Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation