
Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
SayPlan= Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning
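The main idea is captured in the pseudocode above: expand only part of the scene graph (which has a strict hierarchical structure) to control the size of the scene graph fed to the LLM.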
A scalable approach to ground LLM-based task planners across environments spanning multiple rooms and floors
The scene graph is represented with networkx (a Python package).
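A minimal sketch of the partial-expansion idea, assuming a floor -> room -> object hierarchy stored in a networkx `DiGraph`. The node names, the `level` attribute, and the helpers `visible_subgraph` and `serialize` are hypothetical illustrations, not taken from the SayPlan code.

```python
import networkx as nx

# Hypothetical strict hierarchy: floor -> room -> object.
G = nx.DiGraph()
G.add_node("floor_1", level="floor")
for room in ("kitchen", "office"):
    G.add_node(room, level="room")
    G.add_edge("floor_1", room)
for obj, room in (("mug", "kitchen"), ("fridge", "kitchen"), ("laptop", "office")):
    G.add_node(obj, level="object")
    G.add_edge(room, obj)


def visible_subgraph(graph, roots, expanded):
    """Return the subgraph seen when only nodes in `expanded` reveal their children."""
    visible = set(roots)
    frontier = list(roots)
    while frontier:
        node = frontier.pop()
        if node in expanded:
            for child in graph.successors(node):
                visible.add(child)
                frontier.append(child)
    return graph.subgraph(visible)


def serialize(graph):
    """Flatten a (sub)graph into a compact edge list suitable for a prompt."""
    return "; ".join(f"{u}->{v}" for u, v in sorted(graph.edges))


# Only the floor is expanded: rooms are visible, objects stay hidden.
collapsed = visible_subgraph(G, ["floor_1"], expanded={"floor_1"})
# The kitchen is expanded as well: its objects appear, the office stays collapsed.
partial = visible_subgraph(G, ["floor_1"], expanded={"floor_1", "kitchen"})

print(serialize(collapsed))
print(serialize(partial))
```

Only the expanded branches contribute text to the prompt, so the serialized graph grows with the part of the environment being searched, not with the whole building.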
Three key innovations:
Collapsed 3DSG
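Used to search for task-relevant subgraphs starting from a small number of root nodes (the subgraphs are then expanded for further search). This improves scalability by keeping an overly complex full scene graph from exceeding the LLM's token limit.
Scene Graph Simulator
Acts as a verifier of whether a task is feasible.
Clio= Real-time Task-Driven Open-Set 3D Scene Graphs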
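Contributions:
Proposes that different tasks need semantic information at different granularities. The paper achieves this by combining SAM and [[CLIP多模态预训练模型|CLIP multimodal pretrained model]], but it ignores predicate relations and parent-child relations between objects. In essence it can still only perform basic operations: navigate, pick up, put down, navigate.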
Factorizable Net= An Efficient Subgraph-based Framework for Scene Graph Generation
My idea is to run panoptic segmentation on the scene first and then perform hierarchical part relation detection on each object, which reaches the same goal by a different route.