Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation
Very close to my own idea, and impressively complete. Worth consulting for its implementation approach, the papers it cites, and so on.
ACDC: Automated Creation of Digital Cousins for Robust Policy Learning
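A digital twin (DT) is a highly accurate replica of the real world and can be used for high-fidelity training, but producing DT assets is too laborious and does not generalize, so it cannot support zero-shot transfer.
A digital cousin (DC) is chosen by comparing model features: similar "cousin" models are selected from an asset library and used to reconstruct the scene for training a robot arm, so that the arm generalizes to scenes it encounters for the first time.
(a) It reduces the need for manual fine-tuning to guarantee a given level of fidelity, which makes fully automated creation of digital cousins possible; (b) it provides a set of augmented scenes for training robot policies, which helps the policy cope with variations of the original scene.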
ACDC is our automated pipeline for generating fully interactive simulated scenes from a single RGB image, and is broken down into three steps:
(1) an extraction step, in which relevant object masks are extracted from the raw input image
(2) a matching step, in which we select digital cousins for individual objects extracted from the original scene
(3) a generation step, in which the selected digital cousins are post-processed and compiled together to form a fully-interactive, physically-plausible digital cousin scene.
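The matching step (2) can be illustrated with a generic nearest-neighbor search over feature vectors. This is only a minimal sketch of the idea of "comparing model features", not ACDC's actual implementation: the function names, the use of cosine similarity, and the toy three-dimensional features are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def select_cousins(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[str]:
    """Return the names of the k library assets most similar to the query.

    `query` is the (hypothetical) feature vector of an object extracted from
    the input image; `library` maps asset names to their feature vectors.
    """
    ranked = sorted(
        library,
        key=lambda name: cosine_similarity(query, library[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy asset library with made-up features (assumption, for illustration only).
library = {
    "cabinet_a": [1.0, 0.0, 0.0],
    "cabinet_b": [0.9, 0.1, 0.0],
    "chair": [0.0, 1.0, 0.0],
}
cousins = select_cousins([1.0, 0.0, 0.0], library, k=2)
```

Returning the top k assets rather than a single best match is what yields several cousin scenes per input image, which is the source of the augmentation mentioned in (b).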
CosyPose: Consistent multi-view multi-object 6D pose estimation
Estimate accurate 6D poses of multiple known objects in a 3D scene captured by multiple cameras with unknown positions
Repository: https://github.com/liuyuan-pal/Gen6D
Guide: https://github.com/liuyuan-pal/Gen6D/blob/main/custom_object.md
Commands:
python prepare.py --action video2image --input data/custom/part1/ref.mp4 --output data/custom/part1/images --frame_inter 10 --image_size 960
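How to deal with inaccurate estimates: https://github.com/liuyuan-pal/Gen6D/issues/29
Unity uses a left-handed coordinate system, whereas most 6D pose algorithms use a right-handed one, so after obtaining [R; t] a reflection of the y axis must be applied.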
def right_to_left_hand_pose_R(R):
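The y-axis reflection described above can be sketched as follows. This is not the body of right_to_left_hand_pose_R (which is not shown in these notes); it is a minimal sketch under the assumption that both the camera frame and the object frame are mirrored with S = diag(1, -1, 1), which gives R' = S R S and t' = S t. The name flip_y_pose is hypothetical, and the axis to flip depends on the conventions of the particular pose estimator and scene setup.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]

# Diagonal of the reflection S = diag(1, -1, 1); S is its own inverse.
FLIP_Y = (1.0, -1.0, 1.0)


def flip_y_pose(R: Matrix, t: Vector) -> Tuple[Matrix, Vector]:
    """Map a right-handed pose (R, t) to a left-handed one by mirroring y.

    Computes R' = S R S and t' = S t. Because S is diagonal,
    (S R S)[i][j] = s_i * R[i][j] * s_j, so no full matrix product is needed.
    det(R') = det(R), so R' is still a proper rotation.
    """
    R_left = [[FLIP_Y[i] * R[i][j] * FLIP_Y[j] for j in range(3)] for i in range(3)]
    t_left = [FLIP_Y[i] * t[i] for i in range(3)]
    return R_left, t_left


# Example: a 90 degree rotation about z in the right-handed frame.
R_right = [[0.0, -1.0, 0.0],
           [1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0]]
t_right = [1.0, 2.0, 3.0]
R_left, t_left = flip_y_pose(R_right, t_right)
```

In the example the rotation direction about z is reversed and the y component of the translation changes sign, which is the expected effect of switching handedness by mirroring y.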
As can be seen, the result is quite good:
State of the art: FoundationPose (https://github.com/NVlabs/FoundationPose)
CASAPose (https://github.com/fraunhoferhhi/casapose?tab=readme-ov-file)
MegaPose (https://github.com/megapose6d/megapose6d)
OVE6D (https://github.com/dingdingcai/OVE6D-pose)