OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics
Creating a general-purpose robot has been a longstanding dream of the robotics community.
Systems that currently aim for this goal are brittle and closed, and they fail when they encounter unseen situations. Even the largest robot models can usually only be deployed in environments they have seen before [5, 6]. In settings with little robot data, such as unstructured home environments, this brittleness becomes even worse.
Large vision models have demonstrated semantic understanding, detection, and the ability to ground visual representations in language, and at the same time basic robot skills such as navigation, grasping, and rearrangement are already fairly mature.
Yet robot systems that simply bolt modern vision models onto these robot-specific primitives perform very poorly.
A likely reason is that naively composing several imperfect components makes the overall success rate deteriorate sharply.
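As a back-of-the-envelope illustration (my own toy numbers under an independence assumption, not figures from the paper), chaining three 90%-reliable primitives already drops end-to-end success to roughly 73%:

    # Toy calculation: composed success rate under an independence assumption.
    # The per-module numbers are made up purely for illustration.
    nav_success, grasp_success, place_success = 0.9, 0.9, 0.9
    end_to_end = nav_success * grasp_success * place_success
    print(round(end_to_end, 3))  # 0.729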
What is needed, then, is a carefully designed framework that combines VLMs with robot primitives (navigation, grasping, placing): this is OK-Robot.
"Pick up A (from B) and drop it on/in C", where A is an object and B and C are places in a real-world environment such as a home.
This part handles spatial reconstruction, identifying the rough locations of objects, and robot navigation.
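A high-level sketch of how these pieces might be chained to serve a "Pick up A and drop it on/in C" request; the SemanticMap and Robot classes below are illustrative stubs, not code from OK-Robot:

    from dataclasses import dataclass, field
    from typing import Dict, Tuple

    Location = Tuple[float, float, float]

    @dataclass
    class SemanticMap:
        """Language-grounded map built while scanning the environment (stub)."""
        locations: Dict[str, Location] = field(default_factory=dict)

        def query(self, name: str) -> Location:
            # open-vocabulary lookup of a rough 3D location for `name`
            return self.locations[name]

    class Robot:
        """Stand-in for the three robot primitives (stub)."""
        def navigate_to(self, target: Location) -> None:
            print("navigate to", target)

        def grasp(self, obj: str) -> None:
            print("grasp", obj)

        def place(self, receptacle: str) -> None:
            print("place on/in", receptacle)

    def pick_and_drop(a: str, c: str, smap: SemanticMap, robot: Robot) -> None:
        robot.navigate_to(smap.query(a))  # navigate near object A
        robot.grasp(a)                    # pick it up
        robot.navigate_to(smap.query(c))  # navigate to receptacle C
        robot.place(c)                    # drop it on/in C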
Methods used:
ACDC: Automated Creation of Digital Cousins for Robust Policy Learning
A digital twin (DT) is a highly accurate replica of the real world and can be used for high-fidelity training, but producing DT assets is tedious, does not generalize, and cannot be done zero-shot.
A digital cousin (DC) instead compares model features and selects similar "cousin" assets from an asset library to rebuild the scene for training the manipulator, so that the arm generalizes to scenes it is seeing for the first time.
Two advantages of digital cousins: (a) they reduce the need for manual fine-tuning while keeping a reasonable level of fidelity, which makes fully automated creation of digital cousins possible, and (b) they provide an augmented set of scenes for training robot policies, which helps the policy cope with variations of the original scene.
ACDC is an automated pipeline for generating fully interactive simulated scenes from a single RGB image, broken down into three steps (a rough code sketch follows the list):
(1) an extraction step, in which relevant object masks are extracted from the raw input image
(2) a matching step, in which we select digital cousins for individual objects extracted from the original scene
(3) a generation step, in which the selected digital cousins are post-processed and compiled together to form a fully-interactive, physically-plausible digital cousin scene.
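To make the structure concrete, here is a stub sketch of that three-step flow; the function names (extract_object_masks, match_digital_cousins, compose_scene) are placeholders, not the ACDC codebase's API:

    from typing import Any, List

    def extract_object_masks(rgb_image: Any) -> List[Any]:
        """(1) Extraction: segment the input image into per-object masks (stub)."""
        return []

    def match_digital_cousins(rgb_image: Any, mask: Any, asset_library: List[Any]) -> Any:
        """(2) Matching: compare feature embeddings of the masked object against
        the asset library and pick the most similar "cousin" asset (stub)."""
        return asset_library[0] if asset_library else None

    def compose_scene(cousins: List[Any]) -> Any:
        """(3) Generation: post-process the chosen assets (scale, pose, physics)
        and compile them into an interactive, physically plausible scene (stub)."""
        return {"objects": [c for c in cousins if c is not None]}

    def acdc_like_pipeline(rgb_image: Any, asset_library: List[Any]) -> Any:
        masks = extract_object_masks(rgb_image)
        cousins = [match_digital_cousins(rgb_image, m, asset_library) for m in masks]
        return compose_scene(cousins)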
CosyPose: Consistent multi-view multi-object 6D pose estimation
Estimate accurate 6D poses of multiple known objects in a 3D scene captured by multiple cameras with unknown positions
Gen6D repository: https://github.com/liuyuan-pal/Gen6D
Custom-object guide: https://github.com/liuyuan-pal/Gen6D/blob/main/custom_object.md
Command for this step:
    python prepare.py --action video2image --input data/custom/part1/ref.mp4 --output data/custom/part1/images --frame_inter 10 --image_size 960
On what to do when the estimated pose is inaccurate: https://github.com/liuyuan-pal/Gen6D/issues/29
Unity uses a left-handed coordinate system, while most 6D pose algorithms use a right-handed one, so after obtaining [R|t] an extra reflection about the y axis is required.
    def right_to_left_hand_pose_R(R):
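The body of the function is not shown above; a minimal sketch, assuming the two frames differ by a reflection of the y axis S = diag(1, -1, 1), so that R_left = S R S and t_left = S t (the exact sign flips depend on how Unity's axes are mapped to the camera frame, so treat this as illustrative rather than the original implementation):

    import numpy as np

    # Assumed convention: the handedness change is a pure reflection of the y axis.
    # S is its own inverse, hence R_left = S @ R @ S and t_left = S @ t.
    S = np.diag([1.0, -1.0, 1.0])

    def right_to_left_hand_pose_R(R: np.ndarray) -> np.ndarray:
        """Convert a right-handed rotation matrix to the left-handed (Unity) frame."""
        return S @ R @ S

    def right_to_left_hand_pose_t(t: np.ndarray) -> np.ndarray:
        """The translation simply has its y component negated."""
        return S @ t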
You can see that the result is quite good.
State of the art: FoundationPose (https://github.com/NVlabs/FoundationPose)
CASAPose (https://github.com/fraunhoferhhi/casapose?tab=readme-ov-file)
MegaPose (https://github.com/megapose6d/megapose6d)
OVE6D (https://github.com/dingdingcai/OVE6D-pose)