conda env config vars set LD_LIBRARY_PATH="/home/cyl/miniconda3/envs/dinov2/lib/"
conda env config vars set CPATH="/home/cyl/miniconda3/envs/dinov2/include/"
conda env config vars set CUDA_HOME="/home/cyl/miniconda3/envs/dinov2/"
Robotic planning and execution in open-world environments is a complex problem due to the vast state spaces and high variability of task embodiment. For example, in household scenarios:
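OVMM Challenge: https://aihabitat.org/challenge/2023_homerobot_ovmm/
Executing general, long-horizon, embodied tasks in such complex scenes requires generating a sequence of discrete actions, each of which can accumulate and propagate errors. Creating a feasible plan, and recovering when that plan runs into problems, therefore requires an effective abstraction of the physical environment and a planner that can fully exploit that abstraction. Meeting these challenges requires integrating natural language understanding, multi-granularity scene abstraction and understanding, and resilient reasoning.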
Coarse-grained (object-level) scene abstraction, i.e. scene graph construction, has already been studied extensively; see Reconstruct-Anything Literature Review. These works focus on object detection and object-level visual relationship detection.
export DATASET=/pth/to/dataset # path to your coco data
Some stumbling blocks
1. According to [[Cuda+Torch]], both cudatoolkit and cuda-toolkit need to be installed first.
conda install nvidia/label/cuda-11.7.0::cuda-toolkit -c nvidia/label/cuda-11.7.0
conda install cudatoolkit  # no need to specify version
conda env config vars set LD_LIBRARY_PATH="/home/cyl/miniconda3/envs/<name>/lib/"
conda env config vars set CPATH="/home/cyl/miniconda3/envs/<name>/include/"  # use `/usr/include` if `crypt.h` is missing
conda env config vars set CUDA_HOME="/home/cyl/miniconda3/envs/<name>/"
2. If compilation fails with `ld: cannot find -lcudart: No such file or directory` followed by `collect2: error: ld returned 1 exit status`, the cause is simply that cudatoolkit has not been installed.
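Before recompiling, it can help to confirm that the CUDA runtime library is actually present in the environment. The sketch below is a minimal check, assuming the library sits under `lib/` or `lib64/` of the directory that `CUDA_HOME` points to; the helper name `find_cudart` is hypothetical and not part of any tool mentioned here.

```python
import glob
import os


def find_cudart(cuda_home):
    """Return the sorted libcudart shared-library paths found under cuda_home."""
    matches = []
    for subdir in ("lib", "lib64"):  # assumed layout of the conda env
        pattern = os.path.join(cuda_home, subdir, "libcudart.so*")
        matches.extend(glob.glob(pattern))
    return sorted(matches)


if __name__ == "__main__":
    home = os.environ.get("CUDA_HOME", "")
    found = find_cudart(home) if home else []
    if found:
        print("libcudart found:", found[0])
    else:
        print("libcudart not found under CUDA_HOME; linking with -lcudart will likely fail")
```

An empty result suggests that cudatoolkit is missing from the environment or that `CUDA_HOME` points at the wrong prefix.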
3. After installation, running `import semantic_sam` directly fails with `ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'`. The error message suggests:
Please compile MultiScaleDeformableAttention CUDA op with the following commands:
`cd mask2former/modeling/pixel_decoder/ops`
`sh make.sh`
So the Mask2Former ops have to be built manually:
cd Mask2Former/mask2former/modeling/pixel_decoder/ops/
sh make.sh
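Once `make.sh` finishes, a quick lookup can confirm that the compiled op is visible to Python before retrying the full import. This is a minimal sketch; the helper `missing_modules` is hypothetical, and the two module names are taken from the error message quoted earlier.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of `names` that Python cannot locate for import."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None  # treat lookup errors as "not importable"
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means both modules were located.
    print(missing_modules(["MultiScaleDeformableAttention", "semantic_sam"]))
```

`find_spec` only locates a top-level module without executing it, so the check stays cheap even when the real import would load CUDA kernels.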
The first contribution of this paper is to propose a task-driven 3D scene understanding problem, where the robot is given a list of tasks in natural language, and has to select the granularity and the subset of objects and scene structure to retain in its map that is sufficient to complete the tasks.
The second contribution is an algorithm for task-driven 3D scene understanding based on an Agglomerative Information Bottleneck (IB) approach that is able to cluster 3D primitives in the environment into task-relevant objects and regions.
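To make the clustering idea concrete, here is a minimal sketch of the textbook Agglomerative IB merge rule, not necessarily the paper's exact procedure. It assumes each primitive x comes with a relevance distribution p(y|x) over tasks and a prior p(x); the function names, the toy inputs, and the fixed target cluster count `n_clusters` are all illustrative assumptions. At every step the pair whose merge loses the least information about the tasks is fused, where the loss is the merged weight times the weighted Jensen-Shannon divergence between the two conditionals.

```python
import math


def weighted_js(p, q, w_p, w_q):
    """Jensen-Shannon divergence between p and q with weights w_p + w_q = 1."""
    mix = [w_p * a + w_q * b for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return w_p * kl(p, mix) + w_q * kl(q, mix)


def agglomerative_ib(p_y_given_x, p_x, n_clusters):
    """Greedily merge primitives until n_clusters remain (assumes p_x > 0)."""
    n_clusters = max(1, n_clusters)
    clusters = [[i] for i in range(len(p_x))]
    weights = list(p_x)                        # p(c) for each cluster
    cond = [list(row) for row in p_y_given_x]  # p(y|c) for each cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                w = weights[i] + weights[j]
                loss = w * weighted_js(cond[i], cond[j],
                                       weights[i] / w, weights[j] / w)
                if best is None or loss < best[0]:
                    best = (loss, i, j)
        _, i, j = best
        w = weights[i] + weights[j]
        # The merged conditional is the prior-weighted average of the two.
        cond[i] = [(weights[i] * a + weights[j] * b) / w
                   for a, b in zip(cond[i], cond[j])]
        weights[i] = w
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j], weights[j], cond[j]
    return clusters


if __name__ == "__main__":
    # Toy data: primitives 0 and 1 matter for task A, 2 and 3 for task B.
    relevance = [[0.90, 0.10], [0.88, 0.12], [0.10, 0.90], [0.12, 0.88]]
    print(agglomerative_ib(relevance, [0.25] * 4, n_clusters=2))
```

On the toy data, primitives 0 and 1 end up in one cluster and primitives 2 and 3 in the other. Stopping at a fixed number of clusters is a simplification; a threshold on the accumulated information loss is a common alternative.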
The architecture of RLSV is a three-layered hierarchical projection that projects a visual triple onto the attribute space, the relation space, and the visual space in order.