Chen Yulin's Blog

Posted 2025-03-31 | Updated 2025-08-15 | Note | 2 minutes read (About 236 words)

Repository:

Installation

official:

conda env create -f conda.yaml

Using the official conda.yaml is not recommended; use the modified conda_cyl.yaml instead.

pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1

conda install nvidia/label/cuda-11.7.0::cuda-toolkit -c nvidia/label/cuda-11.7.0
conda install cudatoolkit
conda install -c conda-forge gcc=11.2.0
conda install -c conda-forge gxx=11.2.0

conda env config vars set LD_LIBRARY_PATH="/home/cyl/miniconda3/envs/dinov2/lib/"
conda env config vars set CPATH="/home/cyl/miniconda3/envs/dinov2/include/"
conda env config vars set CUDA_HOME="/home/cyl/miniconda3/envs/dinov2/"

export CC=$CONDA_PREFIX/bin/gcc
export CXX=$CONDA_PREFIX/bin/g++

# check with `which g++`

conda env update -f conda_cyl.yaml

pip3 install -U xformers==0.0.18

conda env config vars set PYTHONPATH="/home/cyl/Reconst/dinov2/"
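Since the steps above rely on several environment variables, a quick sanity check before building can save a failed compile. The following is only a minimal sketch: the helper names `missing_build_vars` and `compiler_on_path` are made up for illustration, and the variable list is simply copied from the commands above.

```python
import os
import shutil

# Variables set by the installation steps above (copied from those commands).
REQUIRED_VARS = ["LD_LIBRARY_PATH", "CPATH", "CUDA_HOME", "CC", "CXX", "PYTHONPATH"]


def missing_build_vars(env=None):
    """Return the names in REQUIRED_VARS that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def compiler_on_path(name="g++"):
    """Resolve a compiler the same way `which g++` does; None if not found."""
    return shutil.which(name)


print("missing variables:", missing_build_vars())
print("g++ resolves to:", compiler_on_path("g++"))
```

Run it inside the activated environment; note that variables set with `conda env config vars set` only take effect after the environment is reactivated.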

Demo 🐱

The official repository provides notebooks for depth estimation and segmentation; worth finding time to go through them.

Train

The dataset used is ImageNet-mini.

imagenet-mini
├── labels.txt
├── train
└── val

Note: a labels.txt file has to be added manually.
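A small check that the dataset root matches the layout above can catch the missing label file early. This is a sketch only; `missing_entries` is a hypothetical helper, and it checks nothing beyond the three top-level names shown in the tree.

```python
from pathlib import Path

# Top-level entries shown in the directory tree above.
EXPECTED_ENTRIES = ("labels.txt", "train", "val")


def missing_entries(root):
    """Return the expected top-level entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

For example, `missing_entries("imagenet-mini")` returns `["labels.txt"]` when only `train` and `val` are present.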

Use a script to generate the dataset metadata:

Posted 2025-03-25 | Updated 2025-08-15 | Note | 5 minutes read (About 724 words)

(Roadmap) Deeper Scene Graph For Robots

Target problem (task scenario)

Robotic planning and execution in open-world environments is a complex problem due to the vast state spaces and high variability of task embodiment.
For example, in household scenarios:

OVMM Challenge: https://aihabitat.org/challenge/2023_homerobot_ovmm/
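Executing general, long-horizon, embodied tasks in such complex scenes requires generating a sequence of discrete actions, each of which can accumulate and propagate errors. The robot therefore has to create a feasible plan and recover when that plan goes wrong, which requires an effective abstraction of the physical environment and a planner that can fully exploit it. Meeting these challenges requires integrating natural language understanding, multi-granularity scene abstraction and understanding, and resilient reasoning.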

Coarse-grained (object-level) scene abstraction (scene graph construction) has already been studied extensively; see Reconstruct-Anything Literature Review. Those works focus on object detection and object-level visual relationship detection.

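The part to focus on is multi-granularity scene abstraction.
Why multiple granularities are needed:

- Scalability: with a single granularity, the number of scene-graph tokens fed to the LLM cannot be controlled, which hurts scalability.
- Interacting with objects in more complex ways than grasping requires knowing the position and semantic properties of each part of an object, as well as its parent-child relationship with the parent object. Scene graph generation therefore has to consider finer granularity.
- Objects of different complexity need different numbers of granularity levels.
- Different tasks also need different object granularity.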
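Concrete examples (granularity levels required by the task):
<Task>Fill the kettle with water:
- <object-level>Kettle
  - <part-level>Lid
  - <part-level>Handle
- <object-level>Water dispenser
  - <part-level>Control panel
    - <part-level>Green button (room-temperature water)
    - <part-level>Red button (boiling water)
    - <part-level>Child lock
  - <part-level>Water tray
- <object-level>Table
  - <part-level>Tabletop
<Task>Leave the room:
- <object-level>Door
  - <part-level>Handle
  - <part-level>Note: "Put the doll back in the red basket before leaving the room"
- <object-level>Yellow duck doll
- <object-level>Red basket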
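To make the scalability argument concrete, here is a minimal sketch of a multi-granularity scene graph whose serialization depth can be chosen per task. The `Node` class and `render` function are hypothetical names, not an existing library; the example data is the water dispenser from the first task.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    level: str  # "object" or "part"
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it, so calls can be chained."""
        self.children.append(child)
        return child


def render(node: Node, max_depth: int, depth: int = 0) -> List[str]:
    """Serialize the subtree down to max_depth; deeper parts are left out."""
    lines = ["  " * depth + f"<{node.level}-level>{node.name}"]
    if depth < max_depth:
        for child in node.children:
            lines.extend(render(child, max_depth, depth + 1))
    return lines


dispenser = Node("water dispenser", "object")
panel = dispenser.add(Node("control panel", "part"))
panel.add(Node("green button (room-temperature water)", "part"))
panel.add(Node("red button (boiling water)", "part"))
panel.add(Node("child lock", "part"))
dispenser.add(Node("water tray", "part"))

coarse = render(dispenser, max_depth=0)  # object level only
fine = render(dispenser, max_depth=2)    # down to individual buttons
print("\n".join(fine))
```

The coarse view costs one line of prompt text, the fine view six, which is the knob a planner would turn depending on the task.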

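In finer-grained (part-level) scene abstraction, the focus is on recognizing the relationship between child objects and their parent object.

In addition, the counterpart of object detection in an object-level scene graph is, for a part-level scene graph, multi-granularity segmentation of child objects and extraction of their semantic information. This can be done with the existing Semantic-SAM together with a semantic feature extractor such as CLIP or another multimodal model.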

Main research workflow

Define the research object: Parent-child Relationship

What aspects does parent-child relationship include?

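Semantic composition: what change the presence or absence of the child object brings to the semantics of the parent object. Translation in embedding space.
Kinematic relations: the object has to be built as a kinematic tree.

Project workflow

Self-supervised feature extraction methods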
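Both aspects can be sketched in a few lines. This is only an illustration under assumptions: `translation` treats the semantic effect of a part as a plain vector difference between two embeddings, `Link` and `chain_to_root` are hypothetical names, and the joint types in the door example are guesses for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vector = Tuple[float, ...]


def translation(with_part: Vector, without_part: Vector) -> Vector:
    """Embedding shift attributed to the part: e(parent with part) - e(parent without part)."""
    return tuple(a - b for a, b in zip(with_part, without_part))


@dataclass
class Link:
    name: str
    parent: Optional[str]  # None marks the root of the kinematic tree
    joint: str             # e.g. "fixed", "revolute", "prismatic"


def chain_to_root(links: Dict[str, Link], name: str) -> List[str]:
    """Walk parent pointers from a part up to the root of the kinematic tree."""
    path: List[str] = []
    current: Optional[str] = name
    while current is not None:
        path.append(current)
        current = links[current].parent
    return path


# Assumed joint types, for illustration only.
door_links = {
    "frame": Link("frame", None, "fixed"),
    "door": Link("door", "frame", "revolute"),
    "handle": Link("handle", "door", "revolute"),
}
print(chain_to_root(door_links, "handle"))
```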

Posted 2025-03-25 | Updated 2025-08-15 | Note | a few seconds read (About 0 words)

ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution

Posted 2025-03-24 | Updated 2025-08-15 | Note | 4 minutes read (About 539 words)

Semantic-SAM Repository Application

My repository: https://github.com/Chen-Yulin/Semantic-SAM
My venv: ssam

Installation

Tested Python versions: 3.8, 3.10
Official steps:

pip3 install torch==1.13.1 torchvision==0.14.1 --extra-index-url https://download.pytorch.org/whl/cu113
python -m pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'
pip install git+https://github.com/cocodataset/panopticapi.git
git clone https://github.com/UX-Decoder/Semantic-SAM
cd Semantic-SAM
python -m pip install -r requirements.txt

export DATASET=/pth/to/dataset  # path to your coco data

Some stumbling blocks ^ ^

1

Per [[Cuda+Torch]], cudatoolkit and cuda-toolkit need to be installed first.

conda install nvidia/label/cuda-11.7.0::cuda-toolkit -c nvidia/label/cuda-11.7.0 
conda install cudatoolkit # no need to specify version
conda env config vars set LD_LIBRARY_PATH="/home/cyl/miniconda3/envs/<name>/lib/"
conda env config vars set CPATH="/home/cyl/miniconda3/envs/<name>/include/" # use `/usr/include` if `crypt.h` is missing
conda env config vars set CUDA_HOME="/home/cyl/miniconda3/envs/<name>/"

Then follow the install command from the PyTorch website:

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

2

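Running the second line of the official steps (the detectron2 install) directly may fail with an error saying the system gcc version is too new. Install gcc=11.2.0: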

conda install -c conda-forge gcc=11.2.0
conda install -c conda-forge gxx=11.2.0

# Point the build at the conda compilers
export CC=$CONDA_PREFIX/bin/gcc
export CXX=$CONDA_PREFIX/bin/g++

# If crypt.h cannot be found
sudo pacman -S libxcrypt-compat

export CXXFLAGS="${CXXFLAGS} -fuse-ld=/usr/bin/ld"

If compilation fails with `ld: cannot find -lcudart: No such file or directory collect2: error: ld returned 1 exit status`, it is simply because cudatoolkit is not installed ^ ^

3

After installation, a plain `import semantic_sam` fails with ModuleNotFoundError: No module named 'MultiScaleDeformableAttention' ^ ^
The error message says:

Please compile MultiScaleDeformableAttention CUDA op with the following commands:
	`cd mask2former/modeling/pixel_decoder/ops`
	`sh make.sh`

The Mask2Former op has to be built manually:

cd Mask2Former/mask2former/modeling/pixel_decoder/ops/
sh make.sh

4

Some version issues

pip install gradio==3.37.0
pip install matplotlib==3.7.0

Demo 🐱

Generate multi-granularity Mask on CLICK

python demo.py --ckpt ./weights/swinl_only_sam_many2many.pth

Comment: compared with SAM, the results reflect semantic consistency more than texture-based segmentation.

Automatically Generate Mask on Different Granularity

python demo_auto_generation.py --ckpt ./weights/swinl_only_sam_many2many.pth

Problems to solve

Masks overlap within the same level

Solved by `utils.psg_utils.mask.discard_submask` in `psg_data.segment_pipeline`
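The idea of dropping masks that sit inside a larger mask of the same level can be sketched as follows. This is not the repository's `discard_submask`; `discard_contained_masks`, the pixel-set mask representation, and the 0.9 containment threshold are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

Mask = FrozenSet[Tuple[int, int]]  # a mask as a set of (row, col) pixels


def discard_contained_masks(masks: List[Mask], threshold: float = 0.9) -> List[Mask]:
    """Keep a mask only if less than `threshold` of it lies inside a larger kept mask."""
    # Visit large masks first so each smaller mask is compared against bigger ones.
    ordered = sorted(masks, key=len, reverse=True)
    kept: List[Mask] = []
    for mask in ordered:
        if not mask:
            continue  # skip empty masks
        contained = any(len(mask & big) / len(mask) >= threshold for big in kept)
        if not contained:
            kept.append(mask)
    return kept


def rect(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Rectangular mask covering rows r0..r1-1 and columns c0..c1-1."""
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


whole = rect(0, 0, 4, 4)  # 16 pixels
inner = rect(1, 1, 3, 3)  # 4 pixels, fully inside `whole`
other = rect(0, 6, 2, 8)  # 4 pixels, disjoint from both
result = discard_contained_masks([inner, whole, other])
```

Here `inner` is dropped because it lies entirely inside `whole`, while `other` survives. With real masks the same test would normally be done on boolean arrays rather than pixel sets.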

Part-seg-dataset Generation

Results

Original image:

Instance identification:

Part segmentation

Posted 2025-03-19 | Updated 2025-08-15 | Review | a few seconds read (About 19 words)

Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

A very important paper on encoding optimization. The concept of MHE (Multiresolution Hash Encoding):
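As a rough reminder of the two core pieces, the sketch below shows the spatial hash that maps integer grid coordinates to a table slot and the geometric growth of grid resolution across levels, following the formulas in the paper. The function names are my own, and the feature lookup and interpolation steps are left out.

```python
import math
from typing import Sequence

# Per-dimension multipliers used by the paper's spatial hash (the first is 1).
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(coords: Sequence[int], table_size: int) -> int:
    """XOR the per-dimension products, then wrap into the hash table."""
    h = 0
    for c, p in zip(coords, PRIMES):
        h ^= c * p
    return h % table_size


def level_resolution(level: int, n_levels: int, n_min: int, n_max: int) -> int:
    """Grid resolution at `level`: floor(n_min * b**level), with b chosen so the last level reaches about n_max."""
    b = math.exp((math.log(n_max) - math.log(n_min)) / (n_levels - 1))
    return math.floor(n_min * b ** level)


print(spatial_hash((3, 5, 7), 2 ** 14))
print([level_resolution(l, 16, 16, 512) for l in range(4)])
```

Coarse levels have fewer grid vertices than table slots, so they map one-to-one; only fine levels actually collide, and training is left to sort out the collisions.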

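Posted 2025-03-19 | Updated 2025-08-15 | Review | a few seconds read (About 42 words)

ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning

Uses an LLM to judge positional relationships and builds the scene graph from them.

It can still only judge object-level spatial relationships and cannot support part-level manipulation.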

Posted 2025-03-18 | Updated 2025-08-15 | Review | 2 minutes read (About 355 words)

SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning

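The main ideas are all in the pseudocode above: by expanding only part of the scene graph (a strict hierarchy), the size of the scene graph fed to the LLM is kept under control.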
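The partial-expansion idea can be sketched with a plain dictionary. This is a dependency-free illustration rather than SayPlan's code: the `CollapsedGraph` class and the room and asset names are made up, and in practice the expand and contract decisions would come from the LLM.

```python
from typing import Dict, List, Set

# Strict hierarchy: floor -> room -> asset. All names are illustrative.
CHILDREN: Dict[str, List[str]] = {
    "floor_1": ["kitchen", "office"],
    "kitchen": ["fridge", "coffee_machine"],
    "office": ["desk"],
    "fridge": [],
    "coffee_machine": [],
    "desk": [],
}


class CollapsedGraph:
    def __init__(self, children: Dict[str, List[str]], root: str):
        self.children = children
        self.root = root
        self.expanded: Set[str] = set()

    def expand(self, node: str) -> None:
        self.expanded.add(node)

    def contract(self, node: str) -> None:
        self.expanded.discard(node)

    def visible(self) -> List[str]:
        """Nodes that would be serialized: the root plus children of expanded nodes."""
        out: List[str] = []
        stack = [self.root]
        while stack:
            node = stack.pop()
            out.append(node)
            if node in self.expanded:
                # Reverse so children come off the stack in their listed order.
                stack.extend(reversed(self.children[node]))
        return out


graph = CollapsedGraph(CHILDREN, "floor_1")
graph.expand("floor_1")
graph.expand("kitchen")
print(graph.visible())
```

Contracting a room that turned out to be irrelevant removes its assets from the prompt again, so the serialized graph only grows along the branches the task needs.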

A scalable approach to ground LLM-based task planners across environments spanning multiple rooms and floors

The scene graph is represented with networkx (a Python package).

Three key innovations:

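A collapsed 3DSG is used to search for the task-relevant subgraph starting from a small number of root nodes (the search continues by expanding subgraphs). This improves scalability by keeping an overly complex full scene graph from exceeding the LLM's token limit.
The planning horizon grows with the complexity of the given task, and the LLM then tends to hallucinate or produce infeasible action sequences. A mature path planner such as Dijkstra is therefore used to connect high-level nodes.
An iterative replanning pipeline in order to correct for any unexecutable actions
- e.g. forgetting to open the fridge before putting something into it
  This avoids planning failures caused by inconsistencies, hallucinations, or violations of the environment's physical constraints and predicates.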
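For the second point, the path planner only has to connect the high-level nodes the LLM picks. Below is a minimal Dijkstra sketch over a weighted adjacency dictionary; the node names and edge costs are invented for illustration, and this stdlib version stands in for whatever planner the paper's implementation actually calls.

```python
import heapq
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: edge cost}


def dijkstra_path(graph: Graph, start: str, goal: str) -> List[str]:
    """Return the cheapest node sequence from start to goal, or [] if unreachable."""
    dist = {start: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == goal:
            break
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if goal not in dist:
        return []
    # Rebuild the path by following predecessor links back to the start.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Illustrative topology; costs are made up.
rooms: Graph = {
    "office": {"corridor": 2.0},
    "corridor": {"office": 2.0, "kitchen": 3.0, "lobby": 1.0},
    "lobby": {"corridor": 1.0, "kitchen": 5.0},
    "kitchen": {"corridor": 3.0, "lobby": 5.0},
    "storage": {},
}
print(dijkstra_path(rooms, "office", "kitchen"))
```

The LLM then only needs to output something like "go to kitchen", and the intermediate waypoints are filled in without consuming any tokens.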

Insight

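Whether each scene node is expanded, and whether it is a node the task cares about, is decided by the LLM, which matches my own idea. In effect the LLM acts as a checker that traverses the graph layer by layer looking for the task's points of interest.
The Scene Graph Simulator acts as the verifier of whether the task is feasible.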
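The verifier-plus-replanning loop can be illustrated with the fridge example. Everything here is a toy under assumptions: `simulate` knows a single precondition (a container must be open before `put_in`), and `toy_planner` is a hard-coded stand-in for the LLM that only reacts to whether feedback was given.

```python
from typing import Callable, List, Optional, Set, Tuple

Action = Tuple[str, str]  # (verb, object), e.g. ("open", "fridge")


def simulate(plan: List[Action]) -> Optional[str]:
    """Return None if the plan is executable, otherwise a feedback message."""
    open_containers: Set[str] = set()
    for step, (verb, obj) in enumerate(plan):
        if verb == "open":
            open_containers.add(obj)
        elif verb == "close":
            open_containers.discard(obj)
        elif verb == "put_in" and obj not in open_containers:
            return f"step {step}: cannot put_in {obj}, it is closed"
    return None


def replan_until_valid(
    propose: Callable[[Optional[str]], List[Action]], max_iters: int = 3
) -> Optional[List[Action]]:
    """Ask for a plan, verify it, and feed the failure message back until it passes."""
    feedback: Optional[str] = None
    for _ in range(max_iters):
        plan = propose(feedback)
        feedback = simulate(plan)
        if feedback is None:
            return plan
    return None  # no executable plan within the iteration budget


def toy_planner(feedback: Optional[str]) -> List[Action]:
    """Stand-in for the LLM: forgets to open the fridge until told about it."""
    if feedback is None:
        return [("put_in", "fridge")]
    return [("open", "fridge"), ("put_in", "fridge"), ("close", "fridge")]


print(replan_until_valid(toy_planner))
```

The first proposal fails verification, the feedback string goes back to the planner, and the second proposal passes.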

Posted 2025-03-18 | Updated 2025-08-15 | Review | a minute read (About 197 words)

Clio: Real-time Task-Driven Open-Set 3D Scene Graphs

Contributions:

The first contribution of this paper is to propose a task-driven 3D scene understanding problem, where the robot is given a list of tasks in natural language, and has to select the granularity and the subset of objects and scene structure to retain in its map that is sufficient to complete the tasks.
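The second contribution is an algorithm for task-driven 3D scene understanding based on an Agglomerative IB approach, that is able to cluster 3D primitives in the environment into task-relevant objects and regions.
Building on the above, a real-time pipeline is implemented.

The paper argues that different tasks need semantic information at different granularities, and realizes this by combining SAM with [[CLIP多模态预训练模型]] (the CLIP multimodal pre-trained model). However, it ignores predicate relations and parent-child relations between objects. In essence it can still only perform basic operations such as navigating, picking up, and putting down.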
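The task-driven part can be pictured as scoring each primitive's embedding against the task embeddings. The sketch below shows only that scoring and thresholding step with cosine similarity; it is a simplification and not the Agglomerative IB clustering from the paper. The 2-D vectors stand in for CLIP embeddings, and the names and the 0.8 threshold are assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevant_primitives(
    primitives: Dict[str, Sequence[float]],
    tasks: List[Sequence[float]],
    threshold: float = 0.8,
) -> List[str]:
    """Keep primitives whose best similarity to any task reaches the threshold."""
    return [
        name
        for name, emb in primitives.items()
        if max((cosine(emb, t) for t in tasks), default=0.0) >= threshold
    ]


# Toy 2-D vectors standing in for real embeddings.
toy_primitives = {"mug": (0.9, 0.1), "curtain": (0.0, 1.0)}
toy_tasks = [(1.0, 0.0)]
print(relevant_primitives(toy_primitives, toy_tasks))
```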

Posted 2025-03-18 | Updated 2025-08-15 | Review | a few seconds read (About 3 words)

Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation

Posted 2025-03-18 | Updated 2025-08-15 | Review | a few seconds read (About 31 words)

Representation Learning for Scene Graph Completion via Jointly Structural and Visual Embedding

The architecture of RLSV is a three-layered hierarchical projection that projects a visual triple onto the attribute space, the relation space, and the visual space in order.
