Chen Yulin's Blog

Posted 2025-04-16 | Updated 2025-08-15 | Note | a few seconds read (About 3 words)

Vision-Language Interpreter for Robot Task Planning

Posted 2025-04-14 | Updated 2025-08-15 | Note | a few seconds read (About 95 words)

(Mindmap) Part-level Scene Understanding for Robots

Concept Overview

Scene Graph

A scene graph is a structural representation that captures detailed semantics by explicitly modeling:

objects ("man", "fire hydrant", "shorts")
attributes of objects ("fire hydrant is yellow")
relations between paired objects ("man jumping over fire hydrant")

A scene graph is a set of visual relationship triplets in the form of <subject, relation, object> or <object, is, attribute>

Scene graphs should serve as an **objective semantic representation** of the state of the scene
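
As a rough illustration of the triplet form, the sketch below stores the example above as plain Python tuples. The `Triplet` type and the helpers `attributes_of` and `relations_of` are hypothetical names introduced here for illustration; they do not come from any scene-graph library.

```python
from typing import List, NamedTuple


class Triplet(NamedTuple):
    """One scene-graph entry: <subject, relation, object> or <object, is, attribute>."""
    subject: str
    relation: str
    obj: str


# Objects and triplets taken from the example in the text.
objects = {"man", "fire hydrant", "shorts"}
triplets = [
    Triplet("fire hydrant", "is", "yellow"),
    Triplet("man", "jumping over", "fire hydrant"),
]


def attributes_of(name: str, graph: List[Triplet]) -> List[str]:
    """Attributes are the <object, is, attribute> triplets of an object."""
    return [t.obj for t in graph if t.subject == name and t.relation == "is"]


def relations_of(name: str, graph: List[Triplet]) -> List[Triplet]:
    """Relations are all triplets of a subject that are not attribute triplets."""
    return [t for t in graph if t.subject == name and t.relation != "is"]


print(attributes_of("fire hydrant", triplets))
print(relations_of("man", triplets))
```

Keeping attributes and relations in the same triplet list mirrors the definition above; a real system would likely index them by object for faster lookup.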

Posted 2025-04-11 | Updated 2025-08-15 | Note | a minute read (About 166 words)

OmegaConf Python Package

docs

Start

pip install omegaconf

from omegaconf import OmegaConf

Create

You can create OmegaConf objects from multiple sources.

From `dict`

conf = OmegaConf.create({"k" : "v", "list" : [1, {"a": "1", "b": "2", 3: "c"}]})
print(OmegaConf.to_yaml(conf))

k: v
list:
- 1
- a: '1'
  b: '2'
  3: c

From `list`

conf = OmegaConf.create([1, {"a":10, "b": {"a":10, 123: "int_key"}}])
print(OmegaConf.to_yaml(conf))

- 1
- a: 10
  b:
    a: 10
    123: int_key

From `yaml`

conf = OmegaConf.load('source/example.yaml')
# Output is identical to the YAML file
print(OmegaConf.to_yaml(conf))

server:
  port: 80
log:
  file: ???
  rotation: 3600
users:
- user1
- user2

From `dot-list`

dot_list = ["a.aa.aaa=1", "a.aa.bbb=2", "a.bb.aaa=3", "a.bb.bbb=4"]
conf = OmegaConf.from_dotlist(dot_list)
print(OmegaConf.to_yaml(conf))

a:
  aa:
    aaa: 1
    bbb: 2
  bb:
    aaa: 3
    bbb: 4
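
To make the dot-list notation concrete, here is a plain-Python sketch of how such strings expand into a nested dictionary. It only illustrates the idea and is not OmegaConf's implementation: `expand_dotlist` is a hypothetical helper, and its value handling (integer-looking strings become `int`, everything else stays a string) is a simplifying assumption.

```python
def expand_dotlist(dot_list):
    """Expand ["a.b=1", ...] into a nested dict (illustrative only)."""
    root = {}
    for item in dot_list:
        key, _, raw = item.partition("=")
        parts = key.split(".")
        node = root
        # Walk or create the intermediate dictionaries for "a.aa" etc.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # Assumption: integer-looking values become int, the rest stay str.
        node[parts[-1]] = int(raw) if raw.lstrip("-").isdigit() else raw
    return root


dot_list = ["a.aa.aaa=1", "a.aa.bbb=2", "a.bb.aaa=3", "a.bb.bbb=4"]
nested = expand_dotlist(dot_list)
print(nested)
```

The resulting dictionary has the same shape as the YAML printed above, which is why a dot-list is a convenient format for command-line overrides.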

From command line arguments

import sys

sys.argv = ['your-program.py', 'server.port=82', 'log.file=log2.txt']
conf = OmegaConf.from_cli()
print(OmegaConf.to_yaml(conf))

server:
  port: 82
log:
  file: log2.txt

Posted 2025-04-07 | Updated 2025-08-15 | Note | 4 minutes read (About 616 words)

SJTU-HPC Basic Operations

Official documentation

`LMOD` Software Management

https://lmod.readthedocs.io/en/latest/010_user.html

SJTU HPC uses Lmod to manage user software. Modules are reset on every login, so software has to be loaded again each time.
Common commands:

ml # list the currently loaded modules
ml miniconda3 # load the miniconda3 module
ml -miniconda3 # unload the miniconda3 module
module spider # list all modules that can be loaded
module spider cuda # show available cuda versions

`oh-my-zsh`: a Better Shell

zsh is installed by default on the HPC. Install oh-my-zsh with its install script, then install the plugins:

export CHSH=no  # keep it from switching the default shell automatically (usually not allowed on HPC)
sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"
git clone https://github.com/zsh-users/zsh-syntax-highlighting.git \
  ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/zsh-syntax-highlighting
git clone https://github.com/zsh-users/zsh-autosuggestions \
  ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/zsh-autosuggestions
git clone --depth 1 https://github.com/junegunn/fzf.git ~/.fzf
~/.fzf/install
git clone https://github.com/psprint/zsh-navigation-tools ~/.zsh-navigation-tools

## open `.zshrc` in vim and add the following configuration
plugins=(
  git
  zsh-navigation-tools
  rails
  ruby
  zsh-syntax-highlighting
  zsh-autosuggestions
)
source ${(q-)ZSH_CUSTOM:-$ZSH/custom}/plugins/zsh-syntax-highlighting/zsh-syntax-highlighting.zsh

## save and quit with :wq

`nvim`: a Better Editor

There is no sudo access on the server, so nvim has to be compiled manually.

First, create a build environment with conda:

conda create -n nvim-build cmake libtool gettext curl unzip ninja gcc gxx -y -c conda-forge

clone & build

git clone https://github.com/neovim/neovim.git
cd neovim
git checkout stable
make CMAKE_BUILD_TYPE=Release

Tricks

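Interactive Compute Resources

In some cases you need to compile in parallel. Compilation consumes a lot of resources and cannot necessarily be run on the login node.

Submitting a batch job is inefficient in this situation; request interactive compute resources instead:

srun -p cpu -n 4 --pty /bin/bash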

example:

(base) chenyulin@pilogin2 ~> srun -p cpu -n 4 --pty /bin/bash
srun: job 43090290 queued and waiting for resources
srun: job 43090290 has been allocated resources
(base) chenyulin@cas242 ~> ls
condalist.txt  dinov2  log  test.slurm
(base) chenyulin@cas242 ~> cd dinov2        
(base) chenyulin@cas242 ~/dinov2> conda activate dinov2
(dinov2) chenyulin@cas242 ~/dinov2> export CC=$CONDA_PREFIX/bin/gcc                       main
export CXX=$CONDA_PREFIX/bin/g++
(dinov2) chenyulin@cas242 ~/dinov2> which g++                                             main
~/.conda/envs/dinov2/bin/g++
(dinov2) chenyulin@cas242 ~/dinov2> vim conda_cyl.yaml                                    main
(dinov2) chenyulin@cas242 ~/dinov2> conda env update -f conda_cyl.yaml                    main
Retrieving notices: ...working... done
Channels:
 - xformers
 - pytorch
 - nvidia
 - conda-forge
 - defaults
 - nvidia/label/cuda-11.7.0
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done
Installing pip dependencies: |

File Transfer

Local to Cluster

Transfer files with HPC Studio:
https://studio.hpc.sjtu.edu.cn/pun/sys/dashboard/files/fs//lustre/home/acct-umjbyy/chenyulin/Data

Between Cluster File Systems

For example, copying a conda environment:

cp -r /lustre/home/acct-umjbyy/chenyulin/.conda /dssg/home/acct-umjbyy/chenyulin/

Posted 2025-04-02 | Updated 2025-08-15 | Note | a few seconds read (About 16 words)

Tuxguitar on Archlinux

Installation

yay -S tuxguitar
yay -S jack2
yay -S libffado

No sound yet; this still needs a fix.

Posted 2025-04-01 | Updated 2025-08-15 | Note | 9 minutes read (About 1383 words)

Dialectics of Nature Talk: Emergence in AI

Definition of Emergence

Emergence in Western Philosophy

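"Emergence" originated in philosophy. In the 19th century, the English philosopher George Henry Lewes first used the word to describe properties of a whole that cannot be explained by the properties of its parts, that is, the phenomenon that "the whole is greater than the sum of its parts."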

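The ability to reduce everything to simple fundamental laws does not imply the ability to start from those laws and reconstruct the universe. Confronted with the twin difficulties of scale and complexity, the reductionist hypothesis breaks down. At each level of complexity, entirely new properties appear. Psychology is not applied biology, nor is biology applied chemistry. We can now see that the whole becomes not only more than, but also very different from, the sum of its parts.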

Emergence in Complex Systems (Computing)

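Systems scientists Murray Gell-Mann and Stuart Kauffman further developed the concept of "emergence" through their studies of self-organizing behavior in complex systems. In their view, "emergence" refers to properties or phenomena that arise naturally within a complex system from the interactions of simple components. The features or behavior shown by the whole cannot simply be derived from the properties of its parts; they arise spontaneously from the interactions among those parts. Emergence focuses on explaining how a complex system, without external instructions or central control, forms new ordered structures and behaviors through simple internal rules and interactions. This shows that complexity can arise naturally, without external intervention or detailed design in advance.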

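The most representative example of this is the Game of Life.
In the Game of Life:

The rules are extremely simple (the rules for whether a cell lives or dies).
The initial state may look random.
Yet as the system evolves, complex patterns emerge, such as oscillators and gliders.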
![[game-of-life-loop-cropped.gif]]
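
The points above can be sketched in a few lines of Python. This is a minimal sketch using the standard Game of Life rules (a dead cell with exactly 3 live neighbours is born, a live cell with 2 or 3 live neighbours survives); the sparse set-of-cells representation and the `step` function name are choices made here for illustration.

```python
from collections import Counter


def step(live):
    """Advance one generation; `live` is a set of (row, col) live cells."""
    # Count live neighbours for every cell adjacent to at least one live cell.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth with exactly 3 neighbours; survival with 2 or 3.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


# A glider: nothing in the rules mentions movement, yet after 4 generations
# the same shape reappears shifted by one row and one column.
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
state = glider
for _ in range(4):
    state = step(state)
moved = {(r + 1, c + 1) for r, c in glider}
print(state == moved)
```

The glider's diagonal motion is written nowhere in `step`; it only shows up when the local rule is applied repeatedly, which is the sense of emergence described above.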

Emergence in AI

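I will not dwell on the new abilities (language understanding, reasoning, programming, creative generation, and so on) that emerged as the parameter count of autoregressive text models such as ChatGPT grew. Instead, I will focus on emergence in self-supervised vision.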

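DINO (Self-Distillation with No Labels) is a self-supervised learning method for learning visual features without labels. It builds on ideas from knowledge distillation and contrastive learning, and trains two networks with the same architecture: a teacher network and a student network.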

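The idea is to have the student model predict the labels generated by the teacher model, while the teacher's weights are updated from the student's weights. The student's inputs are crops of small local regions of the original photo, while the teacher's inputs are larger, more global crops. This self-supervised setup scales the dataset from the millions of images typical of annotated datasets to hundreds of millions of unannotated web images. After training, the model shows capabilities that were never specifically designed for:

- Without any fine-tuning or added projection layer, it reaches a strong top-1 accuracy with a nearest neighbors classifier (k-NN).
- Clear semantic segmentation information can be observed in the model's last self-attention layer.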
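
The text only says that the teacher's weights are updated from the student's. In the DINO paper this update is an exponential moving average (EMA) of the student weights, which the toy sketch below illustrates. The function name `ema_update`, the use of plain lists instead of tensors, and the momentum values are all assumptions made for illustration, not the repository's code.

```python
def ema_update(teacher, student, momentum=0.996):
    """Return new teacher weights: momentum * teacher + (1 - momentum) * student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


# Toy weights: the teacher starts at zero and drifts toward the student.
teacher = [0.0, 0.0]
student = [1.0, -1.0]  # pretend the student was just updated by gradient descent

# momentum=0.5 is far lower than a realistic setting; it just makes the drift visible.
for _ in range(3):
    teacher = ema_update(teacher, student, momentum=0.5)
print(teacher)
```

Because the teacher receives no gradients and only follows the student slowly, its targets stay comparatively stable while the student learns.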

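The Law of Dialectics Reflected Here: "Quantitative Change Produces Qualitative Change"

DINO, as a self-supervised learning method, clearly reflects the dialectics-of-nature law that **"quantitative change produces qualitative change."** This principle states that **a qualitative change is not sudden; it is a leap that occurs once accumulated quantitative change reaches a critical point**. In DINO, this shows up mainly in the following ways:

- **Scaling up the training data enables capabilities to emerge**
  - Traditional supervised learning relies on manually annotated datasets, usually on the order of millions of images, whereas DINO's self-supervised learning extends the training data to massive collections of unannotated web images, on the order of hundreds of millions.
  - This accumulation in quantity lets the model learn to distinguish object categories on its own, without human supervision, and ultimately form feature representations that generalize well.
- Gradual refinement of feature learning drives the emergence of semantic information
  - Early in training, the features DINO learns are fairly chaotic and capture only low-level visual patterns such as edges and color distributions.
  - As training continues, the network's feature representations improve step by step, and eventually an **explicit semantic segmentation ability** emerges in the self-attention layers, even though segmentation was never the training objective.
  - The teacher-student architecture keeps accumulating aligned information and keeps refining itself through self-distillation; local optimization (quantitative change) eventually produces a global gain in capability (qualitative change).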

Posted 2025-03-31 | Updated 2025-08-15 | Note | 2 minutes read (About 236 words)

DINOv2 Repository Application

Repository:

Installation

official:

conda env create -f conda.yaml

Using the official conda.yaml is not recommended; use the modified conda_cyl.yaml instead.

pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1

conda install nvidia/label/cuda-11.7.0::cuda-toolkit -c nvidia/label/cuda-11.7.0
conda install cudatoolkit
conda install -c conda-forge gcc=11.2.0
conda install -c conda-forge gxx=11.2.0

conda env config vars set LD_LIBRARY_PATH="/home/cyl/miniconda3/envs/dinov2/lib/"
conda env config vars set CPATH="/home/cyl/miniconda3/envs/dinov2/include/"
conda env config vars set CUDA_HOME="/home/cyl/miniconda3/envs/dinov2/"

export CC=$CONDA_PREFIX/bin/gcc
export CXX=$CONDA_PREFIX/bin/g++

# check with `which g++`

conda env update -f conda_cyl.yaml

pip3 install -U xformers==0.0.18

conda env config vars set PYTHONPATH="/home/cyl/Reconst/dinov2/"

Demo 🐱

The official repository provides notebooks for depth estimation and segmentation; worth finding time to work through them.

Train

The dataset used is ImageNet-mini:

imagenet-mini
├── labels.txt
├── train
└── val

Note: an extra labels.txt has to be added.

Generate the dataset metadata with a script:

Posted 2025-03-25 | Updated 2025-08-15 | Note | 5 minutes read (About 724 words)

(Roadmap) Deeper Scene Graph For Robots

Target Problem (Task Scenario)

Robotic planning and execution in open-world environments is a complex problem due to the vast state spaces and high variability of task embodiment.
For example, in household scenarios:

OVMM Challenge: https://aihabitat.org/challenge/2023_homerobot_ovmm/
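Executing general, long-horizon, embodied tasks in such complex scenes requires generating a sequence of discrete actions, each of which can accumulate and propagate errors. It is therefore necessary to create a feasible plan and to recover when that plan goes wrong, which calls for an effective abstraction of the physical environment and a planner that can fully exploit that abstraction. Meeting these challenges requires integrating natural language understanding, multi-granularity scene abstraction and understanding, and resilient reasoning.

There is already a lot of work on coarse-grained (object-level) scene abstraction (scene graph construction); see Reconstruct-Anything Literature Review. These works focus on object detection and object-level visual relationship detection.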

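The part that needs focus is multi-granularity scene abstraction.
Why multiple granularities are needed:

Scalability: with only a single granularity, the number of scene-graph tokens fed to the LLM cannot be controlled, which hurts scalability.
More complex interactions with objects (beyond grasping) require knowing each part's position, its semantic properties, and its parent-child relationship with the parent object. Scene graph generation therefore has to consider finer granularity.
Objects of different complexity need different levels of granularity.
Different tasks also need different object granularity.
Concrete examples (granularity levels needed by each task):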
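<Task> Fill the kettle with water:
- <object-level> kettle
  - <part-level> lid
  - <part-level> handle
- <object-level> water dispenser
  - <part-level> control panel
    - <part-level> green button (room-temperature water)
    - <part-level> red button (boiling water)
    - <part-level> child lock
  - <part-level> water tray
- <object-level> table
  - <part-level> tabletop
<Task> Leave the room:
- <object-level> door
  - <part-level> handle
  - <part-level> note: "Put the toy back in the red basket before leaving the room"
- <object-level> yellow duck toy
- <object-level> red basket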
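
The hierarchy above can be sketched as a small tree whose depth is cut off per task, which is one way to keep the number of scene-graph tokens passed to an LLM under control. This is only an illustrative sketch: the `Node` class, the `flatten` helper, and the `max_depth` parameter are hypothetical names introduced here, not part of any existing pipeline.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    level: str  # "object" or "part"
    children: List["Node"] = field(default_factory=list)


def flatten(node: Node, max_depth: int, depth: int = 0) -> List[str]:
    """List node names down to max_depth (0 keeps the object level only)."""
    names = [node.name]
    if depth < max_depth:
        for child in node.children:
            names.extend(flatten(child, max_depth, depth + 1))
    return names


# The water dispenser from the kettle-filling example.
dispenser = Node("water dispenser", "object", [
    Node("control panel", "part", [
        Node("green button (room-temperature water)", "part"),
        Node("red button (boiling water)", "part"),
        Node("child lock", "part"),
    ]),
    Node("water tray", "part"),
])

print(flatten(dispenser, max_depth=0))  # coarse view, e.g. for navigation
print(flatten(dispenser, max_depth=2))  # fine view, e.g. for pressing a button
```

A task that only needs to reach the dispenser can use the coarse view, while a task that has to press a specific button expands the same object down to its parts.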

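In finer-grained (part-level) scene abstraction, the focus is on recognizing the relationship between child objects and their parent objects.

In addition, the counterpart of object detection in an object-level scene graph is, for a part-level scene graph, multi-granularity segmentation of child objects and extraction of their semantic information. This can be done with the existing Semantic-SAM together with CLIP-like or other multimodal semantic feature extractors.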

Main Research Workflow

Define the Research Object: Parent-child Relationship

What aspects does parent-child relationship include?

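Semantic composition: what the presence or absence of a child object changes in the semantics of its parent object. Translation in embedding space.
Kinematic relations: the object needs to be built as a kinematic tree.

Project Workflow

Self-supervised feature extraction method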

Posted 2025-03-25 | Updated 2025-08-15 | Note | a few seconds read (About 0 words)

ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution

Posted 2025-03-24 | Updated 2025-08-15 | Note | 4 minutes read (About 539 words)

Semantic-SAM Repository Application

My repository: https://github.com/Chen-Yulin/Semantic-SAM
My venv: ssam

Installation

Tested Python versions: 3.8, 3.10
Official steps:

pip3 install torch==1.13.1 torchvision==0.14.1 --extra-index-url https://download.pytorch.org/whl/cu113
python -m pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'
pip install git+https://github.com/cocodataset/panopticapi.git
git clone https://github.com/UX-Decoder/Semantic-SAM
cd Semantic-SAM
python -m pip install -r requirements.txt

export DATASET=/pth/to/dataset  # path to your coco data

Some Stumbling Blocks ^ ^

1

According to [[Cuda+Torch]], cudatoolkit and cuda-toolkit need to be installed first:

conda install nvidia/label/cuda-11.7.0::cuda-toolkit -c nvidia/label/cuda-11.7.0 
conda install cudatoolkit # no need to specify version
conda env config vars set LD_LIBRARY_PATH="/home/cyl/miniconda3/envs/<name>/lib/"
conda env config vars set CPATH="/home/cyl/miniconda3/envs/<name>/include/" # `/usr/include`for missing `crypt.h`
conda env config vars set CUDA_HOME="/home/cyl/miniconda3/envs/<name>/"

Then follow the install command from the official PyTorch site:

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

2

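Running the second line of the official steps (the detectron2-xyz install) directly may fail with a message that the system gcc version is too new. Install gcc=11.2.0: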

conda install -c conda-forge gcc=11.2.0
conda install -c conda-forge gxx=11.2.0

# set the compiler paths
export CC=$CONDA_PREFIX/bin/gcc
export CXX=$CONDA_PREFIX/bin/g++

# if crypt.h cannot be found
sudo pacman -S libxcrypt-compat

export CXXFLAGS="${CXXFLAGS} -fuse-ld=/usr/bin/ld"

If the build fails with `ld: cannot find -lcudart: No such file or directory collect2: error: ld returned 1 exit status`, it is simply because cudatoolkit is not installed ^ ^

3

After installation, running `import semantic_sam` directly fails with ModuleNotFoundError: No module named 'MultiScaleDeformableAttention' ^ ^
The message says:

Please compile MultiScaleDeformableAttention CUDA op with the following commands:
	`cd mask2former/modeling/pixel_decoder/ops`
	`sh make.sh`

The Mask2Former ops need to be built manually: