
Download the pkg from https://apps.cloud.blackmagicdesign.com/davinci-resolve

```bash
git clone https://aur.archlinux.org/davinci-resolve.git
```
FCSGG (Fully Convolutional Scene Graph Generation) is a PyTorch implementation of the paper “Fully Convolutional Scene Graph Generation” published in CVPR 2021. The project focuses on scene graph generation, which is the task of detecting objects in an image and identifying the relationships between them.
Architecture:
Key Features:
Dataset:
Model Components:
Utilities:
fcsgg/: Main module containing model implementation
configs/: Configuration files for different model variants
tools/: Training, evaluation, and visualization scripts
GraphViz/: Visualization tools for scene graphs
The project implements a fully convolutional approach to scene graph generation, which differs from traditional two-stage methods. Instead of first detecting objects and then predicting relationships, it uses a one-stage detector to simultaneously predict objects and their relationships in a fully convolutional manner.
The repository provides several pre-trained models with different backbones:
These models achieve competitive performance on the Visual Genome dataset for scene graph generation tasks.
The project provides tools for training, evaluation, and visualization of scene graphs. It requires the Visual Genome dataset and can be run using Docker or directly with PyTorch.
In summary, FCSGG is a comprehensive implementation of a state-of-the-art approach to scene graph generation using fully convolutional networks, offering various model architectures and training configurations.
FCSGG is built on top of Detectron2, Facebook’s object detection framework, and leverages many of its components while extending it for scene graph generation. Here’s a detailed breakdown:
Meta Architecture: FCSGG registers a custom meta architecture called “CenterNet” with Detectron2’s META_ARCH_REGISTRY. This extends Detectron2’s modular architecture system while maintaining compatibility.
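As a rough illustration, registering a meta architecture with Detectron2 generally follows this pattern (a hedged sketch of the registry API; the class body below is a placeholder, not FCSGG's actual implementation):

```python
import torch.nn as nn
from detectron2.modeling import META_ARCH_REGISTRY

@META_ARCH_REGISTRY.register()
class CenterNet(nn.Module):
    """Placeholder one-stage detector predicting objects and relationships jointly."""
    def __init__(self, cfg):
        super().__init__()
        # backbone, FPN/HRNet neck, and prediction heads would be built from cfg here

    def forward(self, batched_inputs):
        # training: return a dict of losses; inference: return per-image predictions
        raise NotImplementedError
```

The registered name is what the YAML config refers to when selecting the model.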
Backbone Networks: FCSGG uses Detectron2’s backbone networks (ResNet, etc.) directly and also implements custom backbones like HRNet while following Detectron2’s backbone interface.
Feature Pyramid Networks (FPN): The repository uses Detectron2’s FPN implementation and extends it with custom variants like BiFPN and HRFPN.
YAML Configuration: FCSGG adopts Detectron2’s YAML-based configuration system, extending it with custom configurations for scene graph generation through add_fcsgg_config().
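A minimal sketch of what such a config-extension hook usually looks like (the extra keys below are hypothetical examples, not FCSGG's actual options):

```python
from detectron2.config import CfgNode as CN, get_cfg

def add_fcsgg_config(cfg):
    # hypothetical extra options for relationship prediction
    cfg.MODEL.RELATION = CN()
    cfg.MODEL.RELATION.NUM_PREDICATES = 50

cfg = get_cfg()        # Detectron2 defaults
add_fcsgg_config(cfg)  # project-specific additions
cfg.merge_from_file("configs/quick_schedules/Quick-FCSGG-HRNet-W32.yaml")
```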
Command Line Arguments: The training script uses Detectron2’s default_argument_parser() to maintain the same command-line interface.
Dataset Registration: The Visual Genome dataset is registered with Detectron2’s DatasetCatalog and MetadataCatalog, making it available through Detectron2’s data loading pipeline.
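Registration with the two catalogs generally follows this pattern (a sketch; the split name, loader, and class names are placeholders rather than FCSGG's exact registration code):

```python
from detectron2.data import DatasetCatalog, MetadataCatalog

def load_vg_train():
    # would parse datasets/vg/VG-SGG-with-attri.h5 and return list[dict]
    # in Detectron2's standard dataset format, plus relationship annotations
    return []

DatasetCatalog.register("vg_train", load_vg_train)
# placeholder class names for illustration only
MetadataCatalog.get("vg_train").set(thing_classes=["person", "bike", "dog"])
```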
Custom Dataset Mapper: FCSGG implements a custom DatasetMapper class that extends Detectron2’s mapper to handle scene graph annotations.
Data Loaders: The repository uses Detectron2’s build_detection_train_loader and build_detection_test_loader with custom mappers.
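A hedged sketch of how a custom mapper plugs into those loaders; the class name and the relationship-handling logic are placeholders:

```python
from detectron2.data import DatasetMapper, build_detection_train_loader

class SceneGraphDatasetMapper(DatasetMapper):
    """Placeholder for FCSGG's mapper that also prepares relationship targets."""
    def __call__(self, dataset_dict):
        data = super().__call__(dataset_dict)
        # FCSGG would additionally turn <subject, predicate, object> annotations
        # into dense training targets (e.g. center/relation heatmaps) here
        return data

def make_train_loader(cfg):
    return build_detection_train_loader(cfg, mapper=SceneGraphDatasetMapper(cfg, is_train=True))
```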
Trainer Class: FCSGG extends Detectron2’s DefaultTrainer class to customize the training loop, evaluation metrics, and data loading.
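A sketch of that extension pattern; the overridden method names are real DefaultTrainer hooks, while the bodies reuse the placeholder names from the sketches above and below:

```python
from detectron2.data import build_detection_train_loader
from detectron2.engine import DefaultTrainer

class SGGTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        # plug in the scene-graph-aware mapper (see the mapper sketch above)
        return build_detection_train_loader(cfg, mapper=SceneGraphDatasetMapper(cfg, is_train=True))

    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        # custom scene-graph evaluator (sketched further below)
        return VGEvaluator(dataset_name)
```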
Checkpointing: The repository uses Detectron2’s DetectionCheckpointer for model saving and loading.
Distributed Training: FCSGG leverages Detectron2’s distributed training utilities through detectron2.utils.comm and the launch function.
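The entry point then follows Detectron2's standard launcher pattern (a sketch mirroring the stock train_net.py, not FCSGG's exact file):

```python
from detectron2.engine import default_argument_parser, launch

def main(args):
    # build cfg (get_cfg + add_fcsgg_config + merge_from_file), create the
    # trainer, then run trainer.resume_or_load() and trainer.train()
    pass

if __name__ == "__main__":
    args = default_argument_parser().parse_args()
    launch(
        main,
        args.num_gpus,
        num_machines=args.num_machines,
        machine_rank=args.machine_rank,
        dist_url=args.dist_url,
        args=(args,),
    )
```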
Custom Evaluators: The repository implements a custom VGEvaluator for scene graph evaluation while following Detectron2’s evaluator interface.
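That interface looks roughly like this (a sketch; the metric names and the computation are placeholders for the real scene-graph recall metrics):

```python
from detectron2.evaluation import DatasetEvaluator

class VGEvaluator(DatasetEvaluator):
    def __init__(self, dataset_name):
        self._dataset_name = dataset_name

    def reset(self):
        self._predictions = []

    def process(self, inputs, outputs):
        # accumulate per-image predicted triplets alongside ground truth
        self._predictions.extend(outputs)

    def evaluate(self):
        # compute scene graph metrics such as Recall@K (placeholder value here)
        return {"sgg": {"R@50": 0.0}}
```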
Event Storage: FCSGG uses Detectron2’s event storage system for logging metrics during training.
Visualization Tools: The repository leverages Detectron2’s visualization utilities for debugging and result analysis.
Custom Heads: While using Detectron2’s architecture, FCSGG implements custom prediction heads for relationship detection.
Scene Graph Structures: The repository defines custom data structures for scene graphs that integrate with Detectron2’s Instances class.
Loss Functions: FCSGG implements specialized loss functions for scene graph generation while maintaining compatibility with Detectron2’s loss computation framework.
Submodule Integration: Detectron2 is included as a Git submodule, ensuring version compatibility.
Build Process: The installation process includes building Detectron2 from source to ensure proper integration.
In summary, FCSGG uses Detectron2 as its foundation, leveraging its modular architecture, data handling, training infrastructure, and configuration system while extending it with custom components for scene graph generation. This approach allows FCSGG to benefit from Detectron2’s robust implementation and optimizations while adding specialized functionality for relationship detection between objects.
Official repo:
https://github.com/liuhengyue/fcsgg
Our repo:
https://github.com/PSGBOT/KAF-Generation
My venv: fcsgg
```bash
git clone git@github.com:liuhengyue/fcsgg.git
```
Datasets:
```bash
cd ~/Reconst
```
Download the scene graphs and extract them to datasets/vg/VG-SGG-with-attri.h5.
```text
AttributeError: module 'PIL.Image' has no attribute 'LINEAR'. Did you mean: 'BILINEAR'?
```
Fix: change LINEAR to BILINEAR: commit
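The change amounts to swapping the removed Pillow alias for its current name wherever the transform code references it, for example:

```python
from PIL import Image

# before (fails on newer Pillow, which no longer exposes the LINEAR alias):
# resample = Image.LINEAR
# after:
resample = Image.BILINEAR
```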
While attempting to train, another error was reported:

```text
File "/home/cyl/Reconst/fcsgg/fcsgg/data/detection_utils.py", line 432, in generate_score_map
```

Fix: modify detection_utils.py: commit
First, edit the training config file ./configs/quick_schedules/Quick-FCSGG-HRNet-W32.yaml (the original file loads pretrained weights):

```yaml
MODEL:
```

Change it to train from scratch:

```yaml
MODEL:
```
Then run:

```bash
python tools/train_net.py --num-gpus 1 --config-file configs/quick_schedules/Quick-FCSGG-HRNet-W32.yaml
```

Training runs successfully ✌

```text
...
```
See [[FCSGG Repo Explanation]]
(Mindmap) Part-level Scene Understanding for Robots
A scene graph is a structural representation, which can capture detailed semantics by explicitly modeling:
A scene graph is a set of visual relationship triplets in the form of <subject, relation, object> or <object, is, attribute>
Scene graphs should serve as an **objective semantic representation** of the state of the scene.

```bash
pip install omegaconf
```

```python
from omegaconf import OmegaConf
```
You can create OmegaConf objects from multiple sources.
dict
```python
conf = OmegaConf.create({"k" : "v", "list" : [1, {"a": "1", "b": "2", 3: "c"}]})
```
list
```python
conf = OmegaConf.create([1, {"a":10, "b": {"a":10, 123: "int_key"}}])
```
yaml
```python
conf = OmegaConf.load('source/example.yaml')
```
dot-list
```python
dot_list = ["a.aa.aaa=1", "a.aa.bbb=2", "a.bb.aaa=3", "a.bb.bbb=4"]
conf = OmegaConf.from_dotlist(dot_list)
```
command-line arguments

```python
sys.argv = ['your-program.py', 'server.port=82', 'log.file=log2.txt']
conf = OmegaConf.from_cli()
```
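These sources can also be combined. As an extra illustration (not from the original notes), a YAML base config can be overridden by command-line values:

```python
from omegaconf import OmegaConf

base = OmegaConf.load("source/example.yaml")
cli = OmegaConf.from_cli()         # e.g. server.port=82 log.file=log2.txt
conf = OmegaConf.merge(base, cli)  # later sources override earlier ones
print(OmegaConf.to_yaml(conf))
```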
LMOD
Software management: https://lmod.readthedocs.io/en/latest/010_user.html
The SJTU HPC uses lmod to manage user software. Modules are reset every time you log in again, so the software has to be re-loaded.
Common commands:
```bash
ml                 # list the currently loaded modules
ml <module_name>   # shorthand for "module load <module_name>"
```
oh-my-zsh
A better shell. zsh is already installed on the HPC by default; install oh-my-zsh via its install script, then install plugins.

```bash
export CHSH=no  # keep the installer from changing the default login shell (usually not allowed on HPC)
sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"
```
nvim
A better editor. There is no sudo permission on the server, so Neovim has to be compiled manually.
First, create a build environment with conda:

```bash
conda create -n nvim-build cmake libtool gettext curl unzip ninja gcc gxx -y -c conda-forge
```
clone & build
```bash
git clone https://github.com/neovim/neovim.git
cd neovim && make CMAKE_BUILD_TYPE=Release CMAKE_INSTALL_PREFIX=$HOME/.local install  # install into $HOME since there is no sudo
```
In some cases a parallel build is needed; compilation consumes a lot of resources and may not be possible on the login node.
In that situation submitting a batch job is inefficient; instead, request interactive compute resources:

```bash
srun -p cpu -n 4 --pty /bin/bash
```

```console
(base) chenyulin@pilogin2 ~> srun -p cpu -n 4 --pty /bin/bash
```
Transfer files with HPC Studio:
https://studio.hpc.sjtu.edu.cn/pun/sys/dashboard/files/fs//lustre/home/acct-umjbyy/chenyulin/Data
For example, to copy a conda environment:
```bash
cp -r /lustre/home/acct-umjbyy/chenyulin/.conda /dssg/home/acct-umjbyy/chenyulin/
```
“Emergence” first arose in philosophy. In the 19th century the British philosopher George Henry Lewes first used the word to describe properties of a whole that cannot be explained by the properties of its parts, that is, the phenomenon that “the whole is greater than the sum of its parts”.
The ability to reduce everything to simple fundamental laws does not imply the ability to start from those laws and reconstruct the universe. The reductionist hypothesis breaks down when confronted with the twin difficulties of scale and complexity. At each level of complexity entirely new properties appear. Psychology is not applied biology, nor is biology applied chemistry. We can now see that the whole becomes not only more than, but very different from, the sum of its parts.
The systems scientists Murray Gell-Mann and Stuart Kauffman developed the concept of emergence further through their studies of self-organizing behavior in complex systems. In their view, emergence refers to properties or phenomena that arise naturally when simple components within a complex system interact. In other words, the characteristics or behavior shown by the whole cannot simply be derived from the properties of its parts; they arise spontaneously out of the interactions among those parts. Emergence focuses on explaining how a complex system, without external instructions or central control, forms new ordered structures and behaviors through simple internal rules and interactions. This shows that complexity can arise naturally, without external intervention or a detailed design laid out in advance.
The most representative example of this is Conway's Game of Life:
In the Game of Life, every cell follows only a few simple local rules that depend on its neighbors, yet complex global patterns emerge from their interactions (see the sketch below).
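As an illustration added here (not part of the original notes), a minimal Python sketch of the rules shows that every cell's next state depends only on its eight neighbors:

```python
import numpy as np

def step(grid: np.ndarray) -> np.ndarray:
    """Advance a 0/1 grid by one generation of Conway's Game of Life."""
    # count the 8 neighbors of every cell (wrap-around boundaries)
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # a cell is alive next step if it has exactly 3 live neighbors,
    # or if it is alive now and has exactly 2 live neighbors
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)

# a "glider": five live cells whose pattern re-creates itself shifted diagonally
grid = np.zeros((10, 10), dtype=int)
grid[1, 2] = grid[2, 3] = grid[3, 1] = grid[3, 2] = grid[3, 3] = 1
for _ in range(4):
    grid = step(grid)  # after 4 steps the same shape reappears, moved by one cell
```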
I will not go into the emergence of new capabilities (language understanding, reasoning, programming, creative generation, and so on) as autoregressive text models such as ChatGPT scale up their parameters; instead, I will focus on emergence in the self-supervised vision domain.
DINO (Self-Distillation with No Labels) is a self-supervised learning method for learning visual features without labels. It builds on the ideas of knowledge distillation and contrastive learning, and trains a pair of networks with the same architecture: a teacher network and a student network.
The idea is that the student model predicts the labels produced by the teacher model, while the teacher's weights are updated from the student's weights. The student is fed small local crops of the original image, whereas the teacher is fed larger, more global crops. This self-supervised setup expands the training data from labeled datasets at the scale of millions of images to hundreds of millions of unlabeled web images. After training, the model exhibits capabilities that were never explicitly designed in:

- It reaches strong top-1 accuracy with a nearest neighbors classifier (k-NN), without any fine-tuning or added projection layer.
- Clear semantic segmentation information can be observed in the last self-attention layer.

## The dialectical law at work: quantitative change produces qualitative change

In DINO, the dialectical principle that **quantitative change produces qualitative change** is clearly visible. The principle states that **qualitative change is not abrupt; it is a leap that happens once accumulated quantitative change reaches a critical point**. In DINO this plays out in several ways:

- **Scaling up the training data enables the emergence of new capabilities**
  - Traditional supervised learning depends on manually annotated datasets, usually at the scale of millions of images, whereas DINO's self-supervision expands the training data to hundreds of millions of unlabeled web images.
  - This quantitative accumulation lets the model learn to distinguish object categories on its own, without human supervision, and ultimately form feature representations that generalize well.
- **Gradual refinement of feature learning drives the emergence of semantic information**
  - Early in training, the features DINO learns are noisy and capture only low-level visual patterns such as edges and color distributions.
  - As training continues, the network's representational power improves step by step, until **explicit semantic segmentation ability** emerges in the self-attention layer, even though segmentation was never a training objective.
- **Information alignment keeps accumulating in the teacher-student architecture**, which continuously refines itself through self-distillation; local improvements (quantitative change) ultimately produce a global leap in capability (qualitative change).
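To make the self-distillation mechanism described above concrete, here is a heavily simplified sketch (not the official DINO code); `student` and `teacher` stand for two networks with identical architecture, and details such as output centering and multi-crop loss aggregation are omitted:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_teacher(teacher, student, momentum=0.996):
    # the teacher's weights are an exponential moving average of the student's
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)

def dino_loss(student_out, teacher_out, student_temp=0.1, teacher_temp=0.04):
    # the student (fed small local crops) is trained to predict the teacher's
    # (fed large global crops) sharpened output distribution; no gradient
    # flows into the teacher
    targets = F.softmax(teacher_out.detach() / teacher_temp, dim=-1)
    log_probs = F.log_softmax(student_out / student_temp, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()
```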