
Repository:
https://github.com/PSGBOT/pixtral-12B-Inference
Local image upload
```python
def encode_image(image_path):
```
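Only the signature is preserved above; a minimal sketch of the usual base64 helper for sending a local image to the Pixtral chat API (the body and the usage snippet are assumptions, not copied from the repo):

```python
import base64

def encode_image(image_path):
    """Read a local image file and return it as a base64 string for the API payload."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Typical usage: embed the encoded image as a data URL inside the chat message content
image_block = {
    "type": "image_url",
    "image_url": f"data:image/jpeg;base64,{encode_image('example.jpg')}",
}
```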
FCSGG (Fully Convolutional Scene Graph Generation) is a PyTorch implementation of the paper “Fully Convolutional Scene Graph Generation” published in CVPR 2021. The project focuses on scene graph generation, which is the task of detecting objects in an image and identifying the relationships between them.
Architecture:
Key Features:
Dataset:
Model Components:
Utilities:
fcsgg/: Main module containing model implementation
configs/: Configuration files for different model variants
tools/: Training, evaluation, and visualization scripts
GraphViz/: Visualization tools for scene graphs
The project implements a fully convolutional approach to scene graph generation, which differs from traditional two-stage methods. Instead of first detecting objects and then predicting relationships, it uses a one-stage detector to simultaneously predict objects and their relationships in a fully convolutional manner.
The repository provides several pre-trained models with different backbones:
These models achieve competitive performance on the Visual Genome dataset for scene graph generation tasks.
The project provides tools for training, evaluation, and visualization of scene graphs. It requires the Visual Genome dataset and can be run using Docker or directly with PyTorch.
In summary, FCSGG is a comprehensive implementation of a state-of-the-art approach to scene graph generation using fully convolutional networks, offering various model architectures and training configurations.
FCSGG is built on top of Detectron2, Facebook’s object detection framework, and leverages many of its components while extending it for scene graph generation. Here’s a detailed breakdown:
Meta Architecture: FCSGG registers a custom meta architecture called “CenterNet” with Detectron2’s `META_ARCH_REGISTRY`. This extends Detectron2’s modular architecture system while maintaining compatibility.
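For reference, registration with `META_ARCH_REGISTRY` follows Detectron2’s standard decorator pattern, roughly like this (a schematic sketch; the real class body lives in the fcsgg module):

```python
import torch.nn as nn
from detectron2.modeling import META_ARCH_REGISTRY

@META_ARCH_REGISTRY.register()
class CenterNet(nn.Module):
    """Schematic only: selected at runtime via MODEL.META_ARCHITECTURE: "CenterNet"."""

    def __init__(self, cfg):
        super().__init__()
        # build the backbone, neck, and prediction heads from cfg (omitted)

    def forward(self, batched_inputs):
        # training: return a dict of losses; inference: return per-image Instances
        raise NotImplementedError
```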
Backbone Networks: FCSGG uses Detectron2’s backbone networks (ResNet, etc.) directly and also implements custom backbones like HRNet while following Detectron2’s backbone interface.
Feature Pyramid Networks (FPN): The repository uses Detectron2’s FPN implementation and extends it with custom variants like BiFPN and HRFPN.
YAML Configuration: FCSGG adopts Detectron2’s YAML-based configuration system, extending it with custom configurations for scene graph generation through `add_fcsgg_config()`.
Command Line Arguments: The training script uses Detectron2’s `default_argument_parser()` to maintain the same command-line interface.
Dataset Registration: The Visual Genome dataset is registered with Detectron2’s `DatasetCatalog` and `MetadataCatalog`, making it available through Detectron2’s data loading pipeline.
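The registration itself presumably follows Detectron2’s standard pattern; a minimal sketch (the dataset name, loader, and metadata keys below are illustrative, not taken from the repo):

```python
from detectron2.data import DatasetCatalog, MetadataCatalog

def load_vg_train():
    """Return a list[dict] in Detectron2's standard format, extended with relation annotations."""
    return []  # placeholder: the real loader parses the VG-SGG-with-attri.h5 annotations

DatasetCatalog.register("vg_train", load_vg_train)
MetadataCatalog.get("vg_train").set(
    thing_classes=["person", "dog"],        # object categories (truncated)
    predicate_classes=["on", "holding"],    # relationship categories (custom metadata key)
)
```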
Custom Dataset Mapper: FCSGG implements a custom `DatasetMapper` class that extends Detectron2’s mapper to handle scene graph annotations.
Data Loaders: The repository uses Detectron2’s `build_detection_train_loader` and `build_detection_test_loader` with custom mappers.
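Wired together, the custom mapper and Detectron2’s loader builder look roughly like this (a sketch; the class name and the "relations" key are assumptions, and `cfg` is a standard Detectron2 config):

```python
from detectron2.data import DatasetMapper, build_detection_train_loader

class SceneGraphDatasetMapper(DatasetMapper):
    """Illustrative mapper that keeps relationship annotations alongside the usual instances."""

    def __call__(self, dataset_dict):
        data = super().__call__(dataset_dict)                   # standard image loading + augmentation
        data["relations"] = dataset_dict.get("relations", [])   # hypothetical extra field
        return data

def make_train_loader(cfg):
    # cfg: a Detectron2 CfgNode with the registered VG dataset in DATASETS.TRAIN
    return build_detection_train_loader(cfg, mapper=SceneGraphDatasetMapper(cfg, is_train=True))
```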
Trainer Class: FCSGG extends Detectron2’s `DefaultTrainer` class to customize the training loop, evaluation metrics, and data loading.
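Extending `DefaultTrainer` is usually done by overriding its classmethod hooks; a schematic sketch (FCSGG’s real trainer overrides more than this):

```python
from detectron2.engine import DefaultTrainer
from detectron2.data import DatasetMapper, build_detection_train_loader

class SceneGraphTrainer(DefaultTrainer):
    """Schematic subclass of Detectron2's DefaultTrainer."""

    @classmethod
    def build_train_loader(cls, cfg):
        # FCSGG would plug in its scene-graph-aware mapper here (see the sketch above)
        return build_detection_train_loader(cfg, mapper=DatasetMapper(cfg, is_train=True))

# typical entry point, as in Detectron2's train_net.py:
# trainer = SceneGraphTrainer(cfg); trainer.resume_or_load(resume=False); trainer.train()
```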
Checkpointing: The repository uses Detectron2’s `DetectionCheckpointer` for model saving and loading.
Distributed Training: FCSGG leverages Detectron2’s distributed training utilities through `detectron2.utils.comm` and the `launch` function.
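The multi-GPU entry point mirrors Detectron2’s stock train_net.py pattern (sketch; the `main()` body is omitted):

```python
from detectron2.engine import default_argument_parser, launch

def main(args):
    # build the cfg, construct the trainer, and start training (omitted)
    pass

if __name__ == "__main__":
    args = default_argument_parser().parse_args()
    launch(
        main,
        args.num_gpus,
        num_machines=args.num_machines,
        machine_rank=args.machine_rank,
        dist_url=args.dist_url,
        args=(args,),
    )
```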
Custom Evaluators: The repository implements a custom `VGEvaluator` for scene graph evaluation while following Detectron2’s evaluator interface.
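Detectron2 evaluators implement a three-method interface (reset / process / evaluate), and VGEvaluator presumably follows the same shape; a skeleton sketch with the actual SGG metric computation omitted:

```python
from detectron2.evaluation import DatasetEvaluator

class VGEvaluatorSketch(DatasetEvaluator):
    """Skeleton of a scene-graph evaluator following Detectron2's evaluator interface."""

    def reset(self):
        self._predictions = []

    def process(self, inputs, outputs):
        # collect per-image predictions (object instances plus predicted relation triplets)
        for inp, out in zip(inputs, outputs):
            self._predictions.append({"image_id": inp["image_id"],
                                      "instances": out["instances"].to("cpu")})

    def evaluate(self):
        # compute scene-graph metrics such as Recall@K against ground truth (omitted)
        return {"SGGen/R@50": 0.0}
```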
Event Storage: FCSGG uses Detectron2’s event storage system for logging metrics during training.
Visualization Tools: The repository leverages Detectron2’s visualization utilities for debugging and result analysis.
Custom Heads: While using Detectron2’s architecture, FCSGG implements custom prediction heads for relationship detection.
Scene Graph Structures: The repository defines custom data structures for scene graphs that integrate with Detectron2’s `Instances` class.
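For context, Detectron2’s `Instances` accepts arbitrary per-instance fields (all fields must share the same length), which is the hook for attaching relationship outputs; the relation field names below are illustrative assumptions:

```python
import torch
from detectron2.structures import Boxes, Instances

inst = Instances(image_size=(480, 640))
inst.pred_boxes = Boxes(torch.tensor([[10., 10., 100., 120.], [50., 40., 200., 220.]]))
inst.pred_classes = torch.tensor([0, 1])
inst.scores = torch.tensor([0.92, 0.81])
# two hypothetical relation fields, one entry per predicted relation pair
inst.rel_pair_idxs = torch.tensor([[0, 1], [1, 0]])   # (subject index, object index)
inst.rel_scores = torch.tensor([0.87, 0.12])
```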
Loss Functions: FCSGG implements specialized loss functions for scene graph generation while maintaining compatibility with Detectron2’s loss computation framework.
Submodule Integration: Detectron2 is included as a Git submodule, ensuring version compatibility.
Build Process: The installation process includes building Detectron2 from source to ensure proper integration.
In summary, FCSGG uses Detectron2 as its foundation, leveraging its modular architecture, data handling, training infrastructure, and configuration system while extending it with custom components for scene graph generation. This approach allows FCSGG to benefit from Detectron2’s robust implementation and optimizations while adding specialized functionality for relationship detection between objects.
Official repo:
https://github.com/liuhengyue/fcsgg
Our repo:
https://github.com/PSGBOT/KAF-Generation
My venv: fcsgg
```bash
git clone git@github.com:liuhengyue/fcsgg.git
```
Datasets:
```bash
cd ~/Reconst
```
Download the scene graphs and extract them to datasets/vg/VG-SGG-with-attri.h5.
```
AttributeError: module 'PIL.Image' has no attribute 'LINEAR'. Did you mean: 'BILINEAR'?
```
Fix: change LINEAR to BILINEAR: commit
An error occurred while trying to train:
1 | File "/home/cyl/Reconst/fcsgg/fcsgg/data/detection_utils.py", line 432, in generate_score_map |
Modify detection_utils.py: commit
First, modify the training config file ./config/quick_schedules/Quick-FCSGG-HRNet-W32.yaml (the original file uses pre-trained weights):
```yaml
MODEL:
```
Change it to train from scratch:
```yaml
MODEL:
```
Then run:
```bash
python tools/train_net.py --num-gpus 1 --config-file configs/quick_schedules/Quick-FCSGG-HRNet-W32.yaml
```
Training runs successfully ✌
```
...
```
See [[FCSGG Repo Explanation]]
```bash
pip install omegaconf
```
```python
from omegaconf import OmegaConf
```
You can create OmegaConf objects from multiple sources.
dict
1 | conf = OmegaConf.create({"k" : "v", "list" : [1, {"a": "1", "b": "2", 3: "c"}]}) |
list
```python
conf = OmegaConf.create([1, {"a": 10, "b": {"a": 10, 123: "int_key"}}])
```
yaml
```python
conf = OmegaConf.load('source/example.yaml')
```
dot-list
```python
dot_list = ["a.aa.aaa=1", "a.aa.bbb=2", "a.bb.aaa=3", "a.bb.bbb=4"]
conf = OmegaConf.from_dotlist(dot_list)
```
command-line arguments

```python
import sys
sys.argv = ['your-program.py', 'server.port=82', 'log.file=log2.txt']
conf = OmegaConf.from_cli()  # parses sys.argv[1:] as a dot-list
```
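After creation from any of these sources, values are accessed by attribute or key and can be dumped back to YAML; a quick usage sketch (not from the original notes):

```python
from omegaconf import OmegaConf

conf = OmegaConf.create({"server": {"port": 80}, "log": {"file": "log.txt"}})
print(conf.server.port)         # attribute-style access -> 80
print(conf["log"]["file"])      # dict-style access -> log.txt
print(OmegaConf.to_yaml(conf))  # serialize back to YAML
```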
LMOD
Software management: https://lmod.readthedocs.io/en/latest/010_user.html
The SJTU HPC uses Lmod to manage user software. Every new login resets the loaded modules, so the required software has to be loaded again.
Common commands:
```bash
ml  # list the currently loaded modules
```
oh-my-zsh
A better shell. zsh is already installed by default on the HPC; install oh-my-zsh via its install script, then install plugins:
```bash
export CHSH=no  # keep the installer from switching the default shell (usually not allowed on HPC)
```
nvim
A better editor. Since there is no sudo permission on the server, it has to be compiled manually.
First, create a build environment with conda:
```bash
conda create -n nvim-build cmake libtool gettext curl unzip ninja gcc gxx -y -c conda-forge
```
clone & build
```bash
git clone https://github.com/neovim/neovim.git
```
In some cases parallel compilation is needed. The build consumes a lot of resources and may not be allowed on the login node; submitting a batch job for this is inefficient, so request an interactive compute session instead:

```
(base) chenyulin@pilogin2 ~> srun -p cpu -n 4 --pty /bin/bash
```
Use HPC Studio for file transfer:
https://studio.hpc.sjtu.edu.cn/pun/sys/dashboard/files/fs//lustre/home/acct-umjbyy/chenyulin/Data
For example, copy a conda environment:
```bash
cp -r /lustre/home/acct-umjbyy/chenyulin/.conda /dssg/home/acct-umjbyy/chenyulin/
```
Repository:
official:
```bash
conda env create -f conda.yaml
```
Using the official conda.yaml is not recommended; use the modified conda_cyl.yaml instead.
```bash
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1
```
The official repo provides notebooks for depth estimation and segmentation; worth finding time to go through them.
The dataset used is ImageNet-mini.
```
imagenet-mini
```
Note: an additional label.txt needs to be added.
Use a script to generate the dataset's metadata:
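A minimal sketch of such a metadata script, assuming the meta file simply lists "relative/path class_index" pairs derived from label.txt (the file names, paths, and output format here are all assumptions):

```python
import os

def build_meta(root="imagenet-mini/val", label_file="label.txt", out_file="val_meta.txt"):
    """Walk class folders and write one 'relative/path class_index' line per image."""
    with open(label_file) as f:
        classes = [line.strip() for line in f if line.strip()]
    class_to_idx = {name: i for i, name in enumerate(classes)}
    with open(out_file, "w") as out:
        for cls in sorted(os.listdir(root)):
            cls_dir = os.path.join(root, cls)
            if not os.path.isdir(cls_dir) or cls not in class_to_idx:
                continue
            for fname in sorted(os.listdir(cls_dir)):
                out.write(f"{cls}/{fname} {class_to_idx[cls]}\n")

build_meta()
```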
Semantic-SAM Repository Application
My repository: https://github.com/Chen-Yulin/Semantic-SAM
My venv: ssam
Tested Python versions: 3.8, 3.10
Official steps:
```bash
pip3 install torch==1.13.1 torchvision==0.14.1 --extra-index-url https://download.pytorch.org/whl/cu113
```
According to [[Cuda+Torch]], cudatoolkit and cuda-toolkit need to be installed first:
```bash
conda install nvidia/label/cuda-11.7.0::cuda-toolkit -c nvidia/label/cuda-11.7.0
```
Then follow the installation command from the PyTorch website:
```bash
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
```
Running the second line directly may raise an error saying the system gcc version is too high; install gcc=11.2.0:
```bash
conda install -c conda-forge gcc=11.2.0
```
If compilation fails with `ld: cannot find -lcudart: No such file or directory collect2: error: ld returned 1 exit status`, it is only because cudatoolkit is not installed.
After installation, running import semantic_sam directly raises `ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'`.
It prompts:
```
Please compile MultiScaleDeformableAttention CUDA op with the following commands:
```
Mask2Former needs to be built manually with make:
```bash
cd Mask2Former/mask2former/modeling/pixel_decoder/ops/
```
Some version issues:
```bash
pip install gradio==3.37.0
```
```bash
python demo.py --ckpt ./weights/swinl_only_sam_many2many.pth
```
Comment: compared with SAM, the results reflect semantic consistency more than segmentation driven by texture.
```bash
python demo_auto_generation.py --ckpt ./weights/swinl_only_sam_many2many.pth
```
Original image:
Instance identification:
Part segmentation
nvidia-smi returns the newest CUDA version that the driver can support. The system-installed CUDA version can be anything; torch will preferentially use the CUDA version installed in the virtual environment.
Install a specific version of cuda-toolkit:
```bash
conda install nvidia/label/cuda-12.4.0::cuda-toolkit -c nvidia/label/cuda-12.4.0
```
Install the latest version:
```bash
conda install cuda-toolkit
```
Some repositories require the CUDA path to be specified before their packages can be compiled:
1 | conda env config vars set LD_LIBRARY_PATH="/home/cyl/miniconda3/envs/gsam/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/:$LD_LIBRARY_PATH" |
Note: after changing the library path, the LSP in nvim will report errors; it is recommended to change it back afterwards:
```bash
conda env config vars set LD_LIBRARY_PATH=""
```
Note: To find the correct path for CUDA_HOME, use `which nvcc`. In my case, the output of the command was:
```
>>> which nvcc
```
Therefore, I set CUDA_HOME to /home/user/miniconda3/envs/py12/.
Note: To find the correct path for LD_LIBRARY_PATH, use `find ~ -name cuda_runtime_api.h`. In my case, the output of the command was:
```
>>> find ~ -name cuda_runtime_api.h
```
So I set LD_LIBRARY_PATH to /home/user/miniconda3/envs/py12/targets/x86_64-linux/lib/ and CPATH to /home/user/miniconda3/envs/py12/targets/x86_64-linux/include/. If you have multiple CUDA installations, the output of `find ~ -name cuda_runtime_api.h` will display multiple paths. Make sure to choose the path that corresponds to the environment you have created.
ref:https://github.com/IDEA-Research/GroundingDINO/issues/355
Note: Always reboot the computer after CUDA is upgraded.
Note: changing LD_LIBRARY_PATH may stop neovim's pyright from running, so it is recommended to set the variable back after compilation is done:
```bash
conda env config vars set LD_LIBRARY_PATH=""
```
cudatoolkit and cuda-toolkit can both be installed at the same time.
If cudatoolkit is not installed, compilation may fail with `ld: cannot find -lcudart: No such file or directory collect2: error: ld returned 1 exit status`.
Use the following command to get version information:
```bash
python -c 'import torch;print(torch.__version__);print(torch.version.cuda)'
```
```
2.0.0+cu117
```