
Repository:
https://github.com/PSGBOT/pixtral-12B-Inference
Local image upload
```python
def encode_image(image_path):
```
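Only the signature is preserved above; a minimal sketch of the usual base64 helper for sending a local image to the Pixtral chat API (the body and the usage snippet are assumptions, not copied from the repo):

```python
import base64

def encode_image(image_path):
    """Read a local image file and return it as a base64 string for the API payload."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Typical usage: embed the encoded image as a data URL inside the chat message content
image_block = {
    "type": "image_url",
    "image_url": f"data:image/jpeg;base64,{encode_image('example.jpg')}",
}
```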
FCSGG (Fully Convolutional Scene Graph Generation) is a PyTorch implementation of the paper “Fully Convolutional Scene Graph Generation” published in CVPR 2021. The project focuses on scene graph generation, which is the task of detecting objects in an image and identifying the relationships between them.
Architecture:
Key Features:
Dataset:
Model Components:
Utilities:
fcsgg/: Main module containing model implementation
configs/: Configuration files for different model variants
tools/: Training, evaluation, and visualization scripts
GraphViz/: Visualization tools for scene graphs
The project implements a fully convolutional approach to scene graph generation, which differs from traditional two-stage methods. Instead of first detecting objects and then predicting relationships, it uses a one-stage detector to simultaneously predict objects and their relationships in a fully convolutional manner.
The repository provides several pre-trained models with different backbones:
These models achieve competitive performance on the Visual Genome dataset for scene graph generation tasks.
The project provides tools for training, evaluation, and visualization of scene graphs. It requires the Visual Genome dataset and can be run using Docker or directly with PyTorch.
In summary, FCSGG is a comprehensive implementation of a state-of-the-art approach to scene graph generation using fully convolutional networks, offering various model architectures and training configurations.
FCSGG is built on top of Detectron2, Facebook’s object detection framework, and leverages many of its components while extending it for scene graph generation. Here’s a detailed breakdown:
Meta Architecture: FCSGG registers a custom meta architecture called “CenterNet” with Detectron2’s `META_ARCH_REGISTRY`. This extends Detectron2’s modular architecture system while maintaining compatibility.
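For reference, registration with `META_ARCH_REGISTRY` follows Detectron2’s standard decorator pattern, roughly like this (a schematic sketch; the real class body lives in the fcsgg module):

```python
import torch.nn as nn
from detectron2.modeling import META_ARCH_REGISTRY

@META_ARCH_REGISTRY.register()
class CenterNet(nn.Module):
    """Schematic only: selected at runtime via MODEL.META_ARCHITECTURE: "CenterNet"."""

    def __init__(self, cfg):
        super().__init__()
        # build the backbone, neck, and prediction heads from cfg (omitted)

    def forward(self, batched_inputs):
        # training: return a dict of losses; inference: return per-image Instances
        raise NotImplementedError
```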
Backbone Networks: FCSGG uses Detectron2’s backbone networks (ResNet, etc.) directly and also implements custom backbones like HRNet while following Detectron2’s backbone interface.
Feature Pyramid Networks (FPN): The repository uses Detectron2’s FPN implementation and extends it with custom variants like BiFPN and HRFPN.
YAML Configuration: FCSGG adopts Detectron2’s YAML-based configuration system, extending it with custom configurations for scene graph generation through `add_fcsgg_config()`.
Command Line Arguments: The training script uses Detectron2’s `default_argument_parser()` to maintain the same command-line interface.
Dataset Registration: The Visual Genome dataset is registered with Detectron2’s `DatasetCatalog` and `MetadataCatalog`, making it available through Detectron2’s data loading pipeline.
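The registration itself presumably follows Detectron2’s standard pattern; a minimal sketch (the dataset name, loader, and metadata keys below are illustrative, not taken from the repo):

```python
from detectron2.data import DatasetCatalog, MetadataCatalog

def load_vg_train():
    """Return a list[dict] in Detectron2's standard format, extended with relation annotations."""
    return []  # placeholder: the real loader parses the VG-SGG-with-attri.h5 annotations

DatasetCatalog.register("vg_train", load_vg_train)
MetadataCatalog.get("vg_train").set(
    thing_classes=["person", "dog"],        # object categories (truncated)
    predicate_classes=["on", "holding"],    # relationship categories (custom metadata key)
)
```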
Custom Dataset Mapper: FCSGG implements a custom `DatasetMapper` class that extends Detectron2’s mapper to handle scene graph annotations.
Data Loaders: The repository uses Detectron2’s `build_detection_train_loader` and `build_detection_test_loader` with custom mappers.
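Wired together, the custom mapper and Detectron2’s loader builder look roughly like this (a sketch; the class name and the "relations" key are assumptions, and `cfg` is a standard Detectron2 config):

```python
from detectron2.data import DatasetMapper, build_detection_train_loader

class SceneGraphDatasetMapper(DatasetMapper):
    """Illustrative mapper that keeps relationship annotations alongside the usual instances."""

    def __call__(self, dataset_dict):
        data = super().__call__(dataset_dict)                   # standard image loading + augmentation
        data["relations"] = dataset_dict.get("relations", [])   # hypothetical extra field
        return data

def make_train_loader(cfg):
    # cfg: a Detectron2 CfgNode with the registered VG dataset in DATASETS.TRAIN
    return build_detection_train_loader(cfg, mapper=SceneGraphDatasetMapper(cfg, is_train=True))
```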
Trainer Class: FCSGG extends Detectron2’s `DefaultTrainer` class to customize the training loop, evaluation metrics, and data loading.
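Extending `DefaultTrainer` is usually done by overriding its classmethod hooks; a schematic sketch (FCSGG’s real trainer overrides more than this):

```python
from detectron2.engine import DefaultTrainer
from detectron2.data import DatasetMapper, build_detection_train_loader

class SceneGraphTrainer(DefaultTrainer):
    """Schematic subclass of Detectron2's DefaultTrainer."""

    @classmethod
    def build_train_loader(cls, cfg):
        # FCSGG would plug in its scene-graph-aware mapper here (see the sketch above)
        return build_detection_train_loader(cfg, mapper=DatasetMapper(cfg, is_train=True))

# typical entry point, as in Detectron2's train_net.py:
# trainer = SceneGraphTrainer(cfg); trainer.resume_or_load(resume=False); trainer.train()
```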
Checkpointing: The repository uses Detectron2’s `DetectionCheckpointer` for model saving and loading.
Distributed Training: FCSGG leverages Detectron2’s distributed training utilities through `detectron2.utils.comm` and the `launch` function.
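The multi-GPU entry point mirrors Detectron2’s stock train_net.py pattern (sketch; the `main()` body is omitted):

```python
from detectron2.engine import default_argument_parser, launch

def main(args):
    # build the cfg, construct the trainer, and start training (omitted)
    pass

if __name__ == "__main__":
    args = default_argument_parser().parse_args()
    launch(
        main,
        args.num_gpus,
        num_machines=args.num_machines,
        machine_rank=args.machine_rank,
        dist_url=args.dist_url,
        args=(args,),
    )
```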
Custom Evaluators: The repository implements a custom `VGEvaluator` for scene graph evaluation while following Detectron2’s evaluator interface.
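Detectron2 evaluators implement a three-method interface (reset / process / evaluate), and VGEvaluator presumably follows the same shape; a skeleton sketch with the actual SGG metric computation omitted:

```python
from detectron2.evaluation import DatasetEvaluator

class VGEvaluatorSketch(DatasetEvaluator):
    """Skeleton of a scene-graph evaluator following Detectron2's evaluator interface."""

    def reset(self):
        self._predictions = []

    def process(self, inputs, outputs):
        # collect per-image predictions (object instances plus predicted relation triplets)
        for inp, out in zip(inputs, outputs):
            self._predictions.append({"image_id": inp["image_id"],
                                      "instances": out["instances"].to("cpu")})

    def evaluate(self):
        # compute scene-graph metrics such as Recall@K against ground truth (omitted)
        return {"SGGen/R@50": 0.0}
```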
Event Storage: FCSGG uses Detectron2’s event storage system for logging metrics during training.
Visualization Tools: The repository leverages Detectron2’s visualization utilities for debugging and result analysis.
Custom Heads: While using Detectron2’s architecture, FCSGG implements custom prediction heads for relationship detection.
Scene Graph Structures: The repository defines custom data structures for scene graphs that integrate with Detectron2’s `Instances` class.
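For context, Detectron2’s `Instances` accepts arbitrary per-instance fields (all fields must share the same length), which is the hook for attaching relationship outputs; the relation field names below are illustrative assumptions:

```python
import torch
from detectron2.structures import Boxes, Instances

inst = Instances(image_size=(480, 640))
inst.pred_boxes = Boxes(torch.tensor([[10., 10., 100., 120.], [50., 40., 200., 220.]]))
inst.pred_classes = torch.tensor([0, 1])
inst.scores = torch.tensor([0.92, 0.81])
# two hypothetical relation fields, one entry per predicted relation pair
inst.rel_pair_idxs = torch.tensor([[0, 1], [1, 0]])   # (subject index, object index)
inst.rel_scores = torch.tensor([0.87, 0.12])
```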
Loss Functions: FCSGG implements specialized loss functions for scene graph generation while maintaining compatibility with Detectron2’s loss computation framework.
Submodule Integration: Detectron2 is included as a Git submodule, ensuring version compatibility.
Build Process: The installation process includes building Detectron2 from source to ensure proper integration.
In summary, FCSGG uses Detectron2 as its foundation, leveraging its modular architecture, data handling, training infrastructure, and configuration system while extending it with custom components for scene graph generation. This approach allows FCSGG to benefit from Detectron2’s robust implementation and optimizations while adding specialized functionality for relationship detection between objects.
Official repo:
https://github.com/liuhengyue/fcsgg
Our repo:
https://github.com/PSGBOT/KAF-Generation
My venv: fcsgg
```bash
git clone git@github.com:liuhengyue/fcsgg.git
```
Datasets:
```bash
cd ~/Reconst
```
Download the scene graphs and extract them to datasets/vg/VG-SGG-with-attri.h5.
```
AttributeError: module 'PIL.Image' has no attribute 'LINEAR'. Did you mean: 'BILINEAR'?
```
Fix: change LINEAR to BILINEAR: commit
An error occurred while trying to train:
1 | File "/home/cyl/Reconst/fcsgg/fcsgg/data/detection_utils.py", line 432, in generate_score_map |
Modify detection_utils.py: commit
First, modify the training config file ./config/quick_schedules/Quick-FCSGG-HRNet-W32.yaml (the original file uses pre-trained weights):
```yaml
MODEL:
```
Change it to train from scratch:
```yaml
MODEL:
```
Then run:
```bash
python tools/train_net.py --num-gpus 1 --config-file configs/quick_schedules/Quick-FCSGG-HRNet-W32.yaml
```
Training runs successfully ✌
```
...
```
See [[FCSGG Repo Explanation]]
```bash
pip install omegaconf
```
```python
from omegaconf import OmegaConf
```
You can create OmegaConf objects from multiple sources.
dict
1 | conf = OmegaConf.create({"k" : "v", "list" : [1, {"a": "1", "b": "2", 3: "c"}]}) |
list
```python
conf = OmegaConf.create([1, {"a": 10, "b": {"a": 10, 123: "int_key"}}])
```
yaml
```python
conf = OmegaConf.load('source/example.yaml')
```
dot-list
```python
dot_list = ["a.aa.aaa=1", "a.aa.bbb=2", "a.bb.aaa=3", "a.bb.bbb=4"]
conf = OmegaConf.from_dotlist(dot_list)
```
command-line arguments

```python
import sys
sys.argv = ['your-program.py', 'server.port=82', 'log.file=log2.txt']
conf = OmegaConf.from_cli()  # parses sys.argv[1:] as a dot-list
```
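After creation from any of these sources, values are accessed by attribute or key and can be dumped back to YAML; a quick usage sketch (not from the original notes):

```python
from omegaconf import OmegaConf

conf = OmegaConf.create({"server": {"port": 80}, "log": {"file": "log.txt"}})
print(conf.server.port)         # attribute-style access -> 80
print(conf["log"]["file"])      # dict-style access -> log.txt
print(OmegaConf.to_yaml(conf))  # serialize back to YAML
```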
LMOD
Software management: https://lmod.readthedocs.io/en/latest/010_user.html
The SJTU HPC uses Lmod to manage user software. Every new login resets the loaded modules, so the required software has to be loaded again.
Common commands:
```bash
ml  # list the currently loaded modules
```
oh-my-zsh
A better shell. zsh is already installed by default on the HPC; install oh-my-zsh via its install script, then install plugins:
```bash
export CHSH=no  # keep the installer from switching the default shell (usually not allowed on HPC)
```
nvim
A better editor. Since there is no sudo permission on the server, it has to be compiled manually.
First, create a build environment with conda:
```bash
conda create -n nvim-build cmake libtool gettext curl unzip ninja gcc gxx -y -c conda-forge
```
clone & build
```bash
git clone https://github.com/neovim/neovim.git
```
In some cases parallel compilation is needed. The build consumes a lot of resources and may not be allowed on the login node; submitting a batch job for this is inefficient, so request an interactive compute session instead:

```
(base) chenyulin@pilogin2 ~> srun -p cpu -n 4 --pty /bin/bash
```
Use HPC Studio for file transfer:
https://studio.hpc.sjtu.edu.cn/pun/sys/dashboard/files/fs//lustre/home/acct-umjbyy/chenyulin/Data
For example, copy a conda environment:
```bash
cp -r /lustre/home/acct-umjbyy/chenyulin/.conda /dssg/home/acct-umjbyy/chenyulin/
```
Repository:
official:
```bash
conda env create -f conda.yaml
```
Using the official conda.yaml is not recommended; use the modified conda_cyl.yaml instead.
```bash
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1
```
The official repo provides notebooks for depth estimation and segmentation; worth finding time to go through them.
The dataset used is ImageNet-mini.
```
imagenet-mini
```
Note: an additional label.txt needs to be added.
Use a script to generate the dataset's metadata:
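A minimal sketch of such a metadata script, assuming the meta file simply lists "relative/path class_index" pairs derived from label.txt (the file names, paths, and output format here are all assumptions):

```python
import os

def build_meta(root="imagenet-mini/val", label_file="label.txt", out_file="val_meta.txt"):
    """Walk class folders and write one 'relative/path class_index' line per image."""
    with open(label_file) as f:
        classes = [line.strip() for line in f if line.strip()]
    class_to_idx = {name: i for i, name in enumerate(classes)}
    with open(out_file, "w") as out:
        for cls in sorted(os.listdir(root)):
            cls_dir = os.path.join(root, cls)
            if not os.path.isdir(cls_dir) or cls not in class_to_idx:
                continue
            for fname in sorted(os.listdir(cls_dir)):
                out.write(f"{cls}/{fname} {class_to_idx[cls]}\n")

build_meta()
```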
Semantic-SAM Repository Application
My repository: https://github.com/Chen-Yulin/Semantic-SAM
My venv: ssam
Tested Python versions: 3.8, 3.10
Official steps:
```bash
pip3 install torch==1.13.1 torchvision==0.14.1 --extra-index-url https://download.pytorch.org/whl/cu113
```
According to [[Cuda+Torch]], cudatoolkit and cuda-toolkit need to be installed first:
```bash
conda install nvidia/label/cuda-11.7.0::cuda-toolkit -c nvidia/label/cuda-11.7.0
```
Then follow the installation command from the PyTorch website:
```bash
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
```
Running the second line directly may raise an error saying the system gcc version is too high; install gcc=11.2.0:
```bash
conda install -c conda-forge gcc=11.2.0
```
If compilation fails with `ld: cannot find -lcudart: No such file or directory collect2: error: ld returned 1 exit status`, it is only because cudatoolkit is not installed.
After installation, running import semantic_sam directly raises `ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'`.
It prompts:
```
Please compile MultiScaleDeformableAttention CUDA op with the following commands:
```
Mask2Former needs to be built manually with make:
```bash
cd Mask2Former/mask2former/modeling/pixel_decoder/ops/
```
Some version issues:
```bash
pip install gradio==3.37.0
```
```bash
python demo.py --ckpt ./weights/swinl_only_sam_many2many.pth
```
Comment: compared with SAM, the results reflect semantic consistency more than segmentation driven by texture.
```bash
python demo_auto_generation.py --ckpt ./weights/swinl_only_sam_many2many.pth
```
Original image:
Instance identification:
Part segmentation
nvidia-smi returns the newest CUDA version that the driver can support. The system-installed CUDA version can be anything; torch will preferentially use the CUDA version installed in the virtual environment.
Install a specific version of cuda-toolkit:
```bash
conda install nvidia/label/cuda-12.4.0::cuda-toolkit -c nvidia/label/cuda-12.4.0
```
Install the latest version:
```bash
conda install cuda-toolkit
```
Some repositories require the CUDA path to be specified before their packages can be compiled:
1 | conda env config vars set LD_LIBRARY_PATH="/home/cyl/miniconda3/envs/gsam/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/:$LD_LIBRARY_PATH" |
Note: after changing the library path, the LSP in nvim will report errors; it is recommended to change it back afterwards:
```bash
conda env config vars set LD_LIBRARY_PATH=""
```
Note: To find the correct path for CUDA_HOME, use `which nvcc`. In my case, the output of the command was:
```
>>> which nvcc
```
Therefore, I set CUDA_HOME to /home/user/miniconda3/envs/py12/.
Note: To find the correct path for LD_LIBRARY_PATH, use `find ~ -name cuda_runtime_api.h`. In my case, the output of the command was:
```
>>> find ~ -name cuda_runtime_api.h
```
So I set LD_LIBRARY_PATH to /home/user/miniconda3/envs/py12/targets/x86_64-linux/lib/ and CPATH to /home/user/miniconda3/envs/py12/targets/x86_64-linux/include/. If you have multiple CUDA installations, the output of `find ~ -name cuda_runtime_api.h` will display multiple paths. Make sure to choose the path that corresponds to the environment you have created.
ref:https://github.com/IDEA-Research/GroundingDINO/issues/355
Note: Always reboot the computer after CUDA is upgraded.
Note: changing LD_LIBRARY_PATH may stop neovim's pyright from running, so it is recommended to set the variable back after compilation is done:
```bash
conda env config vars set LD_LIBRARY_PATH=""
```
cudatoolkit and cuda-toolkit can both be installed at the same time.
If cudatoolkit is not installed, compilation may fail with `ld: cannot find -lcudart: No such file or directory collect2: error: ld returned 1 exit status`.
Use the following command to get version information:
```bash
python -c 'import torch;print(torch.__version__);print(torch.version.cuda)'
```
```
2.0.0+cu117
```