
Download the pkg from https://apps.cloud.blackmagicdesign.com/davinci-resolve

```bash
git clone https://aur.archlinux.org/davinci-resolve.git
```
FCSGG (Fully Convolutional Scene Graph Generation) is a PyTorch implementation of the paper “Fully Convolutional Scene Graph Generation” published in CVPR 2021. The project focuses on scene graph generation, which is the task of detecting objects in an image and identifying the relationships between them.
Architecture:
Key Features:
Dataset:
Model Components:
Utilities:
fcsgg/: Main module containing model implementation
configs/: Configuration files for different model variants
tools/: Training, evaluation, and visualization scripts
GraphViz/: Visualization tools for scene graphs
The project implements a fully convolutional approach to scene graph generation, which differs from traditional two-stage methods. Instead of first detecting objects and then predicting relationships, it uses a one-stage detector to simultaneously predict objects and their relationships in a fully convolutional manner.
The repository provides several pre-trained models with different backbones:
These models achieve competitive performance on the Visual Genome dataset for scene graph generation tasks.
The project provides tools for training, evaluation, and visualization of scene graphs. It requires the Visual Genome dataset and can be run using Docker or directly with PyTorch.
In summary, FCSGG is a comprehensive implementation of a state-of-the-art approach to scene graph generation using fully convolutional networks, offering various model architectures and training configurations.
FCSGG is built on top of Detectron2, Facebook’s object detection framework, and leverages many of its components while extending it for scene graph generation. Here’s a detailed breakdown:
Meta Architecture: FCSGG registers a custom meta architecture called “CenterNet” with Detectron2’s META_ARCH_REGISTRY. This extends Detectron2’s modular architecture system while maintaining compatibility.
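As a rough illustration, registering a meta architecture with Detectron2 generally follows this pattern (a hedged sketch of the registry API; the class body below is a placeholder, not FCSGG's actual implementation):

```python
import torch.nn as nn
from detectron2.modeling import META_ARCH_REGISTRY

@META_ARCH_REGISTRY.register()
class CenterNet(nn.Module):
    """Placeholder one-stage detector predicting objects and relationships jointly."""
    def __init__(self, cfg):
        super().__init__()
        # backbone, FPN/HRNet neck, and prediction heads would be built from cfg here

    def forward(self, batched_inputs):
        # training: return a dict of losses; inference: return per-image predictions
        raise NotImplementedError
```

The registered name is what the YAML config refers to when selecting the model.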
Backbone Networks: FCSGG uses Detectron2’s backbone networks (ResNet, etc.) directly and also implements custom backbones like HRNet while following Detectron2’s backbone interface.
Feature Pyramid Networks (FPN): The repository uses Detectron2’s FPN implementation and extends it with custom variants like BiFPN and HRFPN.
YAML Configuration: FCSGG adopts Detectron2’s YAML-based configuration system, extending it with custom configurations for scene graph generation through add_fcsgg_config().
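A minimal sketch of what such a config-extension hook usually looks like (the extra keys below are hypothetical examples, not FCSGG's actual options):

```python
from detectron2.config import CfgNode as CN, get_cfg

def add_fcsgg_config(cfg):
    # hypothetical extra options for relationship prediction
    cfg.MODEL.RELATION = CN()
    cfg.MODEL.RELATION.NUM_PREDICATES = 50

cfg = get_cfg()        # Detectron2 defaults
add_fcsgg_config(cfg)  # project-specific additions
cfg.merge_from_file("configs/quick_schedules/Quick-FCSGG-HRNet-W32.yaml")
```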
Command Line Arguments: The training script uses Detectron2’s default_argument_parser() to maintain the same command-line interface.
Dataset Registration: The Visual Genome dataset is registered with Detectron2’s DatasetCatalog and MetadataCatalog, making it available through Detectron2’s data loading pipeline.
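Registration with the two catalogs generally follows this pattern (a sketch; the split name, loader, and class names are placeholders rather than FCSGG's exact registration code):

```python
from detectron2.data import DatasetCatalog, MetadataCatalog

def load_vg_train():
    # would parse datasets/vg/VG-SGG-with-attri.h5 and return list[dict]
    # in Detectron2's standard dataset format, plus relationship annotations
    return []

DatasetCatalog.register("vg_train", load_vg_train)
# placeholder class names for illustration only
MetadataCatalog.get("vg_train").set(thing_classes=["person", "bike", "dog"])
```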
Custom Dataset Mapper: FCSGG implements a custom DatasetMapper class that extends Detectron2’s mapper to handle scene graph annotations.
Data Loaders: The repository uses Detectron2’s build_detection_train_loader and build_detection_test_loader with custom mappers.
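A hedged sketch of how a custom mapper plugs into those loaders; the class name and the relationship-handling logic are placeholders:

```python
from detectron2.data import DatasetMapper, build_detection_train_loader

class SceneGraphDatasetMapper(DatasetMapper):
    """Placeholder for FCSGG's mapper that also prepares relationship targets."""
    def __call__(self, dataset_dict):
        data = super().__call__(dataset_dict)
        # FCSGG would additionally turn <subject, predicate, object> annotations
        # into dense training targets (e.g. center/relation heatmaps) here
        return data

def make_train_loader(cfg):
    return build_detection_train_loader(cfg, mapper=SceneGraphDatasetMapper(cfg, is_train=True))
```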
Trainer Class: FCSGG extends Detectron2’s DefaultTrainer class to customize the training loop, evaluation metrics, and data loading.
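A sketch of that extension pattern; the overridden method names are real DefaultTrainer hooks, while the bodies reuse the placeholder names from the sketches above and below:

```python
from detectron2.data import build_detection_train_loader
from detectron2.engine import DefaultTrainer

class SGGTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        # plug in the scene-graph-aware mapper (see the mapper sketch above)
        return build_detection_train_loader(cfg, mapper=SceneGraphDatasetMapper(cfg, is_train=True))

    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        # custom scene-graph evaluator (sketched further below)
        return VGEvaluator(dataset_name)
```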
Checkpointing: The repository uses Detectron2’s DetectionCheckpointer for model saving and loading.
Distributed Training: FCSGG leverages Detectron2’s distributed training utilities through detectron2.utils.comm and the launch function.
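The entry point then follows Detectron2's standard launcher pattern (a sketch mirroring the stock train_net.py, not FCSGG's exact file):

```python
from detectron2.engine import default_argument_parser, launch

def main(args):
    # build cfg (get_cfg + add_fcsgg_config + merge_from_file), create the
    # trainer, then run trainer.resume_or_load() and trainer.train()
    pass

if __name__ == "__main__":
    args = default_argument_parser().parse_args()
    launch(
        main,
        args.num_gpus,
        num_machines=args.num_machines,
        machine_rank=args.machine_rank,
        dist_url=args.dist_url,
        args=(args,),
    )
```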
Custom Evaluators: The repository implements a custom VGEvaluator for scene graph evaluation while following Detectron2’s evaluator interface.
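That interface looks roughly like this (a sketch; the metric names and the computation are placeholders for the real scene-graph recall metrics):

```python
from detectron2.evaluation import DatasetEvaluator

class VGEvaluator(DatasetEvaluator):
    def __init__(self, dataset_name):
        self._dataset_name = dataset_name

    def reset(self):
        self._predictions = []

    def process(self, inputs, outputs):
        # accumulate per-image predicted triplets alongside ground truth
        self._predictions.extend(outputs)

    def evaluate(self):
        # compute scene graph metrics such as Recall@K (placeholder value here)
        return {"sgg": {"R@50": 0.0}}
```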
Event Storage: FCSGG uses Detectron2’s event storage system for logging metrics during training.
Visualization Tools: The repository leverages Detectron2’s visualization utilities for debugging and result analysis.
Custom Heads: While using Detectron2’s architecture, FCSGG implements custom prediction heads for relationship detection.
Scene Graph Structures: The repository defines custom data structures for scene graphs that integrate with Detectron2’s Instances class.
Loss Functions: FCSGG implements specialized loss functions for scene graph generation while maintaining compatibility with Detectron2’s loss computation framework.
Submodule Integration: Detectron2 is included as a Git submodule, ensuring version compatibility.
Build Process: The installation process includes building Detectron2 from source to ensure proper integration.
In summary, FCSGG uses Detectron2 as its foundation, leveraging its modular architecture, data handling, training infrastructure, and configuration system while extending it with custom components for scene graph generation. This approach allows FCSGG to benefit from Detectron2’s robust implementation and optimizations while adding specialized functionality for relationship detection between objects.
Official repo:
https://github.com/liuhengyue/fcsgg
Our repo:
https://github.com/PSGBOT/KAF-Generation
My venv: fcsgg
```bash
git clone git@github.com:liuhengyue/fcsgg.git
```
Datasets:
```bash
cd ~/Reconst
```
Download the scene graphs and extract them to datasets/vg/VG-SGG-with-attri.h5.
```text
AttributeError: module 'PIL.Image' has no attribute 'LINEAR'. Did you mean: 'BILINEAR'?
```
Fix: change LINEAR to BILINEAR: commit
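The change amounts to swapping the removed Pillow alias for its current name wherever the transform code references it, for example:

```python
from PIL import Image

# before (fails on newer Pillow, which no longer exposes the LINEAR alias):
# resample = Image.LINEAR
# after:
resample = Image.BILINEAR
```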
While attempting to train, another error was reported:

```text
File "/home/cyl/Reconst/fcsgg/fcsgg/data/detection_utils.py", line 432, in generate_score_map
```

Fix: modify detection_utils.py: commit
First, edit the training config file ./configs/quick_schedules/Quick-FCSGG-HRNet-W32.yaml (the original file loads pretrained weights):

```yaml
MODEL:
```

Change it to train from scratch:

```yaml
MODEL:
```
Then run:

```bash
python tools/train_net.py --num-gpus 1 --config-file configs/quick_schedules/Quick-FCSGG-HRNet-W32.yaml
```

Training runs successfully ✌

```text
...
```
See [[FCSGG Repo Explanation]]
(Mindmap) Part-level Scene Understanding for Robots
A scene graph is a structural representation, which can capture detailed semantics by explicitly modeling:
A scene graph is a set of visual relationship triplets in the form of <subject, relation, object> or <object, is, attribute>
Scene graphs should serve as an **objective semantic representation** of the state of the scene.

```bash
pip install omegaconf
```

```python
from omegaconf import OmegaConf
```
You can create OmegaConf objects from multiple sources.
dict
```python
conf = OmegaConf.create({"k" : "v", "list" : [1, {"a": "1", "b": "2", 3: "c"}]})
```
list
```python
conf = OmegaConf.create([1, {"a":10, "b": {"a":10, 123: "int_key"}}])
```
yaml
```python
conf = OmegaConf.load('source/example.yaml')
```
dot-list
```python
dot_list = ["a.aa.aaa=1", "a.aa.bbb=2", "a.bb.aaa=3", "a.bb.bbb=4"]
conf = OmegaConf.from_dotlist(dot_list)
```
command-line arguments

```python
sys.argv = ['your-program.py', 'server.port=82', 'log.file=log2.txt']
conf = OmegaConf.from_cli()
```
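These sources can also be combined. As an extra illustration (not from the original notes), a YAML base config can be overridden by command-line values:

```python
from omegaconf import OmegaConf

base = OmegaConf.load("source/example.yaml")
cli = OmegaConf.from_cli()         # e.g. server.port=82 log.file=log2.txt
conf = OmegaConf.merge(base, cli)  # later sources override earlier ones
print(OmegaConf.to_yaml(conf))
```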
LMOD
Software management: https://lmod.readthedocs.io/en/latest/010_user.html
The SJTU HPC uses lmod to manage user software. Modules are reset every time you log in again, so the software has to be re-loaded.
Common commands:
```bash
ml                 # list the currently loaded modules
ml <module_name>   # shorthand for "module load <module_name>"
```
oh-my-zsh
A better shell. zsh is already installed on the HPC by default; install oh-my-zsh via its install script, then install plugins.

```bash
export CHSH=no  # keep the installer from changing the default login shell (usually not allowed on HPC)
sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"
```
nvim
A better editor. There is no sudo permission on the server, so Neovim has to be compiled manually.
First, create a build environment with conda:

```bash
conda create -n nvim-build cmake libtool gettext curl unzip ninja gcc gxx -y -c conda-forge
```
clone & build
```bash
git clone https://github.com/neovim/neovim.git
cd neovim && make CMAKE_BUILD_TYPE=Release CMAKE_INSTALL_PREFIX=$HOME/.local install  # install into $HOME since there is no sudo
```
In some cases a parallel build is needed; compilation consumes a lot of resources and may not be possible on the login node.
In that situation submitting a batch job is inefficient; instead, request interactive compute resources:

```bash
srun -p cpu -n 4 --pty /bin/bash
```

```console
(base) chenyulin@pilogin2 ~> srun -p cpu -n 4 --pty /bin/bash
```
Transfer files with HPC Studio:
https://studio.hpc.sjtu.edu.cn/pun/sys/dashboard/files/fs//lustre/home/acct-umjbyy/chenyulin/Data
For example, to copy a conda environment:
```bash
cp -r /lustre/home/acct-umjbyy/chenyulin/.conda /dssg/home/acct-umjbyy/chenyulin/
```
“Emergence” first arose in philosophy. In the 19th century the British philosopher George Henry Lewes first used the word to describe properties of a whole that cannot be explained by the properties of its parts, that is, the phenomenon that “the whole is greater than the sum of its parts”.
The ability to reduce everything to simple fundamental laws does not imply the ability to start from those laws and reconstruct the universe. The reductionist hypothesis breaks down when confronted with the twin difficulties of scale and complexity. At each level of complexity entirely new properties appear. Psychology is not applied biology, nor is biology applied chemistry. We can now see that the whole becomes not only more than, but very different from, the sum of its parts.
The systems scientists Murray Gell-Mann and Stuart Kauffman developed the concept of emergence further through their studies of self-organizing behavior in complex systems. In their view, emergence refers to properties or phenomena that arise naturally when simple components within a complex system interact. In other words, the characteristics or behavior shown by the whole cannot simply be derived from the properties of its parts; they arise spontaneously out of the interactions among those parts. Emergence focuses on explaining how a complex system, without external instructions or central control, forms new ordered structures and behaviors through simple internal rules and interactions. This shows that complexity can arise naturally, without external intervention or a detailed design laid out in advance.
The most representative example of this is Conway's Game of Life:
In the Game of Life, every cell follows only a few simple local rules that depend on its neighbors, yet complex global patterns emerge from their interactions (see the sketch below).
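As an illustration added here (not part of the original notes), a minimal Python sketch of the rules shows that every cell's next state depends only on its eight neighbors:

```python
import numpy as np

def step(grid: np.ndarray) -> np.ndarray:
    """Advance a 0/1 grid by one generation of Conway's Game of Life."""
    # count the 8 neighbors of every cell (wrap-around boundaries)
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # a cell is alive next step if it has exactly 3 live neighbors,
    # or if it is alive now and has exactly 2 live neighbors
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)

# a "glider": five live cells whose pattern re-creates itself shifted diagonally
grid = np.zeros((10, 10), dtype=int)
grid[1, 2] = grid[2, 3] = grid[3, 1] = grid[3, 2] = grid[3, 3] = 1
for _ in range(4):
    grid = step(grid)  # after 4 steps the same shape reappears, moved by one cell
```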
I will not go into the emergence of new capabilities (language understanding, reasoning, programming, creative generation, and so on) as autoregressive text models such as ChatGPT scale up their parameters; instead, I will focus on emergence in the self-supervised vision domain.
DINO (Self-Distillation with No Labels) is a self-supervised learning method for learning visual features without labels. It builds on the ideas of knowledge distillation and contrastive learning, and trains a pair of networks with the same architecture: a teacher network and a student network.
The idea is that the student model predicts the labels produced by the teacher model, while the teacher's weights are updated from the student's weights. The student is fed small local crops of the original image, whereas the teacher is fed larger, more global crops. This self-supervised setup expands the training data from labeled datasets at the scale of millions of images to hundreds of millions of unlabeled web images. After training, the model exhibits capabilities that were never explicitly designed in:

- It reaches strong top-1 accuracy with a nearest neighbors classifier (k-NN), without any fine-tuning or added projection layer.
- Clear semantic segmentation information can be observed in the last self-attention layer.

## The dialectical law at work: quantitative change produces qualitative change

In DINO, the dialectical principle that **quantitative change produces qualitative change** is clearly visible. The principle states that **qualitative change is not abrupt; it is a leap that happens once accumulated quantitative change reaches a critical point**. In DINO this plays out in several ways:

- **Scaling up the training data enables the emergence of new capabilities**
  - Traditional supervised learning depends on manually annotated datasets, usually at the scale of millions of images, whereas DINO's self-supervision expands the training data to hundreds of millions of unlabeled web images.
  - This quantitative accumulation lets the model learn to distinguish object categories on its own, without human supervision, and ultimately form feature representations that generalize well.
- **Gradual refinement of feature learning drives the emergence of semantic information**
  - Early in training, the features DINO learns are noisy and capture only low-level visual patterns such as edges and color distributions.
  - As training continues, the network's representational power improves step by step, until **explicit semantic segmentation ability** emerges in the self-attention layer, even though segmentation was never a training objective.
- **Information alignment keeps accumulating in the teacher-student architecture**, which continuously refines itself through self-distillation; local improvements (quantitative change) ultimately produce a global leap in capability (qualitative change).
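To make the self-distillation mechanism described above concrete, here is a heavily simplified sketch (not the official DINO code); `student` and `teacher` stand for two networks with identical architecture, and details such as output centering and multi-crop loss aggregation are omitted:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_teacher(teacher, student, momentum=0.996):
    # the teacher's weights are an exponential moving average of the student's
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)

def dino_loss(student_out, teacher_out, student_temp=0.1, teacher_temp=0.04):
    # the student (fed small local crops) is trained to predict the teacher's
    # (fed large global crops) sharpened output distribution; no gradient
    # flows into the teacher
    targets = F.softmax(teacher_out.detach() / teacher_temp, dim=-1)
    log_probs = F.log_softmax(student_out / student_temp, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()
```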