Tag: Transformer - Chen Yulin's Blog

SGTR+: End-to-end Scene Graph Generation with Transformer

Posted 2025-03-13 | Updated 2026-04-07 | Review | a minute read (About 180 words)


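SGTR is a top-down method: it first uses a Transformer-based generator to produce a set of learnable triplet queries (subject-predicate-object), then uses a cascaded triplet detector to progressively refine those queries and generate the final scene graph. It also proposes an entity-aware relation representation built on a structured generator, which exploits the compositional property of relations.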

#Scene-graph Visual-Relation Research-paper Transformer CV

DETR

Posted 2025-03-11 | Updated 2026-04-07 | Review | a minute read (About 161 words)

DETR is an object detection model that uses a Transformer as its basic architecture.

#Research-paper Transformer CV Object-Detection

MaskDINO

Posted 2025-03-06 | Updated 2026-04-07 | Review | a few seconds read (About 26 words)

Note: this DINO is not the self-distillation, self-supervised [[DINO]]; it is derived from [[DETR]].

#Research-paper Transformer CV Object-Detection MultiModal Semantic Segmentation

ViLT

Posted 2025-03-04 | Updated 2026-04-07 | Review | a few seconds read (About 3 words)

#Research-paper Transformer Image2Text CV MultiModal VLP Image-Text

Grounding-DINO

Posted 2025-02-16 | Updated 2026-04-07 | Review | a minute read (About 216 words)


#Research-paper Transformer CV Object-Detection MultiModal Open-Vocabulary Contrastive-Learning DINO Image-Grounding

Vision Transformers Need Registers

Posted 2025-01-09 | Updated 2026-04-07 | Note | a few seconds read (About 0 words)


#Research-paper Transformer CV ViT

DINOv2: Learning Robust Visual Features without Supervision

Posted 2025-01-09 | Updated 2026-04-07 | Note | a few seconds read (About 0 words)


#Research-paper Transformer CV Representation-Learning ViT DINO

AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE

Posted 2025-01-09 | Updated 2026-04-07 | Note | a few seconds read (About 71 words)


https://www.youtube.com/watch?v=j3VNqtJUoz0&t=16s

#Research-paper Transformer CV ViT

DINO

Posted 2025-01-08 | Updated 2026-04-07 | Note | 4 minutes read (About 561 words)

https://github.com/facebookresearch/dino/tree/main

#Research-paper Transformer CV Representation-Learning ViT DINO

Simple Open-Vocabulary Object Detection with Vision Transformers

Posted 2025-01-06 | Updated 2026-04-07 | Note | a few seconds read (About 3 words)


#Research-paper Transformer CV Object-Detection Open-Vocabulary ViT