Chen Yulin's Blog

Posted 2025-03-11 · Updated 2026-02-15 · Review · a minute read (About 161 words)

DETR

DETR is an object detection model that uses the transformer as its basic architecture.

#Research-paper CV Transformer Object-Detection

Posted 2025-03-06 · Updated 2026-02-15 · Review · a minute read (About 206 words)

Semantic-SAM

This paper could serve as one of the cornerstones of physical scene reconstruction.
Similar follow-up work includes OMG-Seg.

#Research-paper CV Semantic Segmentation Hierarchical

Posted 2025-03-06 · Updated 2026-02-15 · Review · a few seconds read (About 26 words)

MaskDINO

Note: this DINO is not the self-distillation, self-supervised [[DINO]]; it is derived from [[DETR]].

#Research-paper CV Transformer Object-Detection Semantic Segmentation MultiModal

Posted 2025-03-04 · Updated 2026-02-15 · Review · a minute read (About 154 words)

ALBEF

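The backbone is BERT (trained with MLM).
The study argues that the image encoder should be larger than the text encoder, so the text encoder uses only six self-attention layers to extract features, and the remaining six cross-attention layers are used for the multi-modal encoder.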
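The six/six layer split described above can be sketched as a simple layer plan. This is only an illustration, assuming a 12-layer BERT stack; `LayerSpec` and `split_bert_layers` are hypothetical names and not part of ALBEF's code. It records only which role each layer plays, not the attention computation itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LayerSpec:
    index: int             # position in the assumed 12-layer BERT stack
    role: str              # "text" or "multimodal"
    cross_attention: bool  # True if the layer attends to image features


def split_bert_layers(num_layers: int = 12, num_text_layers: int = 6) -> List[LayerSpec]:
    """Assign the first layers to the text encoder and the rest to the multi-modal encoder."""
    if not 0 <= num_text_layers <= num_layers:
        raise ValueError("num_text_layers must be between 0 and num_layers")
    return [
        LayerSpec(
            index=i,
            role="text" if i < num_text_layers else "multimodal",
            cross_attention=i >= num_text_layers,
        )
        for i in range(num_layers)
    ]


plan = split_bert_layers()
text_layers = [spec for spec in plan if spec.role == "text"]
fusion_layers = [spec for spec in plan if spec.role == "multimodal"]
```

With the default arguments, `text_layers` holds the first six layers (self-attention only) and `fusion_layers` holds the last six (with cross-attention to the image features).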

#Research-paper CV Image2Text Contrastive-Learning MultiModal VLP Image-Text

Posted 2025-03-04 · Updated 2026-02-15 · Review · a few seconds read (About 3 words)

ViLT

#Research-paper CV Transformer Image2Text MultiModal VLP Image-Text

Posted 2025-03-03 · Updated 2026-02-15 · Review · a few seconds read (About 0 words)

ZegCLIP

#Research-paper CV Semantic CLIP Open-Vocabulary Segmentation

Posted 2025-03-03 · Updated 2026-02-15 · Review · a few seconds read (About 108 words)

BLIP

A vision-language model that unifies vision-language understanding and generation tasks.

#Research-paper CV Multi-modal Semantic CLIP VLP Image-Text

Posted 2025-02-19 · Updated 2026-02-15 · Review · 2 minutes read (About 273 words)

GLIP

GLIP is a model that learns object-level, language-aware, and semantic-rich visual representations.
It unifies object detection and phrase grounding for pre-training.

#Research-paper CV Multi-modal Object-Detection CLIP Contrastive-Learning VLP Image-Grounding

Posted 2025-02-18 · Updated 2026-02-15 · Review · a few seconds read (About 0 words)

Extract Free Dense Labels from CLIP

#Research-paper CV Semantic CLIP Open-Vocabulary Segmentation

Posted 2025-02-17 · Updated 2026-02-15 · Review · 2 minutes read (About 297 words)

ConceptFusion

The formula for fusing the features from different frames $X_t$ into the feature points in M:

#Research-paper CV Multi-modal Reconstruct 3D-Scene Semantic CLIP
