Tag: CV - Chen Yulin's Blog

Grounding-DINO

Posted 2025-02-16 | Updated 2026-03-08 | Review | a minute read (about 216 words)

#Research-paper Transformer CV Object-Detection Open-Vocabulary Contrastive-Learning MultiModal DINO Image-Grounding

Grounded-SAM

Posted 2025-02-16 | Updated 2026-03-08 | Review | a few seconds read (about 17 words)

https://github.com/IDEA-Research/Grounded-Segment-Anything

Combines [[Grounding-DINO]] with SAM to achieve open-vocabulary detection and segmentation.

#Research-paper CV Object-Detection Semantic Open-Vocabulary Segmentation

Momentum Contrast for Unsupervised Visual Representation Learning

Posted 2025-01-09 | Updated 2026-03-08 | Note | 5 minutes read (about 722 words)

Pseudocode:
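
The listing drops the pseudocode itself, so here is a minimal sketch of the three ideas MoCo is usually summarized by: a momentum-updated key encoder, a queue of negative keys, and an InfoNCE loss. It is not the post's original pseudocode. The function names, the toy 2-D vectors, and the queue size of 4 are illustrative assumptions. The defaults `m=0.999` and `temperature=0.07` are the values I recall the MoCo paper using, so treat them as assumptions too.

```python
import math
from collections import deque


def momentum_update(query_params, key_params, m=0.999):
    """Move the key encoder slowly toward the query encoder: k <- m*k + (1-m)*q."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(q, k_pos, queue, temperature=0.07):
    """Cross-entropy over [positive, negatives...] logits, positive at index 0."""
    q, k_pos = l2_normalize(q), l2_normalize(k_pos)
    logits = [dot(q, k_pos) / temperature]
    logits += [dot(q, l2_normalize(k_neg)) / temperature for k_neg in queue]
    # Numerically stable log-sum-exp.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[0]


# Toy queue of negative keys (hypothetical size 4; real queues are far larger).
queue = deque(maxlen=4)
for neg in ([0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]):
    queue.append(neg)

q = [1.0, 0.1]   # query embedding (toy values)
k = [1.0, 0.0]   # key embedding of the same image under another augmentation
loss = info_nce_loss(q, k, queue)

# Enqueue the current key; deque(maxlen=...) evicts the oldest key once full.
queue.append(k)

# Momentum step on toy "parameters" (scalars standing in for weight tensors).
new_key_params = momentum_update([1.0, 2.0], [0.0, 0.0], m=0.9)
```

In a real training loop these would be batched tensor operations, with gradients flowing only through the query encoder. The plain-Python version is just meant to keep each step visible.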

#Research-paper CV Representation-Learning Contrastive-Learning

Vision Transformers Need Registers

Posted 2025-01-09 | Updated 2026-03-08 | Note | a few seconds read (about 0 words)

#Research-paper Transformer CV ViT

DINOv2: Learning Robust Visual Features without Supervision

Posted 2025-01-09 | Updated 2026-03-08 | Note | a few seconds read (about 0 words)

#Research-paper Transformer CV Representation-Learning ViT DINO

An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale

Posted 2025-01-09 | Updated 2026-03-08 | Note | a few seconds read (about 71 words)

https://www.youtube.com/watch?v=j3VNqtJUoz0&t=16s

#Research-paper Transformer CV ViT

DINO

Posted 2025-01-08 | Updated 2026-03-08 | Note | 4 minutes read (about 561 words)

https://github.com/facebookresearch/dino/tree/main

#Research-paper Transformer CV Representation-Learning ViT DINO

CLIP

Posted 2025-01-06 | Updated 2026-03-08 | Note | a minute read (about 197 words)

https://blog.csdn.net/h661975/article/details/135116957

#Research-paper Image2Text CV CLIP Contrastive-Learning MultiModal VLP Image-Text

LERF: Language Embedded Radiance Fields

Posted 2025-01-06 | Updated 2026-03-08 | Note | 5 minutes read (about 790 words)

NeRF+CLIP

#Research-paper LLM CV Reconstruct 3D-Scene Embodied-AI Semantic CLIP

Simple Open-Vocabulary Object Detection with Vision Transformers

Posted 2025-01-06 | Updated 2026-03-08 | Note | a few seconds read (about 3 words)

#Research-paper Transformer CV Object-Detection Open-Vocabulary ViT