2025-01-09
Momentum Contrast for Unsupervised Visual Representation Learning
Note
Vision Transformers Need Registers
DINOv2- Learning Robust Visual Features without Supervision
AN IMAGE IS WORTH 16X16 WORDS- TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE
2025-01-08
DINO
2025-01-06
CLIP
LERF- Language Embedded Radiance Fields
Some Thoughts Regarding -Reconstruct Anything-
CLIP-Fields- Weakly Supervised Semantic Fields for Robotic Memory
Simple Open-Vocabulary Object Detection with Vision Transformers
Chen Yulin
SJTU student
Manchester by the Sea