NSFC

Vision Transformers Need Registers

AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE

DINO

CLIP

LERF: Language Embedded Radiance Fields

Some Thoughts Regarding "Reconstruct Anything"