3D-LLM

PointLLM

ProgPrompt

Siddhartha Book Club p14

Vision Transformers Need Registers

An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale

DINO

CLIP