RelTR: Relation Transformer for Scene Graph Generation

Iterative Scene Graph Generation

SGTR+: End-to-end Scene Graph Generation with Transformer

DETR

ViLT

Vision Transformers Need Registers

An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale

DINO

Attention Is All You Need