(UVtransE) Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation

Visual Translation Embedding Network for Visual Relation Detection
ALBEF

ViLT

CLIP
