
CLIP
https://blog.csdn.net/h661975/article/details/135116957
Loss: ITC (Image-Text Contrastive)
# image_encoder - ResNet or Vision Transformer
Cross-entropy loss: symmetric, averaged over the image-to-text and text-to-image directions.
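A minimal pure-Python sketch of the ITC objective, following the structure of the pseudocode in the CLIP paper: L2-normalize both sets of embeddings, build the n x n cosine-similarity matrix scaled by a temperature, and average the cross-entropy over both directions with the diagonal (matched pairs) as targets. Assumptions: the image and text encoders have already been applied, so the inputs are plain embedding vectors; the temperature is fixed at 0.07 here, whereas CLIP learns it as a parameter; the function names are illustrative, not from any library.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits: Vector, target: int) -> float:
    """Softmax cross-entropy for one row, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def itc_loss(image_emb: List[Vector], text_emb: List[Vector],
             temperature: float = 0.07) -> float:
    """Symmetric image-text contrastive loss for a batch of n matched pairs.

    Row i of image_emb and row i of text_emb form the positive pair; every
    other pairing in the batch acts as a negative.
    Assumption: fixed temperature (CLIP learns it during training).
    """
    n = len(image_emb)
    img = [l2_normalize(v) for v in image_emb]
    txt = [l2_normalize(v) for v in text_emb]
    # logits[i][j] = cos(image i, text j) / temperature
    logits = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image -> text: row i should select caption i.
    loss_i = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text -> image: column j should select image j.
    loss_t = sum(cross_entropy([logits[i][j] for i in range(n)], j)
                 for j in range(n)) / n
    return (loss_i + loss_t) / 2


aligned = itc_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = itc_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Correctly matched pairs drive the loss toward zero, while a batch whose captions are swapped is penalized heavily, which is exactly the pressure that pulls matched image and text embeddings together.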
CLIP essentially produces a single global image embedding, which makes it ill-suited to extracting pixel-aligned features.
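To illustrate why a global embedding loses spatial information, the toy sketch below compares a pooled image vector with a per-patch similarity map. Assumptions: the 2 x 2 grid of 2-D "patch features" is made up, and mean pooling stands in for CLIP's actual pooling (the [CLS] token in the ViT variant, attention pooling in the ResNet variant). Because the ITC loss only supervises the pooled vector, nothing in training forces individual patch features to line up with text; the dense map here is a hypothetical contrast, not something vanilla CLIP is trained to produce.

```python
import math
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # H x W grid of D-dim patch features


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def global_embedding(patches: Grid) -> Vector:
    """Collapse the grid into one vector (mean pooling, an assumed stand-in)."""
    flat = [p for row in patches for p in row]
    d = len(flat[0])
    return [sum(p[k] for p in flat) / len(flat) for k in range(d)]


def global_score(patches: Grid, text_emb: Vector) -> float:
    """One image-text score: where things are in the image is discarded."""
    return cosine(global_embedding(patches), text_emb)


def dense_score_map(patches: Grid, text_emb: Vector) -> List[List[float]]:
    """Hypothetical H x W map: one similarity per patch, keeps location."""
    return [[cosine(p, text_emb) for p in row] for row in patches]


# Toy features: "dog" and "grass" patches, plus a "dog" text embedding.
dog, grass, text = [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]
top_left = [[dog, grass], [grass, grass]]
bottom_right = [[grass, grass], [grass, dog]]

print(global_score(top_left, text) == global_score(bottom_right, text))  # True
print(dense_score_map(top_left, text))      # [[1.0, 0.0], [0.0, 0.0]]
print(dense_score_map(bottom_right, text))  # [[0.0, 0.0], [0.0, 1.0]]
```

Moving the dog from one corner to the other leaves the global score unchanged, while only the per-patch map records where the match occurred.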