Language-Image model

CLIP (Contrastive Language-Image Pre-training)

2021.01

text encoder + image encoder -> the text and image embeddings are compared with a dot product, similar to Q * K in a transformer
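
To make the Q * K analogy concrete, here is a minimal sketch in plain Python. It assumes the two encoders have already produced embeddings; the toy vectors, the helper names (`l2_normalize`, `similarity_matrix`, `softmax`), and the temperature value of 0.07 are illustrative assumptions, not something taken from this post or from any particular CLIP implementation.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_matrix(image_embs, text_embs):
    """Return S where S[i][j] is the cosine similarity of image i and text j.

    This plays the same role as Q @ K^T in attention: every image embedding
    is scored against every text embedding with a dot product.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]

def softmax(row, temperature=1.0):
    """Turn one row of similarity scores into a probability distribution."""
    scaled = [x / temperature for x in row]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical embeddings standing in for real encoder outputs.
image_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]

sim = similarity_matrix(image_embs, text_embs)
# Assumed temperature; a smaller value sharpens the distribution.
probs = [softmax(row, temperature=0.07) for row in sim]
# For each image, the index of the best-matching text.
best = [row.index(max(row)) for row in probs]
```

In contrastive training, the matching image-text pairs sit on the diagonal of `sim`, and the objective pushes those diagonal scores up relative to the rest of each row and column. The sketch only covers the scoring step, not the encoders or the loss.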

VirTex ?

LLaVA

This post is licensed under CC BY 4.0 by the author.