Hyperbolic Image-Text Representations

Karan Desai; Maximilian Nickel; Tanmay Rajpurohit; Justin Johnson; Ramakrishna Vedantam


Subjects: Computer Vision and Pattern Recognition; Machine Learning (v3, 2024-01-19)


Abstract

Visual and linguistic concepts naturally organize themselves in a hierarchy, where a textual concept "dog" entails all images that contain dogs. Although this hierarchy is intuitive, current large-scale vision and language models such as CLIP do not explicitly capture it. We propose MERU, a contrastive model that yields hyperbolic representations of images and text. Hyperbolic spaces have suitable geometric properties to embed tree-like data, so MERU can better capture the underlying hierarchy in image-text datasets. Our results show that MERU learns a highly interpretable and structured representation space while being competitive with CLIP's performance on standard multi-modal tasks like image classification and image-text retrieval. Our code and models are available at https://www.github.com/facebookresearch/meru.
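To make "hyperbolic representations" concrete, here is a minimal sketch of one common way to place embeddings in hyperbolic space and compare them. The abstract does not say which model of hyperbolic geometry MERU uses. This sketch assumes the Lorentz (hyperboloid) model with curvature -c. It lifts a Euclidean encoder output onto the hyperboloid with the exponential map at the origin and measures geodesic distance. The function names exp_map_origin, lorentz_inner and lorentz_distance are hypothetical illustrations, not the API of the MERU repository.

```python
import math
from typing import List, Sequence


def exp_map_origin(v: Sequence[float], c: float = 1.0) -> List[float]:
    """Lift a Euclidean (tangent) vector onto the hyperboloid of curvature -c.

    Assumption: Lorentz model; returns [time, *space] coordinates.
    """
    sqrt_c = math.sqrt(c)
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        space = [0.0 for _ in v]  # the zero vector maps to the origin
    else:
        scale = math.sinh(sqrt_c * norm) / (sqrt_c * norm)
        space = [scale * x for x in v]
    # The time component is fixed by the constraint <x, x>_L = -1/c.
    time = math.sqrt(1.0 / c + sum(s * s for s in space))
    return [time] + space


def lorentz_inner(x: Sequence[float], y: Sequence[float]) -> float:
    """Lorentzian inner product: -x_time * y_time + <x_space, y_space>."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lorentz_distance(x: Sequence[float], y: Sequence[float], c: float = 1.0) -> float:
    """Geodesic distance on the hyperboloid of curvature -c."""
    # Clamp at 1.0 so rounding error never pushes acosh out of its domain.
    arg = max(1.0, -c * lorentz_inner(x, y))
    return math.acosh(arg) / math.sqrt(c)


origin = exp_map_origin([0.0, 0.0], c=0.5)
point = exp_map_origin([0.3, -0.4], c=0.5)
print(lorentz_distance(origin, point, c=0.5))  # approximately 0.5, the norm of [0.3, -0.4]
```

With this lifting, the hyperbolic distance from the origin equals the Euclidean norm of the input vector. Distances between points far from the origin grow roughly exponentially faster than in flat space, which is the geometric property that makes such spaces suited to tree-like data. In a contrastive setup, the negative distance could serve as an image-text similarity score. That is an illustrative choice here, not a statement of MERU's exact loss.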

Keywords

hyperbolic embeddings, image retrieval, hierarchical classification

Cite

@article{arxiv.2304.09172,
  title  = {Hyperbolic Image-Text Representations},
  author = {Karan Desai and Maximilian Nickel and Tanmay Rajpurohit and Justin Johnson and Ramakrishna Vedantam},
  journal= {arXiv preprint arXiv:2304.09172},
  year   = {2024}
}

Comments

ICML 2023 (v3: Add link to code in abstract)
