Related papers: Using Text to Teach Image Retrieval

Composing Text and Image for Image Retrieval - An Empirical Odyssey

In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image. For example, we may present an image of the Eiffel…

Computer Vision and Pattern Recognition · Computer Science 2018-12-19 Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays

Scene Graph based Image Retrieval -- A case study on the CLEVR Dataset

With the proliferation of multimodal interaction in various domains, there has recently been much interest in text-based image retrieval in the computer vision community. However, most state-of-the-art techniques model this problem in…

Artificial Intelligence · Computer Science 2019-11-05 Sahana Ramnath, Amrita Saha, Soumen Chakrabarti, Mitesh M. Khapra

Learning Joint Representations of Videos and Sentences with Web Image Search

Our objective is video retrieval based on natural language queries. In addition, we consider the analogous problem of retrieving sentences or generating descriptions given an input video. Recent work has addressed the problem by embedding…

Computer Vision and Pattern Recognition · Computer Science 2016-08-09 Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya

Scene Graph Embeddings Using Relative Similarity Supervision

Scene graphs are a powerful structured representation of the underlying content of images, and embeddings derived from them have been shown to be useful in multiple downstream tasks. In this work, we employ a graph convolutional network to…

Computer Vision and Pattern Recognition · Computer Science 2021-04-07 Paridhi Maheshwari, Ritwick Chaudhry, Vishwa Vinay

Image search using multilingual texts: a cross-modal learning approach between image and text

Multilingual (or cross-lingual) embeddings represent several languages in a single vector space. Using a common embedding space enables shared semantics between words from different languages. In this paper, we propose to embed images…

Computer Vision and Pattern Recognition · Computer Science 2019-05-15 Maxime Portaz, Hicham Randrianarivo, Adrien Nivaggioli, Estelle Maudet, Christophe Servan, Sylvain Peyronnet

Deep Learning Applied to Image and Text Matching

The ability to describe images with natural language sentences is the hallmark for image and language understanding. Such a system has wide-ranging applications such as annotating images and using natural sentences to search for images. In…

Machine Learning · Computer Science 2016-01-15 Afroze Ibrahim Baqapuri

Intra-Modal Constraint Loss For Image-Text Retrieval

Cross-modal retrieval has drawn much attention in both computer vision and natural language processing domains. With the development of convolutional and recurrent neural networks, the bottleneck of retrieval across image-text modalities is…

Computer Vision and Pattern Recognition · Computer Science 2022-07-14 Jianan Chen, Lu Zhang, Qiong Wang, Cong Bai, Kidiyo Kpalma

Semantic Modeling of Textual Relationships in Cross-Modal Retrieval

Feature modeling of different modalities is a basic problem in current research of cross-modal information retrieval. Existing models typically project texts and images into one embedding space, in which semantically similar information…

Multimedia · Computer Science 2019-06-13 Jing Yu, Chenghao Yang, Zengchang Qin, Zhuoqian Yang, Yue Hu, Weifeng Zhang

Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features

Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of…

Computer Vision and Pattern Recognition · Computer Science 2020-01-15 Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Representation Learning of Image Schema

Image schema is a recurrent pattern of reasoning where one entity is mapped into another. Image schema is similar to conceptual metaphor and is also related to metaphoric gesture. Our main goal is to generate metaphoric gestures for an…

Human-Computer Interaction · Computer Science 2022-07-19 Fajrian Yunus, Chloé Clavel, Catherine Pelachaud

Learning Deep Structure-Preserving Image-Text Embeddings

This paper proposes a method for learning joint embeddings of images and text using a two-branch neural network with multiple layers of linear projections followed by nonlinearities. The network is trained using a large margin objective…

Computer Vision and Pattern Recognition · Computer Science 2016-04-15 Liwei Wang, Yin Li, Svetlana Lazebnik

Multimodal Representation Alignment for Cross-modal Information Retrieval

Different machine learning models can represent the same underlying concept in different ways. This variability is particularly valuable for in-the-wild multimodal retrieval, where the objective is to identify the corresponding…

Information Retrieval · Computer Science 2025-06-11 Fan Xu, Luis A. Leiva

Predicting Visual Features from Text for Image and Video Caption Retrieval

This paper strives to find amidst a set of sentences the one best describing the content of a given image or video. Different from existing works, which rely on a joint subspace for their image and video caption retrieval, we propose to do…

Computer Vision and Pattern Recognition · Computer Science 2018-07-17 Jianfeng Dong, Xirong Li, Cees G. M. Snoek

Self-Supervised Learning from Web Data for Multimodal Retrieval

Self-Supervised learning from multimodal image and text data allows deep neural networks to learn powerful features with no need of human annotated data. Web and Social Media platforms provide a virtually unlimited amount of this multimodal…

Computer Vision and Pattern Recognition · Computer Science 2019-01-09 Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas

Reconstructing the Geometry of Random Geometric Graphs

Random geometric graphs are random graph models defined on metric spaces. Such a model is defined by first sampling points from a metric space and then connecting each pair of sampled points with probability that depends on their distance,…

Machine Learning · Computer Science 2026-04-10 Han Huang, Pakawut Jiradilok, Elchanan Mossel

Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval

Cross-modal retrieval between visual data and natural language description remains a long-standing challenge in multimedia. While recent image-text retrieval methods offer great promise by learning deep representations aligned across…

Multimedia · Computer Science 2018-08-24 Niluthpol Chowdhury Mithun, Rameswar Panda, Evangelos E. Papalexakis, Amit K. Roy-Chowdhury

Learning High-level Image Representation for Image Retrieval via Multi-Task DNN using Clickthrough Data

Image retrieval refers to finding relevant images from an image database for a query, which is considered difficult because of the gap between the low-level representation of images and the high-level representation of queries. Recently further developed…

Computer Vision and Pattern Recognition · Computer Science 2013-12-24 Yalong Bai, Kuiyuan Yang, Wei Yu, Wei-Ying Ma, Tiejun Zhao

Geometrically Mappable Image Features

Vision-based localization of an agent in a map is an important problem in robotics and computer vision. In that context, localization by learning matchable image features is gaining popularity due to recent advances in machine learning.…

Computer Vision and Pattern Recognition · Computer Science 2020-03-24 Janine Thoma, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool

Dual-Path Convolutional Image-Text Embeddings with Instance Loss

Matching images and sentences demands a fine understanding of both modalities. In this paper, we propose a new system to discriminatively embed the image and text to a shared visual-textual space. In this field, most existing works apply…

Computer Vision and Pattern Recognition · Computer Science 2021-07-28 Zhedong Zheng, Liang Zheng, Michael Garrett, Yi Yang, Mingliang Xu, Yi-Dong Shen

Scene Text Retrieval via Joint Text Detection and Similarity Learning

Scene text retrieval aims to localize and search all text instances from an image gallery, which are the same or similar to a given query text. Such a task is usually realized by matching a query text to the recognized words, outputted by…

Computer Vision and Pattern Recognition · Computer Science 2021-04-06 Hao Wang, Xiang Bai, Mingkun Yang, Shenggao Zhu, Jing Wang, Wenyu Liu