Related papers: Evaluating Text-to-Image Matching using Binary Ima…

Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task

We introduce a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a scene, given several similar options. Accomplishing the task entails…

Computation and Language · Computer Science 2016-12-26 Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut

Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

While existing image-text alignment models reach high quality binary assessments, they fall short of pinpointing the exact source of misalignment. In this paper, we present a method to provide detailed textual and visual explanation of…

Computation and Language · Computer Science 2024-07-18 Brian Gordon, Yonatan Bitton, Yonatan Shafir, Roopal Garg, Xi Chen, Dani Lischinski, Daniel Cohen-Or, Idan Szpektor

A Weighted Multi-Criteria Decision Making Approach for Image Captioning

Image captioning aims at automatically generating descriptions of an image in natural language. This is a challenging problem in the field of artificial intelligence that has recently received significant attention in the computer vision…

Computer Vision and Pattern Recognition · Computer Science 2019-04-02 Hassan Maleki Galandouz, Mohsen Ebrahimi Moghaddam, Mehrnoush Shamsfard

Visual Semantic Reasoning for Image-Text Matching

Image-text matching has been a hot research topic bridging the vision and language areas. It remains challenging because the current representation of image usually lacks global semantic concepts as in its corresponding text caption. To…

Computer Vision and Pattern Recognition · Computer Science 2019-09-09 Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu

Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features

Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of…

Computer Vision and Pattern Recognition · Computer Science 2020-01-15 Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Image-Text Retrieval with Binary and Continuous Label Supervision

Most image-text retrieval work adopts binary labels indicating whether a pair of image and text matches or not. Such a binary indicator covers only a limited subset of image-text semantic relations, which is insufficient to represent…

Computer Vision and Pattern Recognition · Computer Science 2022-10-21 Zheng Li, Caili Guo, Zerun Feng, Jenq-Neng Hwang, Ying Jin, Yufeng Zhang

Image search using multilingual texts: a cross-modal learning approach between image and text

Multilingual (or cross-lingual) embeddings represent several languages in a unique vector space. Using a common embedding space enables for a shared semantic between words from different languages. In this paper, we propose to embed images…

Computer Vision and Pattern Recognition · Computer Science 2019-05-15 Maxime Portaz, Hicham Randrianarivo, Adrien Nivaggioli, Estelle Maudet, Christophe Servan, Sylvain Peyronnet

Vision-to-Language Tasks Based on Attributes and Attention Mechanism

Vision-to-language tasks aim to integrate computer vision and natural language processing together, which has attracted the attention of many researchers. For typical approaches, they encode image into feature representations and decode it…

Computer Vision and Pattern Recognition · Computer Science 2019-05-30 Xuelong Li, Aihong Yuan, Xiaoqiang Lu

Deep Multimodal Image-Text Embeddings for Automatic Cross-Media Retrieval

This paper considers the task of matching images and sentences by learning a visual-textual embedding space for cross-modal retrieval. Finding such a space is a challenging task since the features and representations of text and image are…

Information Retrieval · Computer Science 2020-02-28 Hadi Abdi Khojasteh, Ebrahim Ansari, Parvin Razzaghi, Akbar Karimi

MASS: Overcoming Language Bias in Image-Text Matching

Pretrained visual-language models have made significant advancements in multimodal tasks, including image-text retrieval. However, a major challenge in image-text matching lies in language bias, where models predominantly rely on language…

Computer Vision and Pattern Recognition · Computer Science 2025-01-22 Jiwan Chung, Seungwon Lim, Sangkyu Lee, Youngjae Yu

Composed Image Retrieval for Remote Sensing

This work introduces composed image retrieval to remote sensing. It allows to query a large image archive by image examples alternated by a textual description, enriching the descriptive power over unimodal queries, either visual or…

Computer Vision and Pattern Recognition · Computer Science 2024-07-30 Bill Psomas, Ioannis Kakogeorgiou, Nikos Efthymiadis, Giorgos Tolias, Ondrej Chum, Yannis Avrithis, Konstantinos Karantzalos

Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective

We aim at advancing blind image quality assessment (BIQA), which predicts the human perception of image quality without any reference information. We develop a general and automated multitask learning scheme for BIQA to exploit auxiliary…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Weixia Zhang, Guangtao Zhai, Ying Wei, Xiaokang Yang, Kede Ma

What You See is What You Read? Improving Text-Image Alignment Evaluation

Automatically determining whether a text and a corresponding image are semantically aligned is a significant challenge for vision-language models, with applications in generative text-to-image and image-to-text tasks. In this work, we study…

Computation and Language · Computer Science 2023-12-27 Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, Eran Ofek, Idan Szpektor

A selectional auto-encoder approach for document image binarization

Binarization plays a key role in the automatic information retrieval from document images. This process is usually performed in the first stages of documents analysis systems, and serves as a basis for subsequent steps. Hence it has to be…

Computer Vision and Pattern Recognition · Computer Science 2018-09-07 Jorge Calvo-Zaragoza, Antonio-Javier Gallego

Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation

This paper addresses text-supervised semantic segmentation, aiming to learn a model capable of segmenting arbitrary visual concepts within images by using only image-text pairs without dense annotations. Existing methods have demonstrated…

Computer Vision and Pattern Recognition · Computer Science 2024-04-08 Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang, Chun-Pei Chen, Yu-Lun Liu, Min-Hung Chen, Hou-Ning Hu, Yung-Yu Chuang, Yen-Yu Lin

iCAR: Bridging Image Classification and Image-text Alignment for Visual Recognition

Image classification, which classifies images by pre-defined categories, has been the dominant approach to visual representation learning over the last decade. Visual learning through image-text alignment, however, has emerged to show…

Computer Vision and Pattern Recognition · Computer Science 2022-04-25 Yixuan Wei, Yue Cao, Zheng Zhang, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo

Enhancing Vision Models for Text-Heavy Content Understanding and Interaction

Interacting and understanding with text heavy visual content with multiple images is a major challenge for traditional vision models. This paper is on enhancing vision models' capability to comprehend or understand and learn from images…

Computer Vision and Pattern Recognition · Computer Science 2024-08-31 Adithya TG, Adithya SK, Abhinav R Bharadwaj, Abhiram HA, Surabhi Narayan

Towards Accurate Text-based Image Captioning with Content Diversity Exploration

Text-based image captioning (TextCap) which aims to read and reason images with texts is crucial for a machine to understand a detailed and complex scene environment, considering that texts are omnipresent in daily life. This task, however,…

Computer Vision and Pattern Recognition · Computer Science 2021-05-10 Guanghui Xu, Shuaicheng Niu, Mingkui Tan, Yucheng Luo, Qing Du, Qi Wu

Boon: A Neural Search Engine for Cross-Modal Information Retrieval

Visual-Semantic Embedding (VSE) networks can help search engines better understand the meaning behind visual content and associate it with relevant textual information, leading to more accurate search results. VSE networks can be used in…

Multimedia · Computer Science 2023-11-02 Yan Gong, Georgina Cosma

Deep Learning Applied to Image and Text Matching

The ability to describe images with natural language sentences is the hallmark for image and language understanding. Such a system has wide ranging applications such as annotating images and using natural sentences to search for images. In…

Machine Learning · Computer Science 2016-01-15 Afroze Ibrahim Baqapuri