
Related papers: Evaluating Text-to-Image Matching using Binary Ima…

200 papers

We introduce a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a scene, given several similar options. Accomplishing the task entails…

Computation and Language · Computer Science · 2016-12-26 · Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut

While existing image-text alignment models reach high-quality binary assessments, they fall short of pinpointing the exact source of misalignment. In this paper, we present a method to provide detailed textual and visual explanation of…

Computation and Language · Computer Science · 2024-07-18 · Brian Gordon, Yonatan Bitton, Yonatan Shafir, Roopal Garg, Xi Chen, Dani Lischinski, Daniel Cohen-Or, Idan Szpektor

Image captioning aims at automatically generating descriptions of an image in natural language. This is a challenging problem in the field of artificial intelligence that has recently received significant attention in the computer vision…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-02 · Hassan Maleki Galandouz, Mohsen Ebrahimi Moghaddam, Mehrnoush Shamsfard

Image-text matching has been a hot research topic bridging the vision and language areas. It remains challenging because the current image representation usually lacks the global semantic concepts present in its corresponding text caption. To…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-09 · Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu

Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of…

Computer Vision and Pattern Recognition · Computer Science · 2020-01-15 · Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Most image-text retrieval work adopts binary labels indicating whether a pair of image and text matches or not. Such a binary indicator covers only a limited subset of image-text semantic relations, which is insufficient to represent…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-21 · Zheng Li, Caili Guo, Zerun Feng, Jenq-Neng Hwang, Ying Jin, Yufeng Zhang

Multilingual (or cross-lingual) embeddings represent several languages in a single vector space. Using a common embedding space enables shared semantics between words from different languages. In this paper, we propose to embed images…

Computer Vision and Pattern Recognition · Computer Science · 2019-05-15 · Maxime Portaz, Hicham Randrianarivo, Adrien Nivaggioli, Estelle Maudet, Christophe Servan, Sylvain Peyronnet

Vision-to-language tasks, which aim to integrate computer vision and natural language processing, have attracted the attention of many researchers. Typical approaches encode an image into feature representations and decode it…

Computer Vision and Pattern Recognition · Computer Science · 2019-05-30 · Xuelong Li, Aihong Yuan, Xiaoqiang Lu

This paper considers the task of matching images and sentences by learning a visual-textual embedding space for cross-modal retrieval. Finding such a space is a challenging task since the features and representations of text and image are…

Information Retrieval · Computer Science · 2020-02-28 · Hadi Abdi Khojasteh, Ebrahim Ansari, Parvin Razzaghi, Akbar Karimi

Pretrained visual-language models have made significant advancements in multimodal tasks, including image-text retrieval. However, a major challenge in image-text matching lies in language bias, where models predominantly rely on language…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-22 · Jiwan Chung, Seungwon Lim, Sangkyu Lee, Youngjae Yu

This work introduces composed image retrieval to remote sensing. It allows one to query a large image archive with example images combined with a textual description, enriching the descriptive power over unimodal queries, either visual or…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-30 · Bill Psomas, Ioannis Kakogeorgiou, Nikos Efthymiadis, Giorgos Tolias, Ondrej Chum, Yannis Avrithis, Konstantinos Karantzalos

We aim to advance blind image quality assessment (BIQA), which predicts the human perception of image quality without any reference information. We develop a general and automated multitask learning scheme for BIQA to exploit auxiliary…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-28 · Weixia Zhang, Guangtao Zhai, Ying Wei, Xiaokang Yang, Kede Ma

Automatically determining whether a text and a corresponding image are semantically aligned is a significant challenge for vision-language models, with applications in generative text-to-image and image-to-text tasks. In this work, we study…

Computation and Language · Computer Science · 2023-12-27 · Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, Eran Ofek, Idan Szpektor

Binarization plays a key role in automatic information retrieval from document images. This process is usually performed in the first stages of document analysis systems and serves as a basis for subsequent steps. Hence it has to be…

Computer Vision and Pattern Recognition · Computer Science · 2018-09-07 · Jorge Calvo-Zaragoza, Antonio-Javier Gallego

This paper addresses text-supervised semantic segmentation, aiming to learn a model capable of segmenting arbitrary visual concepts within images by using only image-text pairs without dense annotations. Existing methods have demonstrated…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-08 · Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang, Chun-Pei Chen, Yu-Lun Liu, Min-Hung Chen, Hou-Ning Hu, Yung-Yu Chuang, Yen-Yu Lin

Image classification, which classifies images by pre-defined categories, has been the dominant approach to visual representation learning over the last decade. Visual learning through image-text alignment, however, has emerged to show…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-25 · Yixuan Wei, Yue Cao, Zheng Zhang, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo

Interacting with and understanding text-heavy visual content spanning multiple images is a major challenge for traditional vision models. This paper focuses on enhancing vision models' capability to comprehend and learn from images…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-31 · Adithya TG, Adithya SK, Abhinav R Bharadwaj, Abhiram HA, Surabhi Narayan

Text-based image captioning (TextCap), which aims to read and reason about images containing text, is crucial for a machine to understand detailed and complex scenes, since text is omnipresent in daily life. This task, however,…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-10 · Guanghui Xu, Shuaicheng Niu, Mingkui Tan, Yucheng Luo, Qing Du, Qi Wu

Visual-Semantic Embedding (VSE) networks can help search engines better understand the meaning behind visual content and associate it with relevant textual information, leading to more accurate search results. VSE networks can be used in…

Multimedia · Computer Science · 2023-11-02 · Yan Gong, Georgina Cosma

The ability to describe images with natural language sentences is the hallmark of image and language understanding. Such a system has wide-ranging applications, such as annotating images and using natural sentences to search for images. In…

Machine Learning · Computer Science · 2016-01-15 · Afroze Ibrahim Baqapuri