Related papers: Aligning Visual and Lexical Semantics

200 papers

The semantic gap is defined as the difference between the linguistic representations of the same concept, which usually leads to misunderstanding between individuals with different knowledge backgrounds. Since linguistically annotated…

Computer Vision and Pattern Recognition · Computer Science 2022-03-01 Xiaolei Diao

Lexical Semantics is concerned with how words encode mental representations of the world, i.e., concepts. We call this type of concepts, classification concepts. In this paper, we focus on Visual Semantics, namely on how humans build…

Artificial Intelligence · Computer Science 2021-09-15 Fausto Giunchiglia, Luca Erculiani, Andrea Passerini

The Semantic Gap Problem (SGP) in Computer Vision (CV) arises from the misalignment between visual and lexical semantics leading to flawed CV dataset design and CV benchmarks. This paper proposes that classification principles of S.R.…

Computer Vision and Pattern Recognition · Computer Science 2026-02-02 Mayukh Bagchi, Fausto Giunchiglia

Vision-to-language tasks aim to integrate computer vision and natural language processing together, which has attracted the attention of many researchers. For typical approaches, they encode image into feature representations and decode it…

Computer Vision and Pattern Recognition · Computer Science 2019-05-30 Xuelong Li, Aihong Yuan, Xiaoqiang Lu

Recent advances in language modeling have witnessed the rise of highly desirable emergent capabilities, such as reasoning and in-context learning. However, vision models have yet to exhibit comparable progress in these areas. In this paper,…

Computer Vision and Pattern Recognition · Computer Science 2025-10-10 Jike Zhong, Yuxiang Lai, Xiaofeng Yang, Konstantinos Psounis

Vision-Language Models (VLMs) leverage aligned visual encoders to transform images into visual tokens, allowing them to be processed similarly to text by the backbone large language model (LLM). This unified input paradigm enables VLMs to…

Computer Vision and Pattern Recognition · Computer Science 2025-03-18 Bangzheng Li, Fei Wang, Wenxuan Zhou, Nan Xu, Ben Zhou, Sheng Zhang, Hoifung Poon, Muhao Chen

Evaluating whether vision-language models (VLMs) reason consistently across representations is challenging because modality comparisons are typically confounded by task differences and asymmetric information. We introduce SEAM, a benchmark…

Artificial Intelligence · Computer Science 2025-08-26 Zhenwei Tang, Difan Jiao, Blair Yang, Ashton Anderson

While Vision Language Models (VLMs) learn conceptual representations, in the form of generalized knowledge, during training, they are typically used to analyze individual instances. When evaluation instances are atypical, this paradigm…

Computation and Language · Computer Science 2025-10-15 Stella Frank, Emily Allaway

Aligned text-image encoders such as CLIP have become the de facto model for vision-language tasks. Furthermore, modality-specific encoders achieve impressive performances in their respective domains. This raises a central question: does an…

Visual Question Answering (VQA) systems are tasked with answering natural language questions corresponding to a presented image. Traditional VQA datasets typically contain questions related to the spatial information of objects, object…

Computation and Language · Computer Science 2020-06-05 Goonmeet Bajaj, Bortik Bandyopadhyay, Daniel Schmidt, Pranav Maneriker, Christopher Myers, Srinivasan Parthasarathy

Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story, where the images should be realistic and keep global consistency across dynamic scenes and characters. Current works face the…

Computer Vision and Pattern Recognition · Computer Science 2022-11-15 Bowen Li, Thomas Lukasiewicz

Visual semantic information comprises two important parts: the meaning of each visual semantic unit and the coherent visual semantic relation conveyed by these visual semantic units. Essentially, the former one is a visual perception task…

Computer Vision and Pattern Recognition · Computer Science 2019-03-14 Daqi Liu, Miroslaw Bober, Josef Kittler

Image captioning attempts to generate a sentence composed of several linguistic words, which are used to describe objects, attributes, and interactions in an image, denoted as visual semantic units in this paper. Based on this view, we…

Computer Vision and Pattern Recognition · Computer Science 2019-08-07 Longteng Guo, Jing Liu, Jinhui Tang, Jiangwei Li, Wei Luo, Hanqing Lu

In natural language processing, most models try to learn semantic representations merely from texts. The learned representations encode the distributional semantics but fail to connect to any knowledge about the physical world. In contrast,…

Computation and Language · Computer Science 2021-11-16 Yizhen Zhang, Minkyu Choi, Kuan Han, Zhongming Liu

Knowledge transfer, zero-shot learning and semantic image retrieval are methods that aim at improving accuracy by utilizing semantic information, e.g. from WordNet. It is assumed that this information can augment or replace missing visual…

Computer Vision and Pattern Recognition · Computer Science 2019-06-03 Clemens-Alexander Brust, Joachim Denzler

Recent work has empirically shown that Vision-Language Models (VLMs) struggle to fully understand the compositional properties of the human language, usually modeling an image caption as a "bag of words". As a result, they perform poorly on…

Computer Vision and Pattern Recognition · Computer Science 2025-04-16 Fiorenzo Parascandolo, Nicholas Moratelli, Enver Sangineto, Lorenzo Baraldi, Rita Cucchiara

This paper addresses the problem of semantic-based image retrieval of natural scenes. A typical content-based image retrieval system deals with the query image and images in the dataset as a collection of low-level features and retrieves a…

Computer Vision and Pattern Recognition · Computer Science 2022-10-18 Yousef Alqasrawi

Vision-language models (VLMs) excel in semantic tasks but falter at a core human capability: detecting hidden content in optical illusions or AI-generated images through perceptual adjustments like zooming. We introduce HC-Bench, a…

Computation and Language · Computer Science 2025-10-16 Sifan Li, Yujun Cai, Yiwei Wang

Representing the semantics of words is a long-standing problem for the natural language processing community. Most methods compute word semantics given their textual context in large corpora. More recently, researchers attempted to…

Computation and Language · Computer Science 2017-11-10 Éloi Zablocki, Benjamin Piwowarski, Laure Soulier, Patrick Gallinari

Humans can effortlessly describe what they see, yet establishing a shared representational format between vision and language remains a significant challenge. Emerging evidence suggests that human brain representations in both vision and…

Neurons and Cognition · Quantitative Biology 2025-07-30 Katerina Marie Simkova, Adrien Doerig, Clayton Hickey, Ian Charest