English
Related papers

Related papers: Visual Set Program Synthesizer

200 papers

Vision-Language models (VLMs) achieve strong performance on multimodal tasks but often fail at systematic visual reasoning tasks, leading to inconsistent or illogical outputs. Neuro-symbolic methods promise to address this by inducing…

Artificial Intelligence · Computer Science 2025-11-25 Antonia Wüst , Wolfgang Stammer , Hikaru Shindo , Lukas Helff , Devendra Singh Dhami , Kristian Kersting

We explore multi-step reasoning in vision-language models (VLMs). The problem is challenging, as reasoning data consisting of multiple steps of visual and language processing are barely available. To overcome the challenge, we first…

Computation and Language · Computer Science 2024-10-14 Chuanqi Cheng , Jian Guan , Wei Wu , Rui Yan

Visual Question Answering (VQA) is a novel problem domain where multi-modal inputs must be processed in order to solve the task given in the form of a natural language. As the solutions inherently require to combine visual and natural…

Computer Vision and Pattern Recognition · Computer Science 2018-01-31 Mikyas T. Desta , Larry Chen , Tomasz Kornuta

Visual reasoning tasks such as visual question answering (VQA) require an interplay of visual perception with reasoning about the question semantics grounded in perception. However, recent advances in this area are still primarily driven by…

Machine Learning · Computer Science 2020-08-27 Saeed Amizadeh , Hamid Palangi , Oleksandr Polozov , Yichen Huang , Kazuhito Koishida

Solving complex visual tasks such as "Who invented the musical instrument on the right?" involves a composition of skills: understanding space, recognizing instruments, and also retrieving prior knowledge. Recent work shows promise by…

Computer Vision and Pattern Recognition · Computer Science 2024-04-08 Yushi Hu , Otilia Stretcu , Chun-Ta Lu , Krishnamurthy Viswanathan , Kenji Hata , Enming Luo , Ranjay Krishna , Ariel Fuxman

We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets. We have developed a strong and robust question engine that leverages scene…

Computation and Language · Computer Science 2019-07-12 Drew A. Hudson , Christopher D. Manning

Many vision and language tasks require commonsense reasoning beyond data-driven image and natural language processing. Here we adopt Visual Question Answering (VQA) as an example task, where a system is expected to answer a question in…

Computer Vision and Pattern Recognition · Computer Science 2018-03-26 Somak Aditya , Yezhou Yang , Chitta Baral

Visual reasoning is dominated by end-to-end neural networks scaled to billions of model parameters and training examples. However, even the largest models struggle with compositional reasoning, generalization, fine-grained spatial and…

Computer Vision and Pattern Recognition · Computer Science 2024-05-16 Aleksandar Stanić , Sergi Caelles , Michael Tschannen

Vision-language models (VLMs) lag behind text-only language models on mathematical reasoning when the same problems are presented as images rather than text. We empirically characterize this as a modality gap: the same question in text form…

Artificial Intelligence · Computer Science 2026-03-18 Saeed Khaki , Ashudeep Singh , Nima Safaei , Kamal Ginotra

One of the primary challenges faced by deep learning is the degree to which current methods exploit superficial statistics and dataset bias, rather than learning to generalise over the specific representations they have experienced. This is…

Computer Vision and Pattern Recognition · Computer Science 2019-07-30 Damien Teney , Peng Wang , Jiewei Cao , Lingqiao Liu , Chunhua Shen , Anton van den Hengel

Despite recent advances in Vision-Language Models (VLMs), they may over-rely on visual language priors existing in their training data rather than true visual reasoning. To investigate this, we introduce ViLP, a benchmark featuring…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Tiange Luo , Ang Cao , Gunhee Lee , Justin Johnson , Honglak Lee

Understanding images and text together is an important aspect of cognition and building advanced Artificial Intelligence (AI) systems. As a community, we have achieved good benchmarks over language and vision domains separately, however…

Computer Vision and Pattern Recognition · Computer Science 2020-11-19 Shailaja Keyur Sampat , Yezhou Yang , Chitta Baral

In this paper we present an approach and a benchmark for visual reasoning in robotics applications, in particular small object grasping and manipulation. The approach and benchmark are focused on inferring object properties from visual and…

Computer Vision and Pattern Recognition · Computer Science 2020-04-07 Michal Nazarczuk , Krystian Mikolajczyk

When answering questions about an image, it not only needs knowing what -- understanding the fine-grained contents (e.g., objects, relationships) in the image, but also telling why -- reasoning over grounding visual cues to derive the…

Computer Vision and Pattern Recognition · Computer Science 2020-12-22 Jianwei Yang , Jiayuan Mao , Jiajun Wu , Devi Parikh , David D. Cox , Joshua B. Tenenbaum , Chuang Gan

Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes. As a result, these black-box models often learn to exploit biases…

Computer Vision and Pattern Recognition · Computer Science 2017-05-11 Justin Johnson , Bharath Hariharan , Laurens van der Maaten , Judy Hoffman , Li Fei-Fei , C. Lawrence Zitnick , Ross Girshick

Recovering analytical solutions of physical fields from visual observations is a fundamental yet underexplored capability for AI-assisted scientific reasoning. We study visual-to-symbolic analytical solution inference (ViSA) for…

Visual Question Answering (VQA) is a challenging problem that requires to process multimodal input. Answer-Set Programming (ASP) has shown great potential in this regard to add interpretability and explainability to modular VQA…

Artificial Intelligence · Computer Science 2025-02-14 Jakob Johannes Bauer , Thomas Eiter , Nelson Higuera Ruiz , Johannes Oetsch

Recent advances in Vision-Language Models (VLMs) have achieved impressive progress in multimodal mathematical reasoning. Yet, how much visual information truly contributes to reasoning remains unclear. Existing benchmarks report strong…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Yuandong Wang , Yao Cui , Yuxin Zhao , Zhen Yang , Yangfu Zhu , Zhenzhou Shao

The impressive advances and applications of large language and joint language-and-visual understanding models has led to an increased need for methods of probing their potential reasoning capabilities. However, the difficulty of gather…

Machine Learning · Computer Science 2023-06-05 Nathan Vaska , Victoria Helus

In the field of multimodal chain-of-thought (CoT) reasoning, existing approaches predominantly rely on reasoning on pure language space, which inherently suffers from language bias and is largely confined to math or science domains. This…

Computer Vision and Pattern Recognition · Computer Science 2026-05-04 Jiacong Wang , Zijian Kang , Haochen Wang , Haiyong Jiang , Jiawen Li , Bohong Wu , Ya Wang , Jiao Ran , Xiao Liang , Chao Feng , Jun Xiao
‹ Prev 1 2 3 10 Next ›