Related papers: Inferring and Executing Programs for Visual Reason…

Benchmark Visual Question Answer Models by using Focus Map

Inferring and Executing Programs for Visual Reasoning proposes a model for visual reasoning that consists of a program generator and an execution engine to avoid end-to-end models. To show that the model actually learns which objects to…

Computer Vision and Pattern Recognition · Computer Science · 2018-01-17 · Wenda Qiu, Yueyang Xianzang, Zhekai Zhang

Measuring CLEVRness: Blackbox testing of Visual Reasoning Models

How can we measure the reasoning capabilities of intelligence systems? Visual question answering provides a convenient framework for testing the model's abilities by interrogating the model through questions about the scene. However,…

Machine Learning · Computer Science · 2022-03-01 · Spyridon Mouselinos, Henryk Michalewski, Mateusz Malinowski

Visual Reasoning by Progressive Module Networks

Humans learn to solve tasks of increasing complexity by building on top of previously acquired knowledge. Typically, there exists a natural progression in the tasks that we learn - most do not require completely independent solutions, but…

Computer Vision and Pattern Recognition · Computer Science · 2018-10-01 · Seung Wook Kim, Makarand Tapaswi, Sanja Fidler

Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning

Visual question answering requires high-order reasoning about an image, which is a fundamental capability needed by machine systems to follow complex directives. Recently, modular networks have been shown to be an effective framework for…

Computer Vision and Pattern Recognition · Computer Science · 2019-01-24 · David Mascharka, Philip Tran, Ryan Soklaski, Arjun Majumdar

Learning Visual Reasoning Without Strong Priors

Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a…

Computer Vision and Pattern Recognition · Computer Science · 2017-12-20 · Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron Courville

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

We marry two powerful ideas: deep representation learning for visual recognition and language understanding, and symbolic program execution for reasoning. Our neural-symbolic visual question answering (NS-VQA) system first recovers a…

Artificial Intelligence · Computer Science · 2019-01-16 · Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Joshua B. Tenenbaum

ViperGPT: Visual Inference via Python Execution for Reasoning

Answering visual queries is a complex task that requires both visual processing and reasoning. End-to-end models, the dominant approach for this task, do not explicitly differentiate between the two, limiting interpretability and…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-15 · Dídac Surís, Sachit Menon, Carl Vondrick

RECODE: Reasoning Through Code Generation for Visual Question Answering

Multimodal Large Language Models (MLLMs) struggle with precise reasoning for structured visuals like charts and diagrams, as pixel-based perception lacks a mechanism for verification. To address this, we propose to leverage derendering --…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-11 · Junhong Shen, Mu Cai, Bo Hu, Ameet Talwalkar, David A Ross, Cordelia Schmid, Alireza Fathi

Latent Visual Reasoning

Multimodal Large Language Models (MLLMs) have achieved notable gains in various tasks by incorporating Chain-of-Thought (CoT) reasoning in language spaces. Recent work extends this direction by leveraging external tools for visual editing,…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-07 · Bangzheng Li, Ximeng Sun, Jiang Liu, Ze Wang, Jialian Wu, Xiaodong Yu, Hao Chen, Emad Barsoum, Muhao Chen, Zicheng Liu

Explainable and Explicit Visual Reasoning over Scene Graphs

We aim to dismantle the prevalent black-box neural architectures used in complex visual reasoning tasks, into the proposed eXplainable and eXplicit Neural Modules (XNMs), which advance beyond existing neural module networks towards using…

Computer Vision and Pattern Recognition · Computer Science · 2019-03-20 · Jiaxin Shi, Hanwang Zhang, Juanzi Li

VIPER: Visual Perception and Explainable Reasoning for Sequential Decision-Making

While Large Language Models (LLMs) excel at reasoning on text and Vision-Language Models (VLMs) are highly effective for visual perception, applying those models for visual instruction-based planning remains a widely open problem. In this…

Machine Learning · Computer Science · 2025-09-11 · Mohamed Salim Aissi, Clemence Grislain, Mohamed Chetouani, Olivier Sigaud, Laure Soulier, Nicolas Thome

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. Existing benchmarks for visual question answering can help,…

Computer Vision and Pattern Recognition · Computer Science · 2016-12-22 · Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

Improving Visual Reasoning with Iterative Evidence Refinement

Vision language models (VLMs) are increasingly capable of reasoning over images, but robust visual reasoning often requires re-grounding intermediate steps in the underlying visual evidence. Recent approaches typically rely on external…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-17 · Zeru Shi, Kai Mei, Yihao Quan, Dimitris N. Metaxas, Ruixiang Tang

From Illusion to Intention: Visual Rationale Learning for Vision-Language Reasoning

Recent advances in vision-language reasoning underscore the importance of thinking with images, where models actively ground their reasoning in visual evidence. Yet, prevailing frameworks treat visual actions as optional tools, boosting…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-01 · Changpeng Wang, Haozhe Wang, Xi Chen, Junhan Liu, Taofeng Xue, Chong Peng, Donglian Qi, Fangzhen Lin, Yunfeng Yan

V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices

One of the primary challenges faced by deep learning is the degree to which current methods exploit superficial statistics and dataset bias, rather than learning to generalise over the specific representations they have experienced. This is…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-30 · Damien Teney, Peng Wang, Jiewei Cao, Lingqiao Liu, Chunhua Shen, Anton van den Hengel

Towards A Unified Neural Architecture for Visual Recognition and Reasoning

Recognition and reasoning are two pillars of visual understanding. However, these tasks have an imbalance in focus; whereas recent advances in neural networks have shown strong empirical performance in visual recognition, there has been…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-14 · Calvin Luo, Boqing Gong, Ting Chen, Chen Sun

Learning to reason over visual objects

A core component of human intelligence is the ability to identify abstract patterns inherent in complex, high-dimensional perceptual data, as exemplified by visual reasoning tasks such as Raven's Progressive Matrices (RPM). Motivated by the…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-30 · Shanka Subhra Mondal, Taylor Webb, Jonathan D. Cohen

Learning Differentiable Logic Programs for Abstract Visual Reasoning

Visual reasoning is essential for building intelligent agents that understand the world and perform problem-solving beyond perception. Differentiable forward reasoning has been developed to integrate reasoning with gradient-based machine…

Machine Learning · Computer Science · 2025-07-08 · Hikaru Shindo, Viktor Pfanschilling, Devendra Singh Dhami, Kristian Kersting

VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework

Visual reasoning refers to the task of solving questions about visual information. Current visual reasoning methods typically employ pre-trained vision-language model (VLM) strategies or deep neural network approaches. However, existing…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-03 · Chao Wang, Chunbai Zhang, Yongxiao Tian, Yang Zhou, Yan Peng

Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering

Many vision and language tasks require commonsense reasoning beyond data-driven image and natural language processing. Here we adopt Visual Question Answering (VQA) as an example task, where a system is expected to answer a question in…

Computer Vision and Pattern Recognition · Computer Science · 2018-03-26 · Somak Aditya, Yezhou Yang, Chitta Baral