Related papers: Visual Reasoning by Progressive Module Networks

Attention over learned object embeddings enables complex visual reasoning

Neural networks have achieved success in a wide array of perceptual tasks but often fail at tasks involving both perception and higher-level reasoning. On these more challenging tasks, bespoke approaches (such as modular symbolic…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-27 · David Ding, Felix Hill, Adam Santoro, Malcolm Reynolds, Matt Botvinick

Inferring and Executing Programs for Visual Reasoning

Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes. As a result, these black-box models often learn to exploit biases…

Computer Vision and Pattern Recognition · Computer Science · 2017-05-11 · Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs

Recent works have shown that Large Language Models (LLMs) could empower traditional neuro-symbolic models via programming capabilities to translate language into module descriptions, thus achieving strong visual reasoning results while…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-09 · Zhenfang Chen, Rui Sun, Wenjun Liu, Yining Hong, Chuang Gan

Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning

Visual question answering requires high-order reasoning about an image, which is a fundamental capability needed by machine systems to follow complex directives. Recently, modular networks have been shown to be an effective framework for…

Computer Vision and Pattern Recognition · Computer Science · 2019-01-24 · David Mascharka, Philip Tran, Ryan Soklaski, Arjun Majumdar

Learning Visual Reasoning Without Strong Priors

Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a…

Computer Vision and Pattern Recognition · Computer Science · 2017-12-20 · Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron Courville

Explainable Neural Computation via Stack Neural Module Networks

In complex inferential tasks like question answering, machine learning models must confront two challenges: the need to implement a compositional reasoning process, and, in many applications, the need for this reasoning process to be…

Computer Vision and Pattern Recognition · Computer Science · 2019-03-08 · Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko

Reasoning in machine vision: learning to think fast and slow

Reasoning is a hallmark of human intelligence, enabling adaptive decision-making in complex and unfamiliar scenarios. In contrast, machine intelligence remains bound to training data, lacking the ability to dynamically refine solutions at…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-30 · Shaheer U. Saeed, Yipei Wang, Veeru Kasivisvanathan, Brian R. Davidson, Matthew J. Clarkson, Yipeng Hu, Daniel C. Alexander

Learning to reason over visual objects

A core component of human intelligence is the ability to identify abstract patterns inherent in complex, high-dimensional perceptual data, as exemplified by visual reasoning tasks such as Raven's Progressive Matrices (RPM). Motivated by the…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-30 · Shanka Subhra Mondal, Taylor Webb, Jonathan D. Cohen

PROGRESSLM: Towards Progress Reasoning in Vision-Language Models

Estimating task progress requires reasoning over long-horizon dynamics rather than recognizing static visual content. While modern Vision-Language Models (VLMs) excel at describing what is visible, it remains unclear whether they can infer…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-25 · Jianshu Zhang, Chengxuan Qian, Haosen Sun, Haoran Lu, Dingcheng Wang, Letian Xue, Han Liu

Towards A Unified Neural Architecture for Visual Recognition and Reasoning

Recognition and reasoning are two pillars of visual understanding. However, these tasks have an imbalance in focus; whereas recent advances in neural networks have shown strong empirical performance in visual recognition, there has been…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-14 · Calvin Luo, Boqing Gong, Ting Chen, Chen Sun

GAMR: A Guided Attention Model for (visual) Reasoning

Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes. Here, we present a novel module for visual reasoning, the Guided Attention Model for (visual) Reasoning (GAMR), which…

Artificial Intelligence · Computer Science · 2023-03-22 · Mohit Vaishnav, Thomas Serre

Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"

Visual reasoning tasks such as visual question answering (VQA) require an interplay of visual perception with reasoning about the question semantics grounded in perception. However, recent advances in this area are still primarily driven by…

Machine Learning · Computer Science · 2020-08-27 · Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida

Multimodal Representations for Teacher-Guided Compositional Visual Reasoning

Neural Module Networks (NMN) are a compelling method for visual question answering, enabling the translation of a question into a program consisting of a series of reasoning sub-tasks that are sequentially executed on the image to produce…

Computation and Language · Computer Science · 2023-10-25 · Wafa Aissa, Marin Ferecatu, Michel Crucianu

Neural Module Networks for Reasoning over Text

Answering compositional questions that require multiple steps of reasoning against text is challenging, especially when they involve discrete, symbolic operations. Neural module networks (NMNs) learn to parse such questions as executable…

Computation and Language · Computer Science · 2020-02-18 · Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner

V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices

One of the primary challenges faced by deep learning is the degree to which current methods exploit superficial statistics and dataset bias, rather than learning to generalise over the specific representations they have experienced. This is…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-30 · Damien Teney, Peng Wang, Jiewei Cao, Lingqiao Liu, Chunhua Shen, Anton van den Hengel

Structure Learning for Neural Module Networks

Neural Module Networks, originally proposed for the task of visual question answering, are a class of neural network architectures that involve human-specified neural modules, each designed for a specific form of reasoning. In current…

Machine Learning · Computer Science · 2019-11-11 · Vardaan Pahuja, Jie Fu, Sarath Chandar, Christopher J. Pal

Towards Interpretable Reasoning over Paragraph Effects in Situation

We focus on the task of reasoning over paragraph effects in situation, which requires a model to understand the cause and effect described in a background paragraph, and apply the knowledge to a novel situation. Existing works ignore the…

Computation and Language · Computer Science · 2020-10-06 · Mucheng Ren, Xiubo Geng, Tao Qin, Heyan Huang, Daxin Jiang

Reasoning-Modulated Representations

Neural networks leverage robust internal representations in order to generalise. Learning them is difficult, and often requires a large training set that covers the data distribution densely. We study a common setting where our task is not…

Machine Learning · Computer Science · 2022-12-06 · Petar Veličković, Matko Bošnjak, Thomas Kipf, Alexander Lerchner, Raia Hadsell, Razvan Pascanu, Charles Blundell

Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure

We investigate how neural networks can learn and process languages with hierarchical, compositional semantics. To this end, we define the artificial task of processing nested arithmetic expressions, and study whether different types of…

Computation and Language · Computer Science · 2018-04-23 · Dieuwke Hupkes, Sara Veldhoen, Willem Zuidema

Neural Reasoning, Fast and Slow, for Video Question Answering

What does it take to design a machine that learns to answer natural questions about a video? A Video QA system must simultaneously understand language, represent visual content over space-time, and iteratively transform these…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-14 · Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran