Related papers: Attention over learned object embeddings enables c…

Modeling Latent Attention Within Neural Networks

Deep neural networks are able to solve tasks across a variety of domains and modalities of data. Despite many empirical successes, we lack the ability to clearly understand and interpret the learned internal mechanisms that contribute to…

Artificial Intelligence · Computer Science 2018-01-03 Christopher Grimm, Dilip Arumugam, Siddharth Karamcheti, David Abel, Lawson L. S. Wong, Michael L. Littman

Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative Cognition Approach

Achieving visual reasoning is a long-term goal of artificial intelligence. In the last decade, several studies have applied deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in terms of…

Computer Vision and Pattern Recognition · Computer Science 2024-02-21 Guillermo Puebla, Jeffrey S. Bowers

Visual Reasoning by Progressive Module Networks

Humans learn to solve tasks of increasing complexity by building on top of previously acquired knowledge. Typically, there exists a natural progression in the tasks that we learn; most do not require completely independent solutions, but…

Computer Vision and Pattern Recognition · Computer Science 2018-10-01 Seung Wook Kim, Makarand Tapaswi, Sanja Fidler

Towards A Unified Neural Architecture for Visual Recognition and Reasoning

Recognition and reasoning are two pillars of visual understanding. However, these tasks have an imbalance in focus; whereas recent advances in neural networks have shown strong empirical performance in visual recognition, there has been…

Computer Vision and Pattern Recognition · Computer Science 2023-11-14 Calvin Luo, Boqing Gong, Ting Chen, Chen Sun

Deep Neural Networks for Visual Reasoning

Visual perception and language understanding are fundamental components of human intelligence, enabling humans to understand and reason about objects and their interactions. It is crucial for machines to have this capacity to reason using…

Computer Vision and Pattern Recognition · Computer Science 2022-09-27 Thao Minh Le

The role of object-centric representations, guided attention, and external memory on generalizing visual relations

Visual reasoning is a long-term goal of vision research. In the last decade, several works have attempted to apply deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in terms of the…

Computer Vision and Pattern Recognition · Computer Science 2023-04-17 Guillermo Puebla, Jeffrey S. Bowers

A Multi-Modal Neuro-Symbolic Approach for Spatial Reasoning-Based Visual Grounding in Robotics

Visual reasoning, particularly spatial reasoning, is a challenging cognitive task that requires understanding object relationships and their interactions within complex environments, especially in the robotics domain. Existing vision-language…

Robotics · Computer Science 2025-11-03 Simindokht Jahangard, Mehrzad Mohammadi, Abhinav Dhall, Hamid Rezatofighi

Understanding the computational demands underlying visual reasoning

Visual understanding requires comprehending complex visual relations between objects within a scene. Here, we seek to characterize the computational demands for abstract visual reasoning. We do this by systematically assessing the ability…

Computer Vision and Pattern Recognition · Computer Science 2022-03-03 Mohit Vaishnav, Remi Cadene, Andrea Alamia, Drew Linsley, Rufin VanRullen, Thomas Serre

Neural Module Networks for Reasoning over Text

Answering compositional questions that require multiple steps of reasoning against text is challenging, especially when they involve discrete, symbolic operations. Neural module networks (NMNs) learn to parse such questions as executable…

Computation and Language · Computer Science 2020-02-18 Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner

Reasoning in machine vision: learning to think fast and slow

Reasoning is a hallmark of human intelligence, enabling adaptive decision-making in complex and unfamiliar scenarios. In contrast, machine intelligence remains bound to training data, lacking the ability to dynamically refine solutions at…

Computer Vision and Pattern Recognition · Computer Science 2025-06-30 Shaheer U. Saeed, Yipei Wang, Veeru Kasivisvanathan, Brian R. Davidson, Matthew J. Clarkson, Yipeng Hu, Daniel C. Alexander

A Neural Network Model of Spatial and Feature-Based Attention

Visual attention is a mechanism closely intertwined with vision and memory. Top-down information influences visual processing through attention. We designed a neural network model inspired by aspects of human visual attention. This model…

Computer Vision and Pattern Recognition · Computer Science 2025-06-09 Ruoyang Hu, Robert A. Jacobs

Sequential Coordination of Deep Models for Learning Visual Arithmetic

Achieving machine intelligence requires a smooth integration of perception and reasoning, yet models developed to date tend to specialize in one or the other; sophisticated manipulation of symbols acquired from rich perceptual spaces has so…

Machine Learning · Computer Science 2018-09-14 Eric Crawford, Guillaume Rabusseau, Joelle Pineau

Self-supervised Spatial Reasoning on Multi-View Line Drawings

Spatial reasoning on multi-view line drawings by state-of-the-art supervised deep networks has recently been shown to yield puzzlingly low performance on the SPARE3D dataset. Based on the fact that self-supervised learning is helpful when a large…

Computer Vision and Pattern Recognition · Computer Science 2022-05-17 Siyuan Xiang, Anbang Yang, Yanfei Xue, Yaoqing Yang, Chen Feng

Multimodal Representations for Teacher-Guided Compositional Visual Reasoning

Neural Module Networks (NMN) are a compelling method for visual question answering, enabling the translation of a question into a program consisting of a series of reasoning sub-tasks that are sequentially executed on the image to produce…

Computation and Language · Computer Science 2023-10-25 Wafa Aissa, Marin Ferecatu, Michel Crucianu

Universal Representations: A Unified Look at Multiple Task and Domain Learning

We propose a unified look at jointly learning multiple vision tasks and visual domains through universal representations, a single deep neural network. Learning multiple problems simultaneously involves minimizing a weighted sum of multiple…

Computer Vision and Pattern Recognition · Computer Science 2022-08-31 Wei-Hong Li, Xialei Liu, Hakan Bilen

Understanding top-down attention using task-oriented ablation design

Top-down attention allows neural networks, both artificial and biological, to focus on the information most relevant for a given task. This is known to enhance performance in visual perception. But it remains unclear how attention brings…

Computer Vision and Pattern Recognition · Computer Science 2021-06-23 Freddie Bickford Smith, Brett D Roads, Xiaoliang Luo, Bradley C Love

A Useful Motif for Flexible Task Learning in an Embodied Two-Dimensional Visual Environment

Animals (especially humans) have an amazing ability to learn new tasks quickly, and switch between them flexibly. How brains support this ability is largely unknown, both neuroscientifically and algorithmically. One reasonable supposition…

Machine Learning · Computer Science 2017-06-23 Kevin T. Feigelis, Daniel L. K. Yamins

NeurAll: Towards a Unified Visual Perception Model for Automated Driving

Convolutional Neural Networks (CNNs) are successfully used for important automotive visual perception tasks, including object recognition, motion and depth estimation, visual SLAM, etc. However, these tasks are typically independently…

Computer Vision and Pattern Recognition · Computer Science 2024-03-12 Ganesh Sistu, Isabelle Leang, Sumanth Chennupati, Senthil Yogamani, Ciaran Hughes, Stefan Milz, Samir Rawashdeh

Robustness of Humans and Machines on Object Recognition with Extreme Image Transformations

Recent neural network architectures have claimed to explain data from the human visual cortex. Their demonstrated performance, however, is still limited by their dependence on exploiting low-level features for solving visual tasks. This…

Computer Vision and Pattern Recognition · Computer Science 2022-05-30 Dakarai Crowder, Girik Malik

Learning to reason over visual objects

A core component of human intelligence is the ability to identify abstract patterns inherent in complex, high-dimensional perceptual data, as exemplified by visual reasoning tasks such as Raven's Progressive Matrices (RPM). Motivated by the…

Computer Vision and Pattern Recognition · Computer Science 2023-10-30 Shanka Subhra Mondal, Taylor Webb, Jonathan D. Cohen