Related papers: A Benchmark for Compositional Visual Reasoning

A Survey on Compositional Learning of AI Models: Theoretical and Experimental Practices

Compositional learning, mastering the ability to combine basic concepts and construct more intricate ones, is crucial for human cognition, especially in human language comprehension and visual perception. This notion is tightly connected to…

Artificial Intelligence · Computer Science · 2024-11-22 · Sania Sinha, Tanawan Premsri, Parisa Kordjamshidi

Explain Before You Answer: A Survey on Compositional Visual Reasoning

Compositional visual reasoning has emerged as a key research frontier in multimodal AI, aiming to endow machines with the human-like ability to decompose visual scenes, ground intermediate concepts, and perform multi-step logical inference.…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-28 · Fucai Ke, Joy Hsu, Zhixi Cai, Zixian Ma, Xin Zheng, Xindi Wu, Sukai Huang, Weiqing Wang, Pari Delir Haghighi, Gholamreza Haffari, Ranjay Krishna, Jiajun Wu, Hamid Rezatofighi

Learning Visual Reasoning Without Strong Priors

Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a…

Computer Vision and Pattern Recognition · Computer Science · 2017-12-20 · Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron Courville

Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning

Visual question answering requires high-order reasoning about an image, which is a fundamental capability needed by machine systems to follow complex directives. Recently, modular networks have been shown to be an effective framework for…

Computer Vision and Pattern Recognition · Computer Science · 2019-01-24 · David Mascharka, Philip Tran, Ryan Soklaski, Arjun Majumdar

RAVEN: A Dataset for Relational and Analogical Visual rEasoNing

Dramatic progress has been witnessed in basic vision tasks involving low-level perception, such as object recognition, detection, and tracking. Unfortunately, there is still an enormous performance gap between artificial vision systems and…

Computer Vision and Pattern Recognition · Computer Science · 2019-03-08 · Chi Zhang, Feng Gao, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu

ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

Compositional Reasoning (CR) entails grasping the significance of attributes, relations, and word order. Recent Vision-Language Models (VLMs), comprising a visual encoder and a Large Language Model (LLM) decoder, have demonstrated…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-14 · Irene Huang, Wei Lin, M. Jehanzeb Mirza, Jacob A. Hansen, Sivan Doveh, Victor Ion Butoi, Roei Herzig, Assaf Arbelle, Hilde Kuehne, Trevor Darrell, Chuang Gan, Aude Oliva, Rogerio Feris, Leonid Karlinsky

CAUSAL3D: A Comprehensive Benchmark for Causal Learning from Visual Data

True intelligence hinges on the ability to uncover and leverage hidden causal relations. Despite significant progress in AI and computer vision (CV), there remains a lack of benchmarks for assessing models' abilities to infer latent…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-31 · Disheng Liu, Yiran Qiao, Wuche Liu, Yiren Lu, Yunlai Zhou, Tuo Liang, Yu Yin, Jing Ma

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. Existing benchmarks for visual question answering can help,…

Computer Vision and Pattern Recognition · Computer Science · 2016-12-22 · Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

Understanding the computational demands underlying visual reasoning

Visual understanding requires comprehending complex visual relations between objects within a scene. Here, we seek to characterize the computational demands for abstract visual reasoning. We do this by systematically assessing the ability…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-03 · Mohit Vaishnav, Remi Cadene, Andrea Alamia, Drew Linsley, Rufin VanRullen, Thomas Serre

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning

A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies. Such composite structures could induce a rich set of semantic concepts…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-10 · Yining Hong, Li Yi, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

Reasoning in Computer Vision: Taxonomy, Models, Tasks, and Methodologies

Visual reasoning is critical for a wide range of computer vision tasks that go beyond surface-level object detection and classification. Despite notable advances in relational, symbolic, temporal, causal, and commonsense reasoning, existing…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-15 · Ayushman Sarkar, Mohd Yamani Idna Idris, Zhenyu Yu

Visual Abductive Reasoning

Abductive reasoning seeks the likeliest possible explanation for partial observations. Although abduction is frequently employed in human daily reasoning, it is rarely explored in computer vision literature. In this paper, we propose a new…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-29 · Chen Liang, Wenguan Wang, Tianfei Zhou, Yi Yang

LARS-VSA: A Vector Symbolic Architecture For Learning with Abstract Rules

Human cognition excels at symbolic reasoning, deducing abstract rules from limited samples. This has been explained using symbolic and connectionist approaches, inspiring the development of a neuro-symbolic architecture that combines both…

Artificial Intelligence · Computer Science · 2024-05-24 · Mohamed Mejri, Chandramouli Amarnath, Abhijit Chatterjee

Causal Reasoning Meets Visual Representation Learning: A Prospective Study

Visual representation learning is ubiquitous in various real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing. Due to the emergence of huge…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-23 · Yang Liu, Yushen Wei, Hong Yan, Guanbin Li, Liang Lin

Compositional diversity in visual concept learning

Humans leverage compositionality to efficiently learn new concepts, understanding how familiar parts can combine together to form novel objects. In contrast, popular computer vision models struggle to make the same types of inferences,…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-01 · Yanli Zhou, Reuben Feinman, Brenden M. Lake

Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers

Visual reasoning is dominated by end-to-end neural networks scaled to billions of model parameters and training examples. However, even the largest models struggle with compositional reasoning, generalization, fine-grained spatial and…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-16 · Aleksandar Stanić, Sergi Caelles, Michael Tschannen

Compositional Scene Representation Learning via Reconstruction: A Survey

Visual scenes are composed of visual concepts and have the property of combinatorial explosion. An important reason for humans to efficiently learn from diverse visual scenes is the ability of compositional perception, and it is desirable…

Machine Learning · Computer Science · 2023-06-16 · Jinyang Yuan, Tonglin Chen, Bin Li, Xiangyang Xue

From Recognition to Cognition: Visual Commonsense Reasoning

Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people's actions, goals, and mental states. While this task is easy…

Computer Vision and Pattern Recognition · Computer Science · 2019-03-27 · Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi

CREPE: Can Vision-Language Foundation Models Reason Compositionally?

A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, we find that: across 7 architectures…

Computation and Language · Computer Science · 2023-05-17 · Zixian Ma, Jerry Hong, Mustafa Omer Gul, Mona Gandhi, Irena Gao, Ranjay Krishna

Object Ordering with Bidirectional Matchings for Visual Reasoning

Visual reasoning with compositional natural language instructions, e.g., based on the newly-released Cornell Natural Language Visual Reasoning (NLVR) dataset, is a challenging task, where the model needs to have the ability to create an…

Computation and Language · Computer Science · 2018-09-07 · Hao Tan, Mohit Bansal