Related papers: A Benchmark for Compositional Visual Reasoning

200 papers

Compositional learning, mastering the ability to combine basic concepts and construct more intricate ones, is crucial for human cognition, especially in human language comprehension and visual perception. This notion is tightly connected to…

Artificial Intelligence · Computer Science 2024-11-22 Sania Sinha, Tanawan Premsri, Parisa Kordjamshidi

Compositional visual reasoning has emerged as a key research frontier in multimodal AI, aiming to endow machines with the human-like ability to decompose visual scenes, ground intermediate concepts, and perform multi-step logical inference.…

Computer Vision and Pattern Recognition · Computer Science 2025-08-28 Fucai Ke, Joy Hsu, Zhixi Cai, Zixian Ma, Xin Zheng, Xindi Wu, Sukai Huang, Weiqing Wang, Pari Delir Haghighi, Gholamreza Haffari, Ranjay Krishna, Jiajun Wu, Hamid Rezatofighi

Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a…

Computer Vision and Pattern Recognition · Computer Science 2017-12-20 Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron Courville

Visual question answering requires high-order reasoning about an image, which is a fundamental capability needed by machine systems to follow complex directives. Recently, modular networks have been shown to be an effective framework for…

Computer Vision and Pattern Recognition · Computer Science 2019-01-24 David Mascharka, Philip Tran, Ryan Soklaski, Arjun Majumdar

Dramatic progress has been witnessed in basic vision tasks involving low-level perception, such as object recognition, detection, and tracking. Unfortunately, there is still an enormous performance gap between artificial vision systems and…

Computer Vision and Pattern Recognition · Computer Science 2019-03-08 Chi Zhang, Feng Gao, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu

Compositional Reasoning (CR) entails grasping the significance of attributes, relations, and word order. Recent Vision-Language Models (VLMs), comprising a visual encoder and a Large Language Model (LLM) decoder, have demonstrated…

True intelligence hinges on the ability to uncover and leverage hidden causal relations. Despite significant progress in AI and computer vision (CV), there remains a lack of benchmarks for assessing models' abilities to infer latent…

Computer Vision and Pattern Recognition · Computer Science 2025-10-31 Disheng Liu, Yiran Qiao, Wuche Liu, Yiren Lu, Yunlai Zhou, Tuo Liang, Yu Yin, Jing Ma

When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. Existing benchmarks for visual question answering can help,…

Computer Vision and Pattern Recognition · Computer Science 2016-12-22 Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

Visual understanding requires comprehending complex visual relations between objects within a scene. Here, we seek to characterize the computational demands for abstract visual reasoning. We do this by systematically assessing the ability…

Computer Vision and Pattern Recognition · Computer Science 2022-03-03 Mohit Vaishnav, Remi Cadene, Andrea Alamia, Drew Linsley, Rufin VanRullen, Thomas Serre

A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies. Such composite structures could induce a rich set of semantic concepts…

Computer Vision and Pattern Recognition · Computer Science 2021-12-10 Yining Hong, Li Yi, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

Visual reasoning is critical for a wide range of computer vision tasks that go beyond surface-level object detection and classification. Despite notable advances in relational, symbolic, temporal, causal, and commonsense reasoning, existing…

Computer Vision and Pattern Recognition · Computer Science 2025-08-15 Ayushman Sarkar, Mohd Yamani Idna Idris, Zhenyu Yu

Abductive reasoning seeks the likeliest possible explanation for partial observations. Although abduction is frequently employed in human daily reasoning, it is rarely explored in computer vision literature. In this paper, we propose a new…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Chen Liang, Wenguan Wang, Tianfei Zhou, Yi Yang

Human cognition excels at symbolic reasoning, deducing abstract rules from limited samples. This has been explained using symbolic and connectionist approaches, inspiring the development of a neuro-symbolic architecture that combines both…

Artificial Intelligence · Computer Science 2024-05-24 Mohamed Mejri, Chandramouli Amarnath, Abhijit Chatterjee

Visual representation learning is ubiquitous in various real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing. Due to the emergence of huge…

Computer Vision and Pattern Recognition · Computer Science 2023-03-23 Yang Liu, Yushen Wei, Hong Yan, Guanbin Li, Liang Lin

Humans leverage compositionality to efficiently learn new concepts, understanding how familiar parts can combine together to form novel objects. In contrast, popular computer vision models struggle to make the same types of inferences,…

Computer Vision and Pattern Recognition · Computer Science 2023-06-01 Yanli Zhou, Reuben Feinman, Brenden M. Lake

Visual reasoning is dominated by end-to-end neural networks scaled to billions of model parameters and training examples. However, even the largest models struggle with compositional reasoning, generalization, fine-grained spatial and…

Computer Vision and Pattern Recognition · Computer Science 2024-05-16 Aleksandar Stanić, Sergi Caelles, Michael Tschannen

Visual scenes are composed of visual concepts and have the property of combinatorial explosion. An important reason for humans to efficiently learn from diverse visual scenes is the ability of compositional perception, and it is desirable…

Machine Learning · Computer Science 2023-06-16 Jinyang Yuan, Tonglin Chen, Bin Li, Xiangyang Xue

Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people's actions, goals, and mental states. While this task is easy…

Computer Vision and Pattern Recognition · Computer Science 2019-03-27 Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi

A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, we find that: across 7 architectures…

Computation and Language · Computer Science 2023-05-17 Zixian Ma, Jerry Hong, Mustafa Omer Gul, Mona Gandhi, Irena Gao, Ranjay Krishna

Visual reasoning with compositional natural language instructions, e.g., based on the newly-released Cornell Natural Language Visual Reasoning (NLVR) dataset, is a challenging task, where the model needs to have the ability to create an…

Computation and Language · Computer Science 2018-09-07 Hao Tan, Mohit Bansal