Related papers: Pix2Code: Learning to Compose Neural Visual Concep…

Flexible Compositional Learning of Structured Visual Concepts

Humans are highly efficient learners, with the ability to grasp the meaning of a new concept from just a few examples. Unlike popular computer vision systems, humans can flexibly leverage the compositional structure of the visual world,…

Computer Vision and Pattern Recognition · Computer Science 2021-05-21 Yanli Zhou, Brenden M. Lake

pix2rule: End-to-end Neuro-symbolic Rule Learning

Humans have the ability to seamlessly combine low-level visual input with high-level symbolic reasoning often in the form of recognising objects, learning relations between them and applying rules. Neuro-symbolic systems aim to bring a…

Machine Learning · Computer Science 2022-03-01 Nuri Cingillioglu, Alessandra Russo

Language-Informed Visual Concept Learning

Our understanding of the visual world is centered around various concept axes, characterizing different aspects of visual entities. While different concept axes can be easily specified by language, e.g. color, the exact visual nuances along…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Sharon Lee, Yunzhi Zhang, Shangzhe Wu, Jiajun Wu

A Study of Compositional Generalization in Neural Models

Compositional and relational learning is a hallmark of human intelligence, but one which presents challenges for neural models. One difficulty in the development of such models is the lack of benchmarks with clear compositional and…

Machine Learning · Computer Science 2020-07-09 Tim Klinger, Dhaval Adjodah, Vincent Marois, Josh Joseph, Matthew Riemer, Alex 'Sandy' Pentland, Murray Campbell

From Recognition to Cognition: Visual Commonsense Reasoning

Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people's actions, goals, and mental states. While this task is easy…

Computer Vision and Pattern Recognition · Computer Science 2019-03-27 Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi

Visual Superordinate Abstraction for Robust Concept Learning

Concept learning constructs visual representations that are connected to linguistic semantics, which is fundamental to vision-language tasks. Although promising progress has been made, existing concept learners are still vulnerable to…

Computer Vision and Pattern Recognition · Computer Science 2024-04-01 Qi Zheng, Chaoyue Wang, Dadong Wang, Dacheng Tao

CV4Code: Sourcecode Understanding via Visual Code Representations

We present CV4Code, a compact and effective computer vision method for sourcecode understanding. Our method leverages the contextual and the structural information available from the code snippet by treating each snippet as a…

Software Engineering · Computer Science 2022-05-19 Ruibo Shi, Lili Tao, Rohan Saphal, Fran Silavong, Sean J. Moran

Understand, Compose and Respond - Answering Visual Questions by a Composition of Abstract Procedures

An image-related question defines a specific visual task that is required in order to produce an appropriate answer. The answer may depend on a minor detail in the image and require complex reasoning and use of prior knowledge. When humans…

Computer Vision and Pattern Recognition · Computer Science 2018-10-26 Ben Zion Vatashsky, Shimon Ullman

Does Visual Pretraining Help End-to-End Reasoning?

We aim to investigate whether end-to-end learning of visual reasoning can be achieved with general-purpose neural networks, with the help of visual pretraining. A positive result would refute the common belief that explicit visual…

Computer Vision and Pattern Recognition · Computer Science 2023-12-19 Chen Sun, Calvin Luo, Xingyi Zhou, Anurag Arnab, Cordelia Schmid

Seeking the Building Blocks of Visual Imagery and Creativity in a Cognitively Inspired Neural Network

How do we imagine visual objects and combine them to create new forms? To answer this question, we need to explore the cognitive, computational and neural mechanisms underlying imagery and creativity. The body of research on deep learning…

Neurons and Cognition · Quantitative Biology 2021-12-14 Shekoofeh Hedayati, Roger Beaty, Brad Wyble

ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models

The ability to understand visual concepts and replicate and compose these concepts from images is a central goal for computer vision. Recent advances in text-to-image (T2I) models have led to high definition and realistic image quality…

Computer Vision and Pattern Recognition · Computer Science 2024-02-26 Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang

Learning to Infer Generative Template Programs for Visual Concepts

People grasp flexible visual concepts from a few examples. We explore a neurosymbolic system that learns how to infer programs that capture visual concepts in a domain-general fashion. We introduce Template Programs: programmatic…

Computer Vision and Pattern Recognition · Computer Science 2024-06-11 R. Kenny Jones, Siddhartha Chaudhuri, Daniel Ritchie

Image Retrieval from Contextual Descriptions

The ability to integrate context, including perceptual and temporal cues, plays a pivotal role in grounding the meaning of a linguistic utterance. In order to measure to what extent current vision-and-language models master this ability, we…

Computer Vision and Pattern Recognition · Computer Science 2022-11-21 Benno Krojer, Vaibhav Adlakha, Vibhav Vineet, Yash Goyal, Edoardo Ponti, Siva Reddy

Universal Multimodal Representation for Language Understanding

Representation learning is the foundation of natural language processing (NLP). This work presents new methods to employ visual information as assistant signals to general NLP tasks. For each sentence, we first retrieve a flexible number of…

Computation and Language · Computer Science 2023-01-10 Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

Visual Image Reconstruction from Brain Activity via Latent Representation

Visual image reconstruction, the decoding of perceptual content from brain activity into images, has advanced significantly with the integration of deep neural networks (DNNs) and generative models. This review traces the field's evolution…

Computer Vision and Pattern Recognition · Computer Science 2025-06-23 Yukiyasu Kamitani, Misato Tanaka, Ken Shirakawa

Compositional Concept Generalization with Variational Quantum Circuits

Compositional generalization is a key facet of human cognition, but lacking in current AI tools such as vision-language models. Previous work examined whether a compositional tensor-based sentence semantics can overcome the challenge, but…

Artificial Intelligence · Computer Science 2025-09-12 Hala Hawashin, Mina Abbaszadeh, Nicholas Joseph, Beth Pearson, Martha Lewis, Mehrnoosh Sadrzadeh

Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation

We infer and generate three-dimensional (3D) scene information from a single input image and without supervision. This problem is under-explored, with most prior work relying on supervision from, e.g., 3D ground-truth, multiple images of a…

Computer Vision and Pattern Recognition · Computer Science 2020-04-20 Sai Rajeswar, Fahim Mannan, Florian Golemo, Jérôme Parent-Lévesque, David Vazquez, Derek Nowrouzezahrai, Aaron Courville

Pixel2Phys: Distilling Governing Laws from Visual Dynamics

Discovering physical laws directly from high-dimensional visual data is a long-standing human pursuit but remains a formidable challenge for machines, representing a fundamental goal of scientific intelligence. This task is inherently…

Computational Engineering, Finance, and Science · Computer Science 2026-02-24 Ruikun Li, Jun Yao, Yingfan Hua, Shixiang Tang, Biqing Qi, Bin Liu, Wanli Ouyang, Yan Lu

Learning Compositional Representations for Effective Low-Shot Generalization

We propose Recognition as Part Composition (RPC), an image encoding approach inspired by human cognition. It is based on the cognitive theory that humans recognize complex objects by components, and that they build a small compact…

Computer Vision and Pattern Recognition · Computer Science 2022-04-19 Samarth Mishra, Pengkai Zhu, Venkatesh Saligrama

Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation

Image-to-code generation tests whether a vision-language model (VLM) can recover the structure of an image enough to express it as executable code. Existing benchmarks either focus on narrow visual domains, depend on paired executable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Ajay Vikram Periasami, Junlin Wang, Bhuwan Dhingra