Related papers: Pix2Code: Learning to Compose Neural Visual Concep…

200 papers

Humans are highly efficient learners, with the ability to grasp the meaning of a new concept from just a few examples. Unlike popular computer vision systems, humans can flexibly leverage the compositional structure of the visual world,…

Computer Vision and Pattern Recognition · Computer Science 2021-05-21 Yanli Zhou , Brenden M. Lake

Humans have the ability to seamlessly combine low-level visual input with high-level symbolic reasoning often in the form of recognising objects, learning relations between them and applying rules. Neuro-symbolic systems aim to bring a…

Machine Learning · Computer Science 2022-03-01 Nuri Cingillioglu , Alessandra Russo

Our understanding of the visual world is centered around various concept axes, characterizing different aspects of visual entities. While different concept axes can be easily specified by language, e.g. color, the exact visual nuances along…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Sharon Lee , Yunzhi Zhang , Shangzhe Wu , Jiajun Wu

Compositional and relational learning is a hallmark of human intelligence, but one which presents challenges for neural models. One difficulty in the development of such models is the lack of benchmarks with clear compositional and…

Machine Learning · Computer Science 2020-07-09 Tim Klinger , Dhaval Adjodah , Vincent Marois , Josh Joseph , Matthew Riemer , Alex 'Sandy' Pentland , Murray Campbell

Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people's actions, goals, and mental states. While this task is easy…

Computer Vision and Pattern Recognition · Computer Science 2019-03-27 Rowan Zellers , Yonatan Bisk , Ali Farhadi , Yejin Choi

Concept learning constructs visual representations that are connected to linguistic semantics, which is fundamental to vision-language tasks. Although promising progress has been made, existing concept learners are still vulnerable to…

Computer Vision and Pattern Recognition · Computer Science 2024-04-01 Qi Zheng , Chaoyue Wang , Dadong Wang , Dacheng Tao

We present CV4Code, a compact and effective computer vision method for source code understanding. Our method leverages the contextual and the structural information available from the code snippet by treating each snippet as a…

Software Engineering · Computer Science 2022-05-19 Ruibo Shi , Lili Tao , Rohan Saphal , Fran Silavong , Sean J. Moran

An image related question defines a specific visual task that is required in order to produce an appropriate answer. The answer may depend on a minor detail in the image and require complex reasoning and use of prior knowledge. When humans…

Computer Vision and Pattern Recognition · Computer Science 2018-10-26 Ben Zion Vatashsky , Shimon Ullman

We aim to investigate whether end-to-end learning of visual reasoning can be achieved with general-purpose neural networks, with the help of visual pretraining. A positive result would refute the common belief that explicit visual…

Computer Vision and Pattern Recognition · Computer Science 2023-12-19 Chen Sun , Calvin Luo , Xingyi Zhou , Anurag Arnab , Cordelia Schmid

How do we imagine visual objects and combine them to create new forms? To answer this question, we need to explore the cognitive, computational and neural mechanisms underlying imagery and creativity. The body of research on deep learning…

Neurons and Cognition · Quantitative Biology 2021-12-14 Shekoofeh Hedayati , Roger Beaty , Brad Wyble

The ability to understand visual concepts and replicate and compose these concepts from images is a central goal for computer vision. Recent advances in text-to-image (T2I) models have led to high definition and realistic image quality…

Computer Vision and Pattern Recognition · Computer Science 2024-02-26 Maitreya Patel , Tejas Gokhale , Chitta Baral , Yezhou Yang

People grasp flexible visual concepts from a few examples. We explore a neurosymbolic system that learns how to infer programs that capture visual concepts in a domain-general fashion. We introduce Template Programs: programmatic…

Computer Vision and Pattern Recognition · Computer Science 2024-06-11 R. Kenny Jones , Siddhartha Chaudhuri , Daniel Ritchie

The ability to integrate context, including perceptual and temporal cues, plays a pivotal role in grounding the meaning of a linguistic utterance. In order to measure to what extent current vision-and-language models master this ability, we…

Computer Vision and Pattern Recognition · Computer Science 2022-11-21 Benno Krojer , Vaibhav Adlakha , Vibhav Vineet , Yash Goyal , Edoardo Ponti , Siva Reddy

Representation learning is the foundation of natural language processing (NLP). This work presents new methods to employ visual information as assistant signals to general NLP tasks. For each sentence, we first retrieve a flexible number of…

Computation and Language · Computer Science 2023-01-10 Zhuosheng Zhang , Kehai Chen , Rui Wang , Masao Utiyama , Eiichiro Sumita , Zuchao Li , Hai Zhao

Visual image reconstruction, the decoding of perceptual content from brain activity into images, has advanced significantly with the integration of deep neural networks (DNNs) and generative models. This review traces the field's evolution…

Computer Vision and Pattern Recognition · Computer Science 2025-06-23 Yukiyasu Kamitani , Misato Tanaka , Ken Shirakawa

Compositional generalization is a key facet of human cognition, but lacking in current AI tools such as vision-language models. Previous work examined whether a compositional tensor-based sentence semantics can overcome the challenge, but…

Artificial Intelligence · Computer Science 2025-09-12 Hala Hawashin , Mina Abbaszadeh , Nicholas Joseph , Beth Pearson , Martha Lewis , Mehrnoosh Sadrzadeh

We infer and generate three-dimensional (3D) scene information from a single input image and without supervision. This problem is under-explored, with most prior work relying on supervision from, e.g., 3D ground-truth, multiple images of a…

Computer Vision and Pattern Recognition · Computer Science 2020-04-20 Sai Rajeswar , Fahim Mannan , Florian Golemo , Jérôme Parent-Lévesque , David Vazquez , Derek Nowrouzezahrai , Aaron Courville

Discovering physical laws directly from high-dimensional visual data is a long-standing human pursuit but remains a formidable challenge for machines, representing a fundamental goal of scientific intelligence. This task is inherently…

Computational Engineering, Finance, and Science · Computer Science 2026-02-24 Ruikun Li , Jun Yao , Yingfan Hua , Shixiang Tang , Biqing Qi , Bin Liu , Wanli Ouyang , Yan Lu

We propose Recognition as Part Composition (RPC), an image encoding approach inspired by human cognition. It is based on the cognitive theory that humans recognize complex objects by components, and that they build a small compact…

Computer Vision and Pattern Recognition · Computer Science 2022-04-19 Samarth Mishra , Pengkai Zhu , Venkatesh Saligrama

Image-to-code generation tests whether a vision-language model (VLM) can recover the structure of an image enough to express it as executable code. Existing benchmarks either focus on narrow visual domains, depend on paired executable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Ajay Vikram Periasami , Junlin Wang , Bhuwan Dhingra