Related papers: Embodied vision for learning object representation…

Toddlers' Active Gaze Behavior Supports Self-Supervised Object Learning

Toddlers learn to recognize objects from different viewpoints with almost no supervision. During this learning, they execute frequent eye and head movements that shape their visual experience. It is presently unclear if and how these…

Computer Vision and Pattern Recognition · Computer Science 2025-06-26 Zhengyang Yu, Arthur Aubret, Marcel C. Raabe, Jane Yang, Chen Yu, Jochen Triesch

A Computational Account Of Self-Supervised Visual Learning From Egocentric Object Play

Research in child development has shown that embodied experience handling physical objects contributes to many cognitive abilities, including visual learning. One characteristic of such experience is that the learner sees the same object…

Computer Vision and Pattern Recognition · Computer Science 2023-06-01 Deepayan Sanyal, Joel Michelson, Yuan Yang, James Ainooson, Maithilee Kunda

Simulated Cortical Magnification Supports Self-Supervised Object Learning

Recent self-supervised learning models simulate the development of semantic object representations by training on visual experience similar to that of toddlers. However, these models ignore the foveated nature of human vision with high/low…

Computer Vision and Pattern Recognition · Computer Science 2025-09-22 Zhengyang Yu, Arthur Aubret, Chen Yu, Jochen Triesch

Learning 3D object-centric representation through prediction

As part of human core knowledge, the representation of objects is the building block of mental representation that supports high-level concepts and symbolic reasoning. While humans develop the ability of perceiving objects situated in 3D…

Computer Vision and Pattern Recognition · Computer Science 2024-03-07 John Day, Tushar Arora, Jirui Liu, Li Erran Li, Ming Bo Cai

Learning high-level visual representations from a child's perspective without strong inductive biases

Young children develop sophisticated internal models of the world based on their visual experience. Can such models be learned from a child's visual experience without strong inductive biases? To investigate this, we train state-of-the-art…

Computer Vision and Pattern Recognition · Computer Science 2023-09-25 A. Emin Orhan, Brenden M. Lake

Active Object Manipulation Facilitates Visual Object Learning: An Egocentric Vision Study

Inspired by the remarkable ability of the infant visual learning system, a recent study collected first-person images from children to analyze the 'training data' that they receive. We conduct a follow-up study that investigates two…

Computer Vision and Pattern Recognition · Computer Science 2019-06-05 Satoshi Tsutsui, Dian Zhi, Md Alimoor Reza, David Crandall, Chen Yu

Embodied Visual Recognition

Passive visual systems typically fail to recognize objects in the amodal setting where they are heavily occluded. In contrast, humans and other embodied agents have the ability to move in the environment, and actively control the viewing…

Computer Vision and Pattern Recognition · Computer Science 2019-04-10 Jianwei Yang, Zhile Ren, Mingze Xu, Xinlei Chen, David Crandall, Devi Parikh, Dhruv Batra

Learning task-agnostic representation via toddler-inspired learning

One of the inherent limitations of current AI systems, stemming from the passive learning mechanisms (e.g., supervised learning), is that they perform well on labeled datasets but cannot deduce knowledge on their own. To tackle this…

Artificial Intelligence · Computer Science 2021-01-28 Kwanyoung Park, Junseok Park, Hyunseok Oh, Byoung-Tak Zhang, Youngki Lee

Temporal Slowness in Central Vision Drives Semantic Object Learning

Humans acquire semantic object representations from egocentric visual streams with minimal supervision, but the underlying mechanisms remain unclear. Importantly, the visual system only processes the center of its field of view with high…

Computer Vision and Pattern Recognition · Computer Science 2026-03-25 Timothy Schaumlöffel, Arthur Aubret, Gemma Roig, Jochen Triesch

Using Motion and Internal Supervision in Object Recognition

In this thesis we address two related aspects of visual object recognition: the use of motion information, and the use of internal supervision, to help unsupervised learning. These two aspects are inter-related in the current study, since…

Computer Vision and Pattern Recognition · Computer Science 2018-12-14 Daniel Harari

Characterizing the visual representation of objects from the child's view

Children acquire object category representations from their everyday experiences in the first few years of life. What do the inputs to this learning process look like? We analyzed first-person videos of young children's visual experience at…

Computer Vision and Pattern Recognition · Computer Science 2026-05-15 Jane Yang, Tarun Sepuri, Alvin Wei Ming Tan, Khai Loong Aw, Michael C. Frank, Bria Long

Multi-Object Representation Learning with Iterative Variational Inference

Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even…

Machine Learning · Computer Science 2020-07-29 Klaus Greff, Raphaël Lopez Kaufman, Rishabh Kabra, Nick Watters, Chris Burgess, Daniel Zoran, Loic Matthey, Matthew Botvinick, Alexander Lerchner

Concepts Learned Visually by Infants Can Contribute to Visual Learning and Understanding in AI Models

Early in development, infants learn to extract surprisingly complex aspects of visual scenes. This early learning comes together with an initial understanding of the extracted concepts, such as their implications, causality, and using them…

Artificial Intelligence · Computer Science 2026-03-27 Shify Treger, Shimon Ullman

Learning warps object representations in the ventral temporal cortex

The human ventral temporal cortex (VTC) plays a critical role in object recognition. Although it is well established that visual experience shapes VTC object representations, the impact of semantic and contextual learning is unclear. In…

Neurons and Cognition · Quantitative Biology 2016-04-04 Alex Clarke, Philip J. Pell, Charan Ranganath, Lorraine K. Tyler

Embodied Learning for Lifelong Visual Perception

We study lifelong visual perception in an embodied setup, where we develop new models and compare various agents that navigate in buildings and occasionally request annotations which, in turn, are used to refine their visual perception…

Computer Vision and Pattern Recognition · Computer Science 2021-12-30 David Nilsson, Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu

Assessing the alignment between infants' visual and linguistic experience using multimodal language models

Figuring out which objects or concepts words refer to is a central language learning challenge for young children. Most models of this process posit that children learn early object labels from co-occurrences of words and their referents…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Alvin Wei Ming Tan, Jane Yang, Tarun Sepuri, Khai Loong Aw, Robert Z. Sparks, Zi Yin, Virginia A. Marchman, Michael C. Frank, Bria Long

Time-Contrastive Networks: Self-Supervised Learning from Video

We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings:…

Computer Vision and Pattern Recognition · Computer Science 2018-03-21 Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine

Learning Features by Watching Objects Move

This paper presents a novel yet intuitive approach to unsupervised feature learning. Inspired by the human visual system, we explore whether low-level motion-based grouping cues can be used to learn an effective visual representation.…

Computer Vision and Pattern Recognition · Computer Science 2017-04-13 Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan

Object Space is Embodied

The perceived similarity between objects has often been attributed to their physical and conceptual features, such as appearance and animacy, and the theoretical framework of object space is accordingly conceived. Here, we extend this…

Neurons and Cognition · Quantitative Biology 2024-08-06 Shan Xu, Xinran Feng, Yuannan Li, Jia Liu

Self-supervised visual learning from interactions with objects

Self-supervised learning (SSL) has revolutionized visual representation learning, but has not achieved the robustness of human vision. A reason for this could be that SSL does not leverage all the data available to humans during learning.…

Computer Vision and Pattern Recognition · Computer Science 2024-08-09 Arthur Aubret, Céline Teulière, Jochen Triesch