Related papers: VisualEchoes: Spatial Image Representation Learnin…

Build a training interface to install the bat's echolocation skills in humans

Bats use a sophisticated ultrasonic sensing method called echolocation to recognize the environment. Recently, it has been reported that sighted human participants with no prior experience in echolocation can improve their ability to…

Human-Computer Interaction · Computer Science · 2023-02-20 · Miyoko Tsumaki, Yu Teshima, Takao Tsuchiya, Kaoru Ashihara, Kohta I. Kobayasi, Shizuko Hiryu

BatVision: Learning to See 3D Spatial Layout with Two Ears

Many species have evolved advanced non-visual perception while artificial systems fall behind. Radar and ultrasound complement camera-based vision but they are often too costly and complex to set up for very limited information gain. In…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-20 · Jesper Haahr Christensen, Sascha Hornauer, Stella Yu

Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

We propose a self-supervised method for learning representations based on spatial audio-visual correspondences in egocentric videos. Our method uses a masked auto-encoding framework to synthesize masked binaural (multi-channel) audio…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-07 · Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

3D imaging from multipath temporal echoes

Echo-location is a broad approach to imaging and sensing that includes both man-made RADAR, LIDAR, SONAR and also animal navigation. However, full 3D information based on echo-location requires some form of scanning of the scene in order to…

Image and Video Processing · Electrical Eng. & Systems · 2021-06-16 · Alex Turpin, Valentin Kapitany, Jack Radford, Davide Rovelli, Kevin Mitchell, Ashley Lyons, Ilya Starshynov, Daniele Faccio

Learning spatial hearing via innate mechanisms

The acoustic cues used by humans and other animals to localise sounds are subtle, and change during and after development. This means that we need to constantly relearn or recalibrate the auditory spatial map throughout our lifetimes. This…

Neural and Evolutionary Computing · Computer Science · 2025-04-18 · Yang Chu, Wayne Luk, Dan Goodman

Learning Object Placements For Relational Instructions by Hallucinating Scene Representations

Robots coexisting with humans in their environment and performing services for them need the ability to interact with them. One particular requirement for such robots is that they are able to understand spatial relations and can place…

Robotics · Computer Science · 2020-02-24 · Oier Mees, Alp Emek, Johan Vertens, Wolfram Burgard

Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision

This paper focuses on perceiving and navigating 3D environments using echoes and RGB image. In particular, we perform depth estimation by fusing RGB image with echoes, received from multiple orientations. Unlike previous works, we go beyond…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-12 · Lingyu Zhu, Esa Rahtu, Hang Zhao

Learning Navigational Visual Representations with Semantic Map Supervision

Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot. However, most existing works only employ visual backbones pre-trained either with independent images…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-25 · Yicong Hong, Yang Zhou, Ruiyi Zhang, Franck Dernoncourt, Trung Bui, Stephen Gould, Hao Tan

Active Perception and Representation for Robotic Manipulation

The vast majority of visual animals actively control their eyes, heads, and/or bodies to direct their gaze toward different parts of their environment. In contrast, recent applications of reinforcement learning in robotic manipulation…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-17 · Youssef Zaky, Gaurav Paruthi, Bryan Tripp, James Bergstra

SonoTraceLab -- A Raytracing-Based Acoustic Modelling System for Simulating Echolocation Behavior of Bats

Echolocation is the prime sensing modality for many species of bats, who show the intricate ability to perform a plethora of tasks in complex and unstructured environments. Understanding this exceptional feat of sensorimotor interaction is…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-12-23 · Wouter Jansen, Jan Steckel

The Curious Robot: Learning Visual Representations via Physical Interactions

What is the right supervisory signal to train visual representations? Current approaches in computer vision use category labels from datasets such as ImageNet to train ConvNets. However, in case of biological agents, visual representation…

Computer Vision and Pattern Recognition · Computer Science · 2016-07-27 · Lerrel Pinto, Dhiraj Gandhi, Yuanfeng Han, Yong-Lae Park, Abhinav Gupta

Selective Visual Representations Improve Convergence and Generalization for Embodied AI

Embodied AI models often employ off the shelf vision backbones like CLIP to encode their visual observations. Although such general purpose representations encode rich syntactic and semantic information about the scene, much of this…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-12 · Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna

Environment Predictive Coding for Embodied Agents

We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents. In contrast to prior work on self-supervised learning for images, we aim to jointly encode a series of…

Computer Vision and Pattern Recognition · Computer Science · 2021-02-05 · Santhosh K. Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman

Unsupervised Visual Representation Learning by Context Prediction

This work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation. Given only a large, unlabeled image collection, we extract random pairs of patches from each image…

Computer Vision and Pattern Recognition · Computer Science · 2016-01-19 · Carl Doersch, Abhinav Gupta, Alexei A. Efros

Imitation Learning-based Visual Servoing for Tracking Moving Objects

In everyday life collaboration tasks between human operators and robots, the former necessitate simple ways for programming new skills, the latter have to show adaptive capabilities to cope with environmental changes. The joint use of…

Robotics · Computer Science · 2023-09-15 · Rocco Felici, Matteo Saveriano, Loris Roveda, Antonio Paolillo

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning

The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual…

Computer Vision and Pattern Recognition · Computer Science · 2017-12-21 · Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

Representation Learning in Partially Observable Environments using Sensorimotor Prediction

In order to explore and act autonomously in an environment, an agent needs to learn from the sensorimotor information that is captured while acting. By extracting the regularities in this sensorimotor stream, it can learn a model of the…

Artificial Intelligence · Computer Science · 2018-04-27 · Thibaut Kulak, Michael Garcia Ortiz

Beyond Image to Depth: Improving Depth Prediction using Echoes

We address the problem of estimating depth with multi modal audio visual data. Inspired by the ability of animals, such as bats and dolphins, to infer distance of objects with echolocation, some recent methods have utilized echoes for depth…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-06 · Kranti Kumar Parida, Siddharth Srivastava, Gaurav Sharma

Representation Learning for Grounded Spatial Reasoning

The interpretation of spatial references is highly contextual, requiring joint inference over both language and the environment. We consider the task of spatial reasoning in a simulated environment, where an agent can act and receive…

Computation and Language · Computer Science · 2017-11-15 · Michael Janner, Karthik Narasimhan, Regina Barzilay

A Computational Account Of Self-Supervised Visual Learning From Egocentric Object Play

Research in child development has shown that embodied experience handling physical objects contributes to many cognitive abilities, including visual learning. One characteristic of such experience is that the learner sees the same object…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-01 · Deepayan Sanyal, Joel Michelson, Yuan Yang, James Ainooson, Maithilee Kunda