Related papers: VisualEchoes: Spatial Image Representation Learnin…

200 papers

Bats use a sophisticated ultrasonic sensing method called echolocation to recognize the environment. Recently, it has been reported that sighted human participants with no prior experience in echolocation can improve their ability to…

Human-Computer Interaction · Computer Science · 2023-02-20 · Miyoko Tsumaki, Yu Teshima, Takao Tsuchiya, Kaoru Ashihara, Kohta I. Kobayasi, Shizuko Hiryu

Many species have evolved advanced non-visual perception while artificial systems fall behind. Radar and ultrasound complement camera-based vision but they are often too costly and complex to set up for very limited information gain. In…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-20 · Jesper Haahr Christensen, Sascha Hornauer, Stella Yu

We propose a self-supervised method for learning representations based on spatial audio-visual correspondences in egocentric videos. Our method uses a masked auto-encoding framework to synthesize masked binaural (multi-channel) audio…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-07 · Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

Echo-location is a broad approach to imaging and sensing that includes both man-made RADAR, LIDAR, SONAR and also animal navigation. However, full 3D information based on echo-location requires some form of scanning of the scene in order to…

Image and Video Processing · Electrical Eng. & Systems · 2021-06-16 · Alex Turpin, Valentin Kapitany, Jack Radford, Davide Rovelli, Kevin Mitchell, Ashley Lyons, Ilya Starshynov, Daniele Faccio

The acoustic cues used by humans and other animals to localise sounds are subtle, and change during and after development. This means that we need to constantly relearn or recalibrate the auditory spatial map throughout our lifetimes. This…

Neural and Evolutionary Computing · Computer Science · 2025-04-18 · Yang Chu, Wayne Luk, Dan Goodman

Robots coexisting with humans in their environment and performing services for them need the ability to interact with them. One particular requirement for such robots is that they are able to understand spatial relations and can place…

Robotics · Computer Science · 2020-02-24 · Oier Mees, Alp Emek, Johan Vertens, Wolfram Burgard

This paper focuses on perceiving and navigating 3D environments using echoes and RGB image. In particular, we perform depth estimation by fusing RGB image with echoes, received from multiple orientations. Unlike previous works, we go beyond…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-12 · Lingyu Zhu, Esa Rahtu, Hang Zhao

Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot. However, most existing works only employ visual backbones pre-trained either with independent images…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-25 · Yicong Hong, Yang Zhou, Ruiyi Zhang, Franck Dernoncourt, Trung Bui, Stephen Gould, Hao Tan

The vast majority of visual animals actively control their eyes, heads, and/or bodies to direct their gaze toward different parts of their environment. In contrast, recent applications of reinforcement learning in robotic manipulation…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-17 · Youssef Zaky, Gaurav Paruthi, Bryan Tripp, James Bergstra

Echolocation is the prime sensing modality for many species of bats, who show the intricate ability to perform a plethora of tasks in complex and unstructured environments. Understanding this exceptional feat of sensorimotor interaction is…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-12-23 · Wouter Jansen, Jan Steckel

What is the right supervisory signal to train visual representations? Current approaches in computer vision use category labels from datasets such as ImageNet to train ConvNets. However, in case of biological agents, visual representation…

Computer Vision and Pattern Recognition · Computer Science · 2016-07-27 · Lerrel Pinto, Dhiraj Gandhi, Yuanfeng Han, Yong-Lae Park, Abhinav Gupta

Embodied AI models often employ off the shelf vision backbones like CLIP to encode their visual observations. Although such general purpose representations encode rich syntactic and semantic information about the scene, much of this…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-12 · Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna

We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents. In contrast to prior work on self-supervised learning for images, we aim to jointly encode a series of…

Computer Vision and Pattern Recognition · Computer Science · 2021-02-05 · Santhosh K. Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman

This work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation. Given only a large, unlabeled image collection, we extract random pairs of patches from each image…

Computer Vision and Pattern Recognition · Computer Science · 2016-01-19 · Carl Doersch, Abhinav Gupta, Alexei A. Efros

In everyday life collaboration tasks between human operators and robots, the former necessitate simple ways for programming new skills, the latter have to show adaptive capabilities to cope with environmental changes. The joint use of…

Robotics · Computer Science · 2023-09-15 · Rocco Felici, Matteo Saveriano, Loris Roveda, Antonio Paolillo

The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual…

Computer Vision and Pattern Recognition · Computer Science · 2017-12-21 · Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

In order to explore and act autonomously in an environment, an agent needs to learn from the sensorimotor information that is captured while acting. By extracting the regularities in this sensorimotor stream, it can learn a model of the…

Artificial Intelligence · Computer Science · 2018-04-27 · Thibaut Kulak, Michael Garcia Ortiz

We address the problem of estimating depth with multi modal audio visual data. Inspired by the ability of animals, such as bats and dolphins, to infer distance of objects with echolocation, some recent methods have utilized echoes for depth…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-06 · Kranti Kumar Parida, Siddharth Srivastava, Gaurav Sharma

The interpretation of spatial references is highly contextual, requiring joint inference over both language and the environment. We consider the task of spatial reasoning in a simulated environment, where an agent can act and receive…

Computation and Language · Computer Science · 2017-11-15 · Michael Janner, Karthik Narasimhan, Regina Barzilay

Research in child development has shown that embodied experience handling physical objects contributes to many cognitive abilities, including visual learning. One characteristic of such experience is that the learner sees the same object…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-01 · Deepayan Sanyal, Joel Michelson, Yuan Yang, James Ainooson, Maithilee Kunda