
Related papers: Multichannel-based learning for audio object extra…

200 papers

Learning how objects sound from video is challenging, since they often heavily overlap in a single audio channel. Current methods for visually-guided audio source separation sidestep the issue by training with artificially mixed video…

Computer Vision and Pattern Recognition · Computer Science 2019-08-22 Ruohan Gao, Kristen Grauman

Apparatus and methods are disclosed for performing object-based audio rendering on a plurality of audio objects which define a sound scene, each audio object comprising at least one audio signal and associated metadata. The apparatus…

Perceiving a scene most fully requires all the senses. Yet modeling how objects look and sound is challenging: most natural scenes and events contain multiple objects, and the audio track mixes all the sound sources together. We propose to…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Ruohan Gao, Rogerio Feris, Kristen Grauman

Our objective is to transform a video into a set of discrete audio-visual objects using self-supervised learning. To this end, we introduce a model that uses attention to localize and group sound sources, and optical flow to aggregate…

Computer Vision and Pattern Recognition · Computer Science 2020-08-11 Triantafyllos Afouras, Andrew Owens, Joon Son Chung, Andrew Zisserman

The objective of this paper is a model that is able to discover, track and segment multiple moving objects in a video. We make four contributions: First, we introduce an object-centric segmentation model with a depth-ordered layer…

Computer Vision and Pattern Recognition · Computer Science 2022-11-15 Junyu Xie, Weidi Xie, Andrew Zisserman

Generating accurate sounds for complex audio-visual scenes is challenging, especially in the presence of multiple objects and sound sources. In this paper, we propose an interactive object-aware audio generation model that grounds…

Computer Vision and Pattern Recognition · Computer Science 2025-06-05 Tingle Li, Baihe Huang, Xiaobin Zhuang, Dongya Jia, Jiawei Chen, Yuping Wang, Zhuo Chen, Gopala Anumanchipalli, Yuxuan Wang

We tackle the problem of audiovisual scene analysis for weakly-labeled data. To this end, we build upon our previous audiovisual representation learning framework to perform object classification in noisy acoustic environments and integrate…

Computer Vision and Pattern Recognition · Computer Science 2018-11-12 Sanjeel Parekh, Alexey Ozerov, Slim Essid, Ngoc Duong, Patrick Pérez, Gaël Richard

We propose DeepASA, a multi-purpose model for auditory scene analysis that performs multi-input multi-output (MIMO) source separation, dereverberation, sound event detection (SED), audio classification, and direction-of-arrival estimation…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-16 Dongheon Lee, Younghoo Kwon, Jung-Woo Choi

Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines are able to do the same now with images, less work has been done with sounds. This work develops an approach for dense semantic…

Computer Vision and Pattern Recognition · Computer Science 2020-03-10 Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

Large-scale sound recognition data sets typically consist of acoustic recordings obtained from multimedia libraries. As a consequence, modalities other than audio can often be exploited to improve the outputs of models designed for…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-11 Wim Boes, Hugo Van hamme

Audiovisual scenes are pervasive in our daily life. It is commonplace for humans to discriminatively localize different sounding objects but quite challenging for machines to achieve class-aware sounding objects localization without…

Computer Vision and Pattern Recognition · Computer Science 2021-12-23 Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen

Learning how to localize and separate individual object sounds in the audio channel of the video is a difficult task. Current state-of-the-art methods predict audio masks from artificially mixed spectrograms, known as Mix-and-Separate…

Computer Vision and Pattern Recognition · Computer Science 2021-04-07 Tanzila Rahman, Leonid Sigal

We propose a framework to continuously learn object-centric representations for visual learning and understanding. Existing object-centric representations either rely on supervisions that individualize objects in the scene, or perform…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Chuanyu Pan, Yanchao Yang, Kaichun Mo, Yueqi Duan, Leonidas Guibas

Visual objects often have acoustic signatures that are naturally synchronized with them in audio-bearing video recordings. For this project, we explore the multimodal feature aggregation for video instance segmentation task, in which we…

Computer Vision and Pattern Recognition · Computer Science 2023-01-26 Kaihui Zheng, Yuqing Ren, Zixin Shen, Tianxu Qin

Advances in object tracking and acoustic beamforming are driving new capabilities in surveillance, human-computer interaction, and robotics. This work presents an embedded system that integrates deep learning-based tracking with beamforming…

Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even…

Learning a data-driven spatio-temporal semantic representation of the objects is the key to coherent and consistent labelling in video. This paper proposes to achieve semantic video object segmentation by learning a data-driven…

Computer Vision and Pattern Recognition · Computer Science 2024-07-09 Tinghuai Wang

Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data. In this paper we develop a neural network model for visual object…

Computer Vision and Pattern Recognition · Computer Science 2019-04-22 Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh McDermott, Antonio Torralba

We propose a new framework for extracting visual information about a scene using only audio signals. Audio-based methods can overcome some of the limitations of vision-based methods, i.e., they do not require "line-of-sight", are robust to…

Computer Vision and Pattern Recognition · Computer Science 2022-09-14 Fabrizio Pedersoli, Dryden Wiebe, Amin Banitalebi, Yong Zhang, George Tzanetakis, Kwang Moo Yi

We tackle the problem of learning object detectors without supervision. Unlike weakly-supervised object detection, we do not assume image-level class labels. Instead, we extract a supervisory signal from audio-visual data, using…

Computer Vision and Pattern Recognition · Computer Science 2022-07-12 Triantafyllos Afouras, Yuki M. Asano, Francois Fagan, Andrea Vedaldi, Florian Metze