Related papers: Data-Efficient Framework for Real-world Multiple S…

Ensemble of Discriminators for Domain Adaptation in Multiple Sound Source 2D Localization

This paper introduces an ensemble of discriminators that improves the accuracy of a domain adaptation technique for the localization of multiple sound sources. Recently, deep neural networks have led to promising results for this task, yet…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-17 Guillaume Le Moing, Don Joven Agravante, Tadanobu Inoue, Jayakorn Vongkulbhisal, Asim Munawar, Ryuki Tachibana, Phongtharin Vinayavekhin

Learning Multiple Sound Source 2D Localization

In this paper, we propose novel deep learning based algorithms for multiple sound source localization. Specifically, we aim to find the 2D Cartesian coordinates of multiple sound sources in an enclosed environment by using multiple…

Audio and Speech Processing · Electrical Eng. & Systems 2020-12-11 Guillaume Le Moing, Phongtharin Vinayavekhin, Tadanobu Inoue, Jayakorn Vongkulbhisal, Asim Munawar, Ryuki Tachibana, Don Joven Agravante

Microphone Array Signal Processing and Deep Learning for Speech Enhancement

Multi-channel acoustic signal processing is a well-established and powerful tool to exploit the spatial diversity between a target signal and non-target or noise sources for signal enhancement. However, the textbook solutions for optimal…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-14 Reinhold Haeb-Umbach, Tomohiro Nakatani, Marc Delcroix, Christoph Boeddeker, Tsubasa Ochiai

Deep Neural Networks for Multiple Speaker Detection and Localization

We propose to use neural networks for simultaneous detection and localization of multiple sound sources in human-robot interaction. In contrast to conventional signal processing techniques, neural network-based sound source localization…

Sound · Computer Science 2018-09-18 Weipeng He, Petr Motlicek, Jean-Marc Odobez

Using Under-trained Deep Ensembles to Learn Under Extreme Label Noise

Improper or erroneous labelling can pose a hindrance to reliable generalization for supervised learning. This can have negative consequences, especially for critical fields such as healthcare. We propose an effective new approach for…

Machine Learning · Computer Science 2021-11-16 Konstantinos Nikolaidis, Thomas Plagemann, Stein Kristiansen, Vera Goebel, Mohan Kankanhalli

Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels

Noisy labels are ubiquitous in real-world datasets, especially in the large-scale ones derived from crowdsourcing and web searching. It is challenging to train deep neural networks with noisy datasets since the networks are prone to…

Computer Vision and Pattern Recognition · Computer Science 2024-06-26 Yangdi Lu, Wenbo He

Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates

This paper presents a novel approach for indoor acoustic source localization using microphone arrays and based on a Convolutional Neural Network (CNN). The proposed solution is, to the best of our knowledge, the first published work in…

Sound · Computer Science 2019-02-01 Juan Manuel Vera-Diaz, Daniel Pizarro, Javier Macias-Guarasa

PILOT: Introducing Transformers for Probabilistic Sound Event Localization

Sound event localization aims at estimating the positions of sound sources in the environment with respect to an acoustic receiver (e.g. a microphone array). Recent advances in this domain most prominently focused on utilizing deep…

Sound · Computer Science 2021-06-09 Christopher Schymura, Benedikt Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa

Adversarial Learning for Improved Onsets and Frames Music Transcription

Automatic music transcription is considered to be one of the hardest problems in music information retrieval, yet recent deep learning approaches have achieved substantial improvements on transcription performance. These approaches commonly…

Sound · Computer Science 2019-06-21 Jong Wook Kim, Juan Pablo Bello

Learning from Multiple Annotator Noisy Labels via Sample-wise Label Fusion

Data lies at the core of modern deep learning. The impressive performance of supervised learning is built upon a base of massive accurately labeled data. However, in some real-world applications, accurate labeling might not be viable;…

Machine Learning · Computer Science 2022-07-26 Zhengqi Gao, Fan-Keng Sun, Mingran Yang, Sucheng Ren, Zikai Xiong, Marc Engeler, Antonio Burazer, Linda Wildling, Luca Daniel, Duane S. Boning

Learning with Noisy labels via Self-supervised Adversarial Noisy Masking

Collecting large-scale datasets is crucial for training deep models; annotating the data, however, inevitably yields noisy labels, which poses challenges to deep learning algorithms. Previous efforts tend to mitigate this problem via…

Computer Vision and Pattern Recognition · Computer Science 2023-02-16 Yuanpeng Tu, Boshen Zhang, Yuxi Li, Liang Liu, Jian Li, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Cai Rong Zhao

Manifold DivideMix: A Semi-Supervised Contrastive Learning Framework for Severe Label Noise

Deep neural networks have proven to be highly effective when large amounts of data with clean labels are available. However, their performance degrades when training data contains noisy labels, leading to poor generalization on the test…

Computer Vision and Pattern Recognition · Computer Science 2023-08-15 Fahimeh Fooladgar, Minh Nguyen Nhat To, Parvin Mousavi, Purang Abolmaesumi

Self-supervised Neural Audio-Visual Sound Source Localization via Probabilistic Spatial Modeling

Detecting sound source objects within visual observation is important for autonomous robots to comprehend surrounding environments. Since sounding objects have a large variety with different appearances in our living environments, labeling…

Sound · Computer Science 2020-07-29 Yoshiki Masuyama, Yoshiaki Bando, Kohei Yatabe, Yoko Sasaki, Masaki Onishi, Yasuhiro Oikawa

Efficient and Microphone-Fault-Tolerant 3D Sound Source Localization

Sound source localization (SSL) is a critical technology for determining the position of sound sources in complex environments. However, existing methods face challenges such as high computational costs and precise calibration requirements,…

Sound · Computer Science 2025-05-28 Yiyuan Yang, Shitong Xu, Niki Trigoni, Andrew Markham

Towards Audio Domain Adaptation for Acoustic Scene Classification using Disentanglement Learning

The deployment of machine listening algorithms in real-life applications is often impeded by a domain shift caused for instance by different microphone characteristics. In this paper, we propose a novel domain adaptation strategy based on…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-27 Jakob Abeßer, Meinard Müller

Deep Learning Based Stage-wise Two-dimensional Speaker Localization with Large Ad-hoc Microphone Arrays

While deep-learning-based speaker localization has shown advantages in challenging acoustic environments, it often yields only direction-of-arrival (DOA) cues rather than precise two-dimensional (2D) coordinates. To address this, we propose…

Audio and Speech Processing · Electrical Eng. & Systems 2024-04-02 Shupei Liu, Linfeng Feng, Yijun Gong, Chengdong Liang, Chen Zhang, Xiao-Lei Zhang, Xuelong Li

BeamLearning: an end-to-end Deep Learning approach for the angular localization of sound sources using raw multichannel acoustic pressure data

Sound sources localization using multichannel signal processing has been a subject of active research for decades. In recent years, the use of deep learning in audio signal processing has allowed to drastically improve performances for…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-16 Hadrien Pujol, Éric Bavu, Alexandre Garcia

Adversarial Learning and Self-Teaching Techniques for Domain Adaptation in Semantic Segmentation

Deep learning techniques have been widely used in autonomous driving systems for the semantic understanding of urban scenes. However, they need a huge amount of labeled data for training, which is difficult and expensive to acquire. A…

Computer Vision and Pattern Recognition · Computer Science 2020-03-03 Umberto Michieli, Matteo Biasetton, Gianluca Agresti, Pietro Zanuttigh

Effective Use of Synthetic Data for Urban Scene Semantic Segmentation

Training a deep network to perform semantic segmentation requires large amounts of labeled data. To alleviate the manual effort of annotating real images, researchers have investigated the use of synthetic data, which can be labeled…

Computer Vision and Pattern Recognition · Computer Science 2018-07-18 Fatemeh Sadat Saleh, Mohammad Sadegh Aliakbarian, Mathieu Salzmann, Lars Petersson, Jose M. Alvarez

Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching

The performance of machine learning algorithms is known to be negatively affected by possible mismatches between training (source) and test (target) data distributions. In fact, this problem emerges whenever an acoustic scene classification…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-04 Alessandro Ilic Mezza, Emanuël A. P. Habets, Meinard Müller, Augusto Sarti