Related papers: An Efficient GPU-based Implementation for Noise Ro…

Enhanced Robot Speech Recognition Using Biomimetic Binaural Sound Source Localization

Inspired by the behavior of humans talking in noisy environments, we propose an embodied embedded cognition approach to improve automatic speech recognition (ASR) systems for robots in challenging environments, such as with ego noise, using…

Sound · Computer Science 2019-02-15 Jorge Davila-Chacon , Jindong Liu , Stefan Wermter

A Review on Sound Source Localization in Robotics: Focusing on Deep Learning Methods

Sound source localization (SSL) adds a spatial dimension to auditory perception, allowing a system to pinpoint the origin of speech, machinery noise, warning tones, or other acoustic events, capabilities that facilitate robot navigation,…

Robotics · Computer Science 2025-08-29 Reza Jalayer , Masoud Jalayer , Amirali Baniasadi

Single-Microphone-Based Sound Source Localization for Mobile Robots in Reverberant Environments

Accurately estimating sound source positions is crucial for robot audition. However, existing sound source localization methods typically rely on a microphone array with at least two spatially preconfigured microphones. This requirement…

Robotics · Computer Science 2025-06-23 Jiang Wang , Runwu Shi , Benjamin Yen , He Kong , Kazuhiro Nakadai

Auditory System for a Mobile Robot

In this thesis, we propose an artificial auditory system that gives a robot the ability to locate and track sounds, as well as to separate simultaneous sound sources and recognise simultaneous speech. We demonstrate that it is possible to…

Robotics · Computer Science 2016-02-23 Jean-Marc Valin

Efficient and Microphone-Fault-Tolerant 3D Sound Source Localization

Sound source localization (SSL) is a critical technology for determining the position of sound sources in complex environments. However, existing methods face challenges such as high computational costs and precise calibration requirements,…

Sound · Computer Science 2025-05-28 Yiyuan Yang , Shitong Xu , Niki Trigoni , Andrew Markham

Gaussian Process Models for HRTF based Sound-Source Localization and Active-Learning

From a machine learning perspective, the human ability to localize sounds can be modeled as a non-parametric and non-linear regression problem between binaural spectral features of sound received at the ears (input) and their sound-source…

Sound · Computer Science 2015-02-12 Yuancheng Luo , Dmitry N. Zotkin , Ramani Duraiswami

Adaptive high-precision sound source localization at low frequencies based on convolutional neural network

Sound source localization (SSL) technology plays a crucial role in various application areas such as fault diagnosis, speech separation, and vibration noise reduction. Although beamforming algorithms are widely used in SSL, their resolution…

Sound · Computer Science 2024-10-01 Wenbo Ma , Yan Lu , Yijun Liu

Audio Self-supervised Learning: A Survey

Inspired by humans' cognitive ability to generalise knowledge and skills, Self-Supervised Learning (SSL) aims to discover general representations from large-scale data without requiring human annotations, which is an expensive and…

Sound · Computer Science 2022-03-03 Shuo Liu , Adria Mallol-Ragolta , Emilia Parada-Cabeleiro , Kun Qian , Xin Jing , Alexander Kathan , Bin Hu , Bjoern W. Schuller

GPU-accelerated Guided Source Separation for Meeting Transcription

Guided source separation (GSS) is a type of target-speaker extraction method that relies on pre-computed speaker activities and blind source separation to perform front-end enhancement of overlapped speech signals. It was first proposed…

Audio and Speech Processing · Electrical Eng. & Systems 2023-08-15 Desh Raj , Daniel Povey , Sanjeev Khudanpur

Improving Sound Source Localization with Joint Slot Attention on Image and Audio

Sound source localization (SSL) is the task of locating the source of sound within an image. Due to the lack of localization labels, the de facto standard in SSL has been to represent an image and audio as a single embedding vector each,…

Computer Vision and Pattern Recognition · Computer Science 2025-05-13 Inho Kim , Youngkil Song , Jicheol Park , Won Hwa Kim , Suha Kwak

Where's That Voice Coming? Continual Learning for Sound Source Localization

Sound source localization (SSL) is essential for many speech-processing applications. Deep learning models have achieved high performance, but often fail when the training and inference environments differ. Adapting SSL models to dynamic…

Audio and Speech Processing · Electrical Eng. & Systems 2025-03-21 Yang Xiao , Rohan Kumar Das

Enhanced Robot Audition Based on Microphone Array Source Separation with Post-Filter

We propose a system that gives a mobile robot the ability to separate simultaneous sound sources. A microphone array is used along with a real-time dedicated implementation of Geometric Source Separation and a post-filter that gives us a…

Robotics · Computer Science 2016-03-09 Jean-Marc Valin , Jean Rouat , François Michaud

Boosting Self-Supervised Embeddings for Speech Enhancement

Self-supervised learning (SSL) representation for speech has achieved state-of-the-art (SOTA) performance on several downstream tasks. However, there remains room for improvement in speech enhancement (SE) tasks. In this study, we used a…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-06 Kuo-Hsuan Hung , Szu-wei Fu , Huan-Hsin Tseng , Hsin-Tien Chiang , Yu Tsao , Chii-Wann Lin

IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines

The IEEE Spoken Language Technology Workshop (SLT) 2021 Alpha-mini Speech Challenge (ASC) is intended to improve research on keyword spotting (KWS) and sound source location (SSL) on humanoid robots. Many publications report significant…

Sound · Computer Science 2020-11-17 Yihui Fu , Zhuoyuan Yao , Weipeng He , Jian Wu , Xiong Wang , Zhanheng Yang , Shimin Zhang , Lei Xie , Dongyan Huang , Hui Bu , Petr Motlicek , Jean-Marc Odobez

Fast and Robust 3-D Sound Source Localization with DSVD-PHAT

This paper introduces a variant of the Singular Value Decomposition with Phase Transform (SVD-PHAT), named Difference SVD-PHAT (DSVD-PHAT), to achieve robust Sound Source Localization (SSL) in noisy conditions. Experiments are performed on…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-31 Francois Grondin , James Glass

A Pre-training Framework that Encodes Noise Information for Speech Quality Assessment

Self-supervised learning (SSL) has grown in interest within the speech processing community, since it produces representations that are useful for many downstream tasks. SSL uses global and contextual methods to produce robust…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-08 Subrina Sultana , Donald S. Williamson

The Efficacy of Self-Supervised Speech Models for Audio Representations

Self-supervised learning (SSL) speech models, which can serve as powerful upstream models to extract meaningful speech representations, have achieved unprecedented success in speech representation learning. However, their effectiveness on…

Sound · Computer Science 2023-02-01 Tung-Yu Wu , Chen-An Li , Tzu-Han Lin , Tsu-Yuan Hsu , Hung-Yi Lee

Multi-Variant Consistency based Self-supervised Learning for Robust Automatic Speech Recognition

Automatic speech recognition (ASR) has shown rapid advances in recent years but still degrades significantly in far-field and noisy environments. The recent development of self-supervised learning (SSL) technology can improve the ASR…

Sound · Computer Science 2022-05-05 Changfeng Gao , Gaofeng Cheng , Pengyuan Zhang

Towards noise-robust speech inversion through multi-task learning with speech enhancement

Recent studies demonstrate the effectiveness of Self Supervised Learning (SSL) speech representations for Speech Inversion (SI). However, applying SI in real-world scenarios remains challenging due to the pervasive presence of background…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-22 Saba Tabatabaee , Carol Espy-Wilson

Egonoise Resilient Source Localization and Speech Enhancement for Drones Using a Hybrid Model and Learning-Based Approach

Drones are becoming increasingly important in search and rescue missions, and even military operations. While the majority of drones are equipped with camera vision capabilities, the realm of drone audition remains underexplored due to the…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-11 Yihsuan Wu , Yukai Chiu , Michael Anthony , Mingsian R. Bai