Related papers: Robust Multi-channel Speech Recognition using Freq…

Frequency Domain Multi-channel Acoustic Modeling for Distant Speech Recognition

Conventional far-field automatic speech recognition (ASR) systems typically employ microphone array techniques for speech enhancement in order to improve robustness against noise or reverberation. However, such speech enhancement techniques…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-23 Minhua Wu, Kenichi Kumatani, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister

Multi-Geometry Spatial Acoustic Modeling for Distant Speech Recognition

The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-23 Kenichi Kumatani, Minhua Wu, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister

Leveraging Redundancy in Multiple Audio Signals for Far-Field Speech Recognition

To achieve robust far-field automatic speech recognition (ASR), existing techniques typically employ an acoustic front end (AFE) cascaded with a neural transducer (NT) ASR model. The AFE output, however, could be unreliable, as the…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-02 Feng-Ju Chang, Anastasios Alexandridis, Rupak Vignesh Swaminathan, Martin Radfar, Harish Mallidi, Maurizio Omologo, Athanasios Mouchtaris, Brian King, Roland Maas

Far-Field Automatic Speech Recognition

The machine recognition of speech spoken at a distance from the microphones, known as far-field automatic speech recognition (ASR), has received a significant increase of attention in science and industry, which caused or was caused by an…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-22 Reinhold Haeb-Umbach, Jahn Heymann, Lukas Drude, Shinji Watanabe, Marc Delcroix, Tomohiro Nakatani

Flexible Multichannel Speech Enhancement for Noise-Robust Frontend

This paper proposes a flexible multichannel speech enhancement system with the main goal of improving robustness of automatic speech recognition (ASR) in noisy conditions. The proposed system combines a flexible neural mask estimator…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-10 Ante Jukić, Jagadeesh Balam, Boris Ginsburg

Fully Learnable Front-End for Multi-Channel Acoustic Modeling using Semi-Supervised Learning

In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). Using a large offline teacher model trained on beamformed audio,…

Sound · Computer Science 2020-05-05 Sanna Wager, Aparna Khare, Minhua Wu, Kenichi Kumatani, Shiva Sundaram

Exploring End-to-End Multi-channel ASR with Bias Information for Meeting Transcription

Joint optimization of multi-channel front-end and automatic speech recognition (ASR) has attracted much interest. While promising results have been reported for various tasks, past studies on its meeting transcription application were…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-30 Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, Zhuo Chen, Zhong Meng, Takuya Yoshioka

A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model

Far-field speech recognition is a challenging task that conventionally uses signal processing beamforming to attack noise and interference problems. But the performance has been found usually limited due to heavy reliance on environmental…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-08 Dongdi Zhao, Jianbo Ma, Lu Lu, Jinke Li, Xuan Ji, Lei Zhu, Fuming Fang, Ming Liu, Feijun Jiang

Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition

This paper describes multichannel speech enhancement for improving automatic speech recognition (ASR) in noisy environments. Recently, the minimum variance distortionless response (MVDR) beamforming has widely been used because it works…

Sound · Computer Science 2019-04-02 Kazuki Shimada, Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

Data-independent Beamforming for End-to-end Multichannel Multi-speaker ASR

Automatic speech recognition (ASR) in multichannel, multi-speaker scenarios remains challenging due to ambient noise, reverberation and overlapping speakers. In this paper, we propose a beamforming approach that processes specific angular…

Sound · Computer Science 2025-09-15 Can Cui, Paul Magron, Mostafa Sadeghi, Emmanuel Vincent

Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments

This paper describes noisy speech recognition for an augmented reality headset that helps verbal communication within real multiparty conversational environments. A major approach that has actively been studied in simulated environments is…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-18 Yicheng Du, Aditya Arie Nugraha, Kouhei Sekiguchi, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii

FaSNet: Low-latency Adaptive Beamforming for Multi-microphone Audio Processing

Beamforming has been extensively investigated for multi-channel audio processing tasks. Recently, learning-based beamforming methods, sometimes called neural beamformers, have achieved significant improvements in both signal…

Audio and Speech Processing · Electrical Eng. & Systems 2019-10-02 Yi Luo, Enea Ceolini, Cong Han, Shih-Chii Liu, Nima Mesgarani

Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies

Automatic Speech Recognition (ASR) has shown remarkable progress, yet it still faces challenges in real-world distant scenarios across various array topologies each with multiple recording devices. The focal point of the CHiME-7 Distant ASR…

Sound · Computer Science 2023-12-18 Bingshen Mu, Pengcheng Guo, Dake Guo, Pan Zhou, Wei Chen, Lei Xie

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

Neural speech separation has made remarkable progress and its integration with automatic speech recognition (ASR) is an important direction towards realizing multi-speaker ASR. This work provides an insightful investigation of speech…

Sound · Computer Science 2023-07-25 Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe

Time-Domain Speech Enhancement for Robust Automatic Speech Recognition

It has been shown that the intelligibility of noisy speech can be improved by speech enhancement algorithms. However, speech enhancement has not been established as an effective frontend for robust automatic speech recognition (ASR) in…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-22 Yufeng Yang, Ashutosh Pandey, DeLiang Wang

Speech Recognition Front End Without Information Loss

Speech representation and modelling in high-dimensional spaces of acoustic waveforms, or a linear transformation thereof, is investigated with the aim of improving the robustness of automatic speech recognition to additive noise. The…

Computation and Language · Computer Science 2015-03-31 Matthew Ager, Zoran Cvetkovic, Peter Sollich

Stream Attention for far-field multi-microphone ASR

A stream attention framework has been applied to the posterior probabilities of the deep neural network (DNN) to improve the far-field automatic speech recognition (ASR) performance in the multi-microphone configuration. The stream…

Sound · Computer Science 2017-12-01 Xiaofei Wang, Yonghong Yan, Hynek Hermansky

Multi-Channel Automatic Speech Recognition Using Deep Complex Unet

The front-end module in multi-channel automatic speech recognition (ASR) systems mainly uses microphone array techniques to produce enhanced signals in noisy conditions with reverberation and echoes. Recently, neural network (NN) based…

Sound · Computer Science 2020-11-19 Yuxiang Kong, Jian Wu, Quandong Wang, Peng Gao, Weiji Zhuang, Yujun Wang, Lei Xie

Mixture to Beamformed Mixture: Leveraging Beamformed Mixture as Weak-Supervision for Speech Enhancement and Noise-Robust ASR

In multi-channel speech enhancement and robust automatic speech recognition (ASR), beamforming can typically improve the signal-to-noise ratio (SNR) of the target speaker and produce reliable enhancement with little distortion to target…

Audio and Speech Processing · Electrical Eng. & Systems 2025-07-22 Zhong-Qiu Wang, Ruizhe Pang

3-D Feature and Acoustic Modeling for Far-Field Speech Recognition

Automatic speech recognition in multi-channel reverberant conditions is a challenging task. The conventional way of suppressing the reverberation artifacts involves a beamforming based enhancement of the multi-channel speech signal, which…

Audio and Speech Processing · Electrical Eng. & Systems 2020-01-28 Anurenjan Purushothaman, Anirudh Sreeram, Sriram Ganapathy