Related papers: Multi-Channel Automatic Speech Recognition Using D…

200 papers

This paper presents a novel deep learning architecture for acoustic modeling in the context of Automatic Speech Recognition (ASR), termed MixNet. Besides the conventional layers, such as fully connected layers in DNN-HMM and memory cells in…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-12-03 · Vishwanath Pratap Singh, Shakti P. Rath, Abhishek Pandey

Conventional far-field automatic speech recognition (ASR) systems typically employ microphone array techniques for speech enhancement in order to improve robustness against noise or reverberation. However, such speech enhancement techniques…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-12-23 · Minhua Wu, Kenichi Kumatani, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister

A deep neural network (DNN)-based speech enhancement (SE) method that aims to maximize the performance of an automatic speech recognition (ASR) system is proposed in this paper. In order to optimize the DNN-based SE model in terms of the character…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-02-23 · Ryosuke Sawata, Yosuke Kashiwagi, Shusuke Takahashi

In this work, we propose a training algorithm for an audio-visual automatic speech recognition (AV-ASR) system using a deep recurrent neural network (RNN). First, we train a deep RNN acoustic model with a Connectionist Temporal Classification…

Computer Vision and Pattern Recognition · Computer Science · 2016-11-10 · Abhinav Thanda, Shankar M Venkatesan

Building on the deep learning based acoustic echo cancellation (AEC) in the single-loudspeaker (single-channel) and single-microphone setup, this paper investigates multi-channel AEC (MCAEC) and multi-microphone AEC (MMAEC). We train a deep…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-03-04 · Hao Zhang, DeLiang Wang

Pre-trained models, especially self-supervised learning (SSL) models, have demonstrated impressive results in automatic speech recognition (ASR) tasks. While most applications of SSL models focus on leveraging continuous representations as…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-09-03 · Zehan Li, Yan Yang, Xueqing Li, Jian Kang, Xiao-Lei Zhang, Jie Li

Distant speech recognition is a challenge, particularly due to the corruption of speech signals by reverberation caused by large distances between the speaker and microphone. In order to cope with a wide range of reverberations in…

Computation and Language · Computer Science · 2016-08-18 · Jeehye Lee, Myungin Lee, Joon-Hyuk Chang

Conventional speech enhancement techniques such as beamforming have known benefits for far-field speech recognition. Our own work in frequency-domain multi-channel acoustic modeling has shown additional improvements by training a spatial…

Sound · Computer Science · 2020-02-10 · Taejin Park, Kenichi Kumatani, Minhua Wu, Shiva Sundaram

In this paper, we propose a multi-channel network for simultaneous speech dereverberation, enhancement and separation (DESNet). To enable gradient propagation and joint optimization, we adopt the attentional selection mechanism of the…

Sound · Computer Science · 2020-11-17 · Yihui Fu, Jian Wu, Yanxin Hu, Mengtao Xing, Lei Xie

Conventional deep neural network (DNN)-based speech enhancement (SE) approaches aim to minimize the mean square error (MSE) between enhanced speech and clean reference. The MSE-optimized model may not directly improve the performance of an…

Audio and Speech Processing · Electrical Eng. & Systems · 2018-11-13 · Yih-Liang Shen, Chao-Yuan Huang, Syu-Siang Wang, Yu Tsao, Hsin-Min Wang, Tai-Shih Chi

This paper describes noisy speech recognition for an augmented reality headset that helps verbal communication within real multiparty conversational environments. A major approach that has actively been studied in simulated environments is…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-07-18 · Yicheng Du, Aditya Arie Nugraha, Kouhei Sekiguchi, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii

In this work, we exploit speech enhancement for improving a recurrent neural network transducer (RNN-T) based ASR system. We employ a dense convolutional recurrent network (DCRN) for complex spectral mapping based speech enhancement, and…

Sound · Computer Science · 2020-11-10 · Ashutosh Pandey, Chunxi Liu, Yun Wang, Yatharth Saraf

The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources,…

Computation and Language · Computer Science · 2015-10-20 · Yajie Miao, Mohammad Gowayyed, Florian Metze

In humanoid robots, internal noise generated by motors, fans and mechanical components when the robot moves or shakes its body severely degrades speech recognition accuracy. In this paper,…

Sound · Computer Science · 2018-08-28 · Moa Lee, Joon Hyuk Chang

We propose a new end-to-end neural acoustic model for automatic speech recognition. The model is composed of multiple blocks with residual connections between them. Each block consists of one or more modules with 1D time-channel separable…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-10-24 · Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang

While machine learning techniques are traditionally resource intensive, we are currently witnessing an increased interest in hardware and energy efficient approaches. This need for resource-efficient machine learning is primarily driven by…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-07-23 · Lukas Pfeifenberger, Matthias Zöhrer, Günther Schindler, Wolfgang Roth, Holger Fröning, Franz Pernkopf

Compensation for channel mismatch and noise interference is essential for robust automatic speech recognition. Enhanced speech has been introduced into the multi-condition training of acoustic models to improve their generalization ability.…

Sound · Computer Science · 2022-11-24 · Hung-Shin Lee, Pin-Yuan Chen, Yao-Fei Cheng, Yu Tsao, Hsin-Min Wang

For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require aligned clean and…

Sound · Computer Science · 2020-07-28 · Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Hyung Yong Kim, Nam Soo Kim

Accents, as variations from standard pronunciation, pose significant challenges for speech recognition systems. Although joint automatic speech recognition (ASR) and accent recognition (AR) training has been proven effective in handling…

Sound · Computer Science · 2023-11-20 · Qijie Shao, Pengcheng Guo, Jinghao Yan, Pengfei Hu, Lei Xie

Self-attention networks (SAN) have been introduced into automatic speech recognition (ASR) and achieved state-of-the-art performance owing to their superior ability in capturing long-term dependencies. One of the key ingredients is the…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-10-30 · Zhao You, Dan Su, Jie Chen, Chao Weng, Dong Yu