Related papers: Multi-Channel Automatic Speech Recognition Using D…

A Mixture of Expert Based Deep Neural Network for Improved ASR

This paper presents a novel deep learning architecture for acoustic model in the context of Automatic Speech Recognition (ASR), termed as MixNet. Besides the conventional layers, such as fully connected layers in DNN-HMM and memory cells in…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-03 Vishwanath Pratap Singh, Shakti P. Rath, Abhishek Pandey

Frequency Domain Multi-channel Acoustic Modeling for Distant Speech Recognition

Conventional far-field automatic speech recognition (ASR) systems typically employ microphone array techniques for speech enhancement in order to improve robustness against noise or reverberation. However, such speech enhancement techniques…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-23 Minhua Wu, Kenichi Kumatani, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister

Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-box Acoustic Models

A deep neural network (DNN)-based speech enhancement (SE) aiming to maximize the performance of an automatic speech recognition (ASR) system is proposed in this paper. In order to optimize the DNN-based SE model in terms of the character…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-23 Ryosuke Sawata, Yosuke Kashiwagi, Shusuke Takahashi

Audio Visual Speech Recognition using Deep Recurrent Neural Networks

In this work, we propose a training algorithm for an audio-visual automatic speech recognition (AV-ASR) system using deep recurrent neural network (RNN). First, we train a deep RNN acoustic model with a Connectionist Temporal Classification…

Computer Vision and Pattern Recognition · Computer Science 2016-11-10 Abhinav Thanda, Shankar M Venkatesan

Multi-Channel and Multi-Microphone Acoustic Echo Cancellation Using A Deep Learning Based Approach

Building on the deep learning based acoustic echo cancellation (AEC) in the single-loudspeaker (single-channel) and single-microphone setup, this paper investigates multi-channel AEC (MCAEC) and multi-microphone AEC (MMAEC). We train a deep…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-04 Hao Zhang, DeLiang Wang

Multilingual Speech Recognition Using Discrete Tokens with a Two-step Training Strategy

Pre-trained models, especially self-supervised learning (SSL) models, have demonstrated impressive results in automatic speech recognition (ASR) task. While most applications of SSL models focus on leveraging continuous representations as…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-03 Zehan Li, Yan Yang, Xueqing Li, Jian Kang, Xiao-Lei Zhang, Jie Li

Ensemble of Jointly Trained Deep Neural Network-Based Acoustic Models for Reverberant Speech Recognition

Distant speech recognition is a challenge, particularly due to the corruption of speech signals by reverberation caused by large distances between the speaker and microphone. In order to cope with a wide range of reverberations in…

Computation and Language · Computer Science 2016-08-18 Jeehye Lee, Myungin Lee, Joon-Hyuk Chang

Robust Multi-channel Speech Recognition using Frequency Aligned Network

Conventional speech enhancement technique such as beamforming has known benefits for far-field speech recognition. Our own work in frequency-domain multi-channel acoustic modeling has shown additional improvements by training a spatial…

Sound · Computer Science 2020-02-10 Taejin Park, Kenichi Kumatani, Minhua Wu, Shiva Sundaram

DESNet: A Multi-channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation

In this paper, we propose a multi-channel network for simultaneous speech dereverberation, enhancement and separation (DESNet). To enable gradient propagation and joint optimization, we adopt the attentional selection mechanism of the…

Sound · Computer Science 2020-11-17 Yihui Fu, Jian Wu, Yanxin Hu, Mengtao Xing, Lei Xie

Reinforcement Learning Based Speech Enhancement for Robust Speech Recognition

Conventional deep neural network (DNN)-based speech enhancement (SE) approaches aim to minimize the mean square error (MSE) between enhanced speech and clean reference. The MSE-optimized model may not directly improve the performance of an…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-13 Yih-Liang Shen, Chao-Yuan Huang, Syu-Siang Wang, Yu Tsao, Hsin-Min Wang, Tai-Shih Chi

Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments

This paper describes noisy speech recognition for an augmented reality headset that helps verbal communication within real multiparty conversational environments. A major approach that has actively been studied in simulated environments is…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-18 Yicheng Du, Aditya Arie Nugraha, Kouhei Sekiguchi, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii

Dual Application of Speech Enhancement for Automatic Speech Recognition

In this work, we exploit speech enhancement for improving a recurrent neural network transducer (RNN-T) based ASR system. We employ a dense convolutional recurrent network (DCRN) for complex spectral mapping based speech enhancement, and…

Sound · Computer Science 2020-11-10 Ashutosh Pandey, Chunxi Liu, Yun Wang, Yatharth Saraf

EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding

The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources,…

Computation and Language · Computer Science 2015-10-20 Yajie Miao, Mohammad Gowayyed, Florian Metze

Augmenting Bottleneck Features of Deep Neural Network Employing Motor State for Speech Recognition at Humanoid Robots

As for the humanoid robots, the internal noise, which is generated by motors, fans and mechanical components when the robot is moving or shaking its body, severely degrades the performance of the speech recognition accuracy. In this paper,…

Sound · Computer Science 2018-08-28 Moa Lee, Joon Hyuk Chang

QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions

We propose a new end-to-end neural acoustic model for automatic speech recognition. The model is composed of multiple blocks with residual connections between them. Each block consists of one or more modules with 1D time-channel separable…

Audio and Speech Processing · Electrical Eng. & Systems 2019-10-24 Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang

Resource-Efficient Speech Mask Estimation for Multi-Channel Speech Enhancement

While machine learning techniques are traditionally resource intensive, we are currently witnessing an increased interest in hardware and energy efficient approaches. This need for resource-efficient machine learning is primarily driven by…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-23 Lukas Pfeifenberger, Matthias Zöhrer, Günther Schindler, Wolfgang Roth, Holger Fröning, Franz Pernkopf

Speech-enhanced and Noise-aware Networks for Robust Speech Recognition

Compensation for channel mismatch and noise interference is essential for robust automatic speech recognition. Enhanced speech has been introduced into the multi-condition training of acoustic models to improve their generalization ability.…

Sound · Computer Science 2022-11-24 Hung-Shin Lee, Pin-Yuan Chen, Yao-Fei Cheng, Yu Tsao, Hsin-Min Wang

Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation

For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require aligned clean and…

Sound · Computer Science 2020-07-28 Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Hyung Yong Kim, Nam Soo Kim

Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition

Accents, as variations from standard pronunciation, pose significant challenges for speech recognition systems. Although joint automatic speech recognition (ASR) and accent recognition (AR) training has been proven effective in handling…

Sound · Computer Science 2023-11-20 Qijie Shao, Pengcheng Guo, Jinghao Yan, Pengfei Hu, Lei Xie

DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition

Self-attention networks (SAN) have been introduced into automatic speech recognition (ASR) and achieved state-of-the-art performance owing to its superior ability in capturing long term dependency. One of the key ingredients is the…

Audio and Speech Processing · Electrical Eng. & Systems 2019-10-30 Zhao You, Dan Su, Jie Chen, Chao Weng, Dong Yu