Related papers: Automatic context window composition for distant s…

Contaminated speech training methods for robust DNN-HMM distant speech recognition

Despite the significant progress made in recent years, state-of-the-art speech recognition technologies provide satisfactory performance only in the close-talking condition. Robustness of distant speech recognition in adverse acoustic…

Audio and Speech Processing · Electrical Eng. & Systems 2017-10-11 Mirco Ravanelli, Maurizio Omologo

Ensemble of Jointly Trained Deep Neural Network-Based Acoustic Models for Reverberant Speech Recognition

Distant speech recognition is a challenge, particularly due to the corruption of speech signals by reverberation caused by large distances between the speaker and microphone. In order to cope with a wide range of reverberations in…

Computation and Language · Computer Science 2016-08-18 Jeehye Lee, Myungin Lee, Joon-Hyuk Chang

A network of deep neural networks for distant speech recognition

Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still suffers from a lack of robustness, especially when adverse acoustic conditions characterized by non-stationary noises and…

Computation and Language · Computer Science 2017-03-24 Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

Optimising The Input Window Alignment in CD-DNN Based Phoneme Recognition for Low Latency Processing

We present a systematic analysis on the performance of a phonetic recogniser when the window of input features is not symmetric with respect to the current frame. The recogniser is based on Context Dependent Deep Neural Networks (CD-DNNs)…

Computation and Language · Computer Science 2016-06-30 Akash Kumar Dhaka, Giampiero Salvi

Deep neural network Based Low-latency Speech Separation with Asymmetric analysis-Synthesis Window Pair

Time-frequency masking or spectrum prediction computed via short symmetric windows are commonly used in low-latency deep neural network (DNN) based source separation. In this paper, we propose the usage of an asymmetric analysis-synthesis…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-23 Shanshan Wang, Gaurav Naithani, Archontis Politis, Tuomas Virtanen

Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments

We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real-time from multiple microphone…

Computation and Language · Computer Science 2015-09-02 Andreas Schwarz, Christian Huemmer, Roland Maas, Walter Kellermann

Deep Learning for Distant Speech Recognition

Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among other achievements, building computers that understand speech represents a…

Computation and Language · Computer Science 2017-12-19 Mirco Ravanelli

Exploiting Low-dimensional Structures to Enhance DNN Based Acoustic Modeling in Speech Recognition

We propose to model the acoustic space of deep neural network (DNN) class-conditional posterior probabilities as a union of low-dimensional subspaces. To that end, the training posteriors are used for dictionary learning and sparse coding.…

Computation and Language · Computer Science 2017-09-07 Pranay Dighe, Gil Luyet, Afsaneh Asaei, Herve Bourlard

DNN-based mask estimation for distributed speech enhancement in spatially unconstrained microphone arrays

Deep neural network (DNN)-based speech enhancement algorithms in microphone arrays have now proven to be efficient solutions to speech understanding and speech recognition in noisy environments. However, in the context of ad-hoc microphone…

Signal Processing · Electrical Eng. & Systems 2020-11-04 Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Dynamic Context Correspondence Network for Semantic Alignment

Establishing semantic correspondence is a core problem in computer vision and remains challenging due to large intra-class variations and lack of annotated data. In this paper, we aim to incorporate global semantic context in a flexible…

Computer Vision and Pattern Recognition · Computer Science 2019-09-10 Shuaiyi Huang, Qiuyue Wang, Songyang Zhang, Shipeng Yan, Xuming He

A Deep Neural Network Sentence Level Classification Method with Context Information

In the sentence classification task, context formed from sentences adjacent to the sentence being classified can provide important information for classification. This context is, however, often ignored. Where methods do make use of…

Information Retrieval · Computer Science 2018-09-05 Xingyi Song, Johann Petrak, Angus Roberts

Exploiting Deep Neural Networks and Head Movements for Robust Binaural Localisation of Multiple Sources in Reverberant Environments

This paper presents a novel machine-hearing system that exploits deep neural networks (DNNs) and head movements for robust binaural localisation of multiple sources in reverberant environments. DNNs are used to learn the relationship…

Audio and Speech Processing · Electrical Eng. & Systems 2019-04-08 Ning Ma, Tobias May, Guy J. Brown

Improved Frequency Modulation Features for Multichannel Distant Speech Recognition

Frequency modulation features capture the fine structure of speech formants and are beneficial and supplementary to the traditional energy-based cepstral features. Improvements have been demonstrated mainly in GMM-HMM systems for…

Sound · Computer Science 2019-09-04 Isidoros Rodomagoulakis, Petros Maragos

Context-aware Neural-based Dialog Act Classification on Automatically Generated Transcriptions

This paper presents our latest investigations on dialog act (DA) classification on automatically generated transcriptions. We propose a novel approach that combines convolutional neural networks (CNNs) and conditional random fields (CRFs)…

Computation and Language · Computer Science 2019-03-01 Daniel Ortega, Chia-Yu Li, Gisela Vallejo, Pavel Denisov, Ngoc Thang Vu

Scene-Agnostic Multi-Microphone Speech Dereverberation

Neural networks (NNs) have been widely applied in speech processing tasks, and, in particular, those employing microphone arrays. Nevertheless, most existing NN architectures can only deal with fixed and position-specific microphone arrays.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-14 Yochai Yemini, Ethan Fetaya, Haggai Maron, Sharon Gannot

Ultra-Low Latency Speech Enhancement - A Comprehensive Study

Speech enhancement models should meet very low latency requirements, typically smaller than 5 ms, for hearing assistive devices. While various low-latency techniques have been proposed, comparing these methods in a controlled setup using DNNs…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-17 Haibin Wu, Sebastian Braun

Spectral Masking with Explicit Time-Context Windowing for Neural Network-Based Monaural Speech Enhancement

We propose and analyze the use of an explicit time-context window for neural network-based spectral masking speech enhancement to leverage signal context dependencies between neighboring frames. In particular, we concentrate on soft masking…

Audio and Speech Processing · Electrical Eng. & Systems 2024-08-29 Luan Vinícius Fiorio, Boris Karanov, Bruno Defraene, Johan David, Wim van Houtum, Frans Widdershoven, Ronald M. Aarts

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning

Recurrent neural networks (RNNs) have shown significant improvements in recent years for speech enhancement. However, the model complexity and inference time cost of RNNs are much higher than deep feed-forward neural networks (DNNs).…

Sound · Computer Science 2020-11-12 Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen, Leichao Song

Analyzing Large Receptive Field Convolutional Networks for Distant Speech Recognition

Despite significant efforts over the last few years to build a robust automatic speech recognition (ASR) system for different acoustic settings, the performance of the current state-of-the-art technologies significantly degrades in noisy…

Audio and Speech Processing · Electrical Eng. & Systems 2019-10-17 Salar Jafarlou, Soheil Khorram, Vinay Kothapally, John H. L. Hansen

Scene-aware Far-field Automatic Speech Recognition

We propose a novel method for generating scene-aware training data for far-field automatic speech recognition. We use a deep learning-based estimator to non-intrusively compute the sub-band reverberation time of an environment from its…

Audio and Speech Processing · Electrical Eng. & Systems 2021-04-23 Zhenyu Tang, Dinesh Manocha