Related papers: A Deep-Bayesian Framework for Adaptive Speech Dura…

200 papers

In this work, we aim to establish a Bayesian adaptive learning framework by focusing on estimating latent variables in deep neural network (DNN) models. Latent variables indeed encode both transferable distributional information and…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-26 Hu Hu, Sabato Marco Siniscalchi, Chin-Hui Lee

While speaking at different rates, articulators (like tongue, lips) tend to move differently and the enunciations are also of different durations. In the past, affine transformation and DNN have been used to transform articulatory movements…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-21 Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh

In this work, we propose a novel variational Bayesian adaptive learning approach for cross-domain knowledge transfer to address acoustic mismatches between training and testing conditions, such as recording devices and environmental noise.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-28 Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin-Hui Lee

A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences. Speaker adaptation techniques play a vital role to reduce the mismatch. Model-based…

Sound · Computer Science 2024-06-17 Xurong Xie, Xunying Liu, Tan Lee, Lan Wang

For real-time speech enhancement (SE) including noise suppression, dereverberation and acoustic echo cancellation, the time-variance of the audio signals becomes a severe challenge. The causality and memory usage limit that only the…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-22 Chengyu Zheng, Yuan Zhou, Xiulian Peng, Yuan Zhang, Yan Lu

This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis. This method is motivated by the nature of the monotonic alignment from phone sequences to acoustic sequences. Only the…

Computation and Language · Computer Science 2020-01-14 Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

Long-context understanding is crucial for many NLP applications, yet transformers struggle with efficiency due to the quadratic complexity of self-attention. Sparse attention methods alleviate this cost but often impose static, predefined…

Computation and Language · Computer Science 2025-06-16 Hanzhi Zhang, Heng Fan, Kewei Sha, Yan Huang, Yunhe Feng

This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed…

Recent neural models for image captioning usually employ an encoder-decoder framework with an attention mechanism. However, the attention mechanism in such a framework aligns one single (attended) image feature vector to one caption word,…

Computer Vision and Pattern Recognition · Computer Science 2020-01-07 Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen

Rich sources of variability in natural speech present significant challenges to current data intensive speech recognition technologies. To model both speaker and environment level diversity, this paper proposes a novel Bayesian factorised…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-27 Jiajun Deng, Guinan Li, Xurong Xie, Zengrui Jin, Mingyu Cui, Tianzi Wang, Shujie Hu, Mengzhe Geng, Xunying Liu

In this paper, we present Adaptive Computation Steps (ACS) algorithm, which enables end-to-end speech recognition models to dynamically decide how many frames should be processed to predict a linguistic output. The model that applies ACS…

Audio and Speech Processing · Electrical Eng. & Systems 2018-09-27 Mohan Li, Min Liu, Masanori Hattori

Speech-to-text alignment is a critical component of neural text to speech (TTS) models. Autoregressive TTS models typically use an attention mechanism to learn these alignments on-line, while non-autoregressive end to end TTS models rely on…

Sound · Computer Science 2025-09-01 Junjie Cao

Explicit duration modeling is a key to achieving robust and efficient alignment in text-to-speech synthesis (TTS). We propose a new TTS framework using explicit duration modeling that incorporates duration as a discrete latent variable to…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-21 Yusuke Yasuda, Xin Wang, Junichi Yamagishi

Adaptive algorithm based on multi-channel linear prediction is an effective dereverberation method balancing well between the attenuation of the long-term reverberation and the dereverberated speech quality. However, the abrupt change of…

Audio and Speech Processing · Electrical Eng. & Systems 2018-08-24 Teng Xiang, Jing Lu, Kai Chen

Deep learning-based speech enhancement models achieve remarkable performance when test distributions match training conditions, but often degrade when deployed in unpredictable real-world environments with domain shifts. To address this…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-09 Tobias Raichle, Niels Edinger, Bin Yang

Speech emotion recognition is a challenging task for three main reasons: 1) human emotion is abstract, which means it is hard to distinguish; 2) in general, human emotion can only be detected in some specific moments during a long…

Sound · Computer Science 2019-05-03 Yuanyuan Zhang, Jun Du, Zirui Wang, Jianshu Zhang

The conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models like recurrent neural networks. Despite the good performance of…

Sound · Computer Science 2018-11-07 Santiago Pascual, Antonio Bonafonte, Joan Serrà

Studies have shown that in noisy acoustic environments, providing binaural signals to the user of an assistive listening device may improve speech intelligibility and spatial awareness. This paper presents a binaural speech enhancement…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-11 Vikas Tokala, Eric Grinstein, Mike Brookes, Simon Doclo, Jesper Jensen, Patrick A. Naylor

Speech emotion recognition is an important and challenging task in the realm of human-computer interaction. Prior work proposed a variety of models and feature sets for training a system. In this work, we conduct extensive experiments using…

Computation and Language · Computer Science 2017-06-05 Michael Neumann, Ngoc Thang Vu

Auditory attention decoding (AAD) is the process of identifying the attended speech in a multi-talker environment using brain signals, typically recorded through electroencephalography (EEG). Over the past decade, AAD has undergone…

Sound · Computer Science 2025-07-08 Nhan Duc Thanh Nguyen, Huy Phan, Simon Geirnaert, Kaare Mikkelsen, Preben Kidmose