
Related papers: Speech-Declipping Transformer with Complex Spectro…

200 papers

Time Delay Neural Networks (TDNNs) are widely used in both DNN-HMM based hybrid speech recognition systems and recent end-to-end systems. Nevertheless, the receptive fields of TDNNs are limited and fixed, which is not desirable for tasks…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-03 Keyu An, Yi Zhang, Zhijian Ou

Clipping is a common nonlinear distortion that occurs whenever the input or output of an audio system exceeds the supported range. This phenomenon undermines not only the perception of speech quality but also downstream processes utilizing…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-09 Jayeon Yi, Junghyun Koo, Kyogu Lee

Speech separation models are used for isolating individual speakers in many speech processing applications. Deep learning models have been shown to lead to state-of-the-art (SOTA) results on a number of speech separation benchmarks. One…

Sound · Computer Science 2023-03-13 William Ravenscroft, Stefan Goetze, Thomas Hain

In speech enhancement, complex neural networks have shown promising performance due to their effectiveness in processing complex-valued spectra. Most recent speech enhancement approaches mainly focus on wide-band signals with a…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-17 Shubo Lv, Yihui Fu, Mengtao Xing, Jiayao Sun, Lei Xie, Jun Huang, Yannan Wang, Tao Yu

Speaker verification (SV) aims to determine whether the speaker of a test utterance is the same as that of the reference speech. In the past few years, extracting speaker embeddings using deep neural networks for SV systems has gone…

Sound · Computer Science 2022-05-27 Nan Zhang, Jianzong Wang, Zhenhou Hong, Chendong Zhao, Xiaoyang Qu, Jing Xiao

We propose an end-to-end speech enhancement method with a trainable time-frequency (T-F) transform based on an invertible deep neural network (DNN). The recent development of speech enhancement has been driven by the use of DNNs. The ordinary DNN-based…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-17 Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

Speech denoising (SD) is an important task of many, if not all, modern signal processing chains used in devices and for everyday-life applications. While there are many published and powerful deep neural network (DNN)-based methods for SD,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-08 Konstantinos Drossos, Mikko Heikkinen, Paschalis Tsiaflakis

We present a novel model designed for resource-efficient multichannel speech enhancement in the time domain, with a focus on low latency, lightweight, and low computational requirements. The proposed model incorporates explicit spatial and…

Sound · Computer Science 2024-01-17 Ashutosh Pandey, Buye Xu

Recurrent neural networks (RNNs) have shown significant improvements in recent years for speech enhancement. However, the model complexity and inference time cost of RNNs are much higher than deep feed-forward neural networks (DNNs).…

Sound · Computer Science 2020-11-12 Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen, Leichao Song

In the process of recording, storage and transmission of time-domain audio signals, errors may be introduced that are difficult to correct in an unsupervised way. Here, we train a convolutional deep neural network to re-synthesize input…

Sound · Computer Science 2015-03-20 Andrew J. R. Simpson

Recent advances in self-supervised learning (SSL) on Transformers have significantly improved speaker verification (SV) by providing domain-general speech representations. However, existing approaches have underutilized the multi-layered…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-16 Jin Sob Kim, Hyun Joon Park, Wooseok Shin, Juan Yun, Sung Won Han

Speech separation remains an important topic for multi-speaker technology researchers. Convolution augmented transformers (conformers) have performed well for many speech processing tasks but have been under-researched for speech…

Sound · Computer Science 2023-10-11 William Ravenscroft, Stefan Goetze, Thomas Hain

This study proposes a fully convolutional network (FCN) model for raw waveform-based speech enhancement. The proposed system performs speech enhancement in an end-to-end (i.e., waveform-in and waveform-out) manner, which differs from most…

Machine Learning · Statistics 2017-06-16 Szu-Wei Fu, Yu Tsao, Xugang Lu, Hisashi Kawai

We propose TF-GridNet for speech separation. The model is a novel deep neural network (DNN) integrating full- and sub-band modeling in the time-frequency (T-F) domain. It stacks several blocks, each consisting of an intra-frame full-band…

This paper proposes a deep neural network (DNN)-based multi-channel speech enhancement system in which a DNN is trained to maximize the quality of the enhanced time-domain signal. DNN-based multi-channel speech enhancement is often…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-17 Yoshiki Masuyama, Masahito Togami, Tatsuya Komatsu

This paper introduces a deep neural network model for a subband-based speech synthesizer. The model benefits from the short bandwidth of the subband signals to reduce the complexity of the time-domain speech generator. We employed the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-28 Azam Rabiee, Geonmin Kim, Tae-Ho Kim, Soo-Young Lee

Current deep neural network (DNN) based speech separation faces a fundamental challenge -- while the models need to be trained on short segments due to computational constraints, real-world applications typically require processing…

Audio and Speech Processing · Electrical Eng. & Systems 2025-07-04 Yuzhu Wang, Archontis Politis, Konstantinos Drossos, Tuomas Virtanen

In this paper, we introduce a spectral-domain inverse filtering approach for single-channel speech de-reverberation using a deep convolutional neural network (CNN). The main goal is to better handle realistic reverberant conditions where the…

Sound · Computer Science 2020-10-16 Hanwook Chung, Vikrant Singh Tomar, Benoit Champagne

In this paper, we propose a transformer-based architecture, called two-stage transformer neural network (TSTNN) for end-to-end speech denoising in the time domain. The proposed model is composed of an encoder, a two-stage transformer module…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-19 Kai Wang, Bengbeng He, Wei-Ping Zhu

Conventional time-delay neural networks (TDNNs) struggle to handle long-range context; their ability to represent speaker information is therefore limited in long utterances. Existing solutions either depend on increasing model complexity…

Sound · Computer Science 2023-08-02 Yangfu Li, Jiapan Gan, Xiaodan Lin