Related papers: Speech-Declipping Transformer with Complex Spectro…

Deformable TDNN with adaptive receptive fields for speech recognition

Time Delay Neural Networks (TDNNs) are widely used in both DNN-HMM based hybrid speech recognition systems and recent end-to-end systems. Nevertheless, the receptive fields of TDNNs are limited and fixed, which is not desirable for tasks…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-03 Keyu An, Yi Zhang, Zhijian Ou

DDD: A Perceptually Superior Low-Response-Time DNN-based Declipper

Clipping is a common nonlinear distortion that occurs whenever the input or output of an audio system exceeds the supported range. This phenomenon undermines not only the perception of speech quality but also downstream processes utilizing…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-09 Jayeon Yi, Junghyun Koo, Kyogu Lee

Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation

Speech separation models are used for isolating individual speakers in many speech processing applications. Deep learning models have been shown to lead to state-of-the-art (SOTA) results on a number of speech separation benchmarks. One…

Sound · Computer Science 2023-03-13 William Ravenscroft, Stefan Goetze, Thomas Hain

S-DCCRN: Super Wide Band DCCRN with learnable complex feature for speech enhancement

In speech enhancement, complex neural networks have shown promising performance due to their effectiveness in processing complex-valued spectrum. Most of the recent speech enhancement approaches mainly focus on wide-band signal with a…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-17 Shubo Lv, Yihui Fu, Mengtao Xing, Jiayao Sun, Lei Xie, Jun Huang, Yannan Wang, Tao Yu

DT-SV: A Transformer-based Time-domain Approach for Speaker Verification

Speaker verification (SV) aims to determine whether the speaker's identity of a test utterance is the same as the reference speech. In the past few years, extracting speaker embeddings using deep neural networks for SV systems has gone…

Sound · Computer Science 2022-05-27 Nan Zhang, Jianzong Wang, Zhenhou Hong, Chendong Zhao, Xiaoyang Qu, Jing Xiao

Invertible DNN-based nonlinear time-frequency transform for speech enhancement

We propose an end-to-end speech enhancement method with trainable time-frequency (T-F) transform based on invertible deep neural network (DNN). The recent development of speech enhancement is brought by using DNN. The ordinary DNN-based…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-17 Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

Lightweight DNN for Full-Band Speech Denoising on Mobile Devices: Exploiting Long and Short Temporal Patterns

Speech denoising (SD) is an important task of many, if not all, modern signal processing chains used in devices and for everyday-life applications. While there are many published and powerful deep neural network (DNN)-based methods for SD,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-08 Konstantinos Drossos, Mikko Heikkinen, Paschalis Tsiaflakis

Decoupled Spatial and Temporal Processing for Resource Efficient Multichannel Speech Enhancement

We present a novel model designed for resource-efficient multichannel speech enhancement in the time domain, with a focus on low latency, lightweight, and low computational requirements. The proposed model incorporates explicit spatial and…

Sound · Computer Science 2024-01-17 Ashutosh Pandey, Buye Xu

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning

Recurrent neural networks (RNNs) have shown significant improvements in recent years for speech enhancement. However, the model complexity and inference time cost of RNNs are much higher than deep feed-forward neural networks (DNNs).…

Sound · Computer Science 2020-11-12 Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen, Leichao Song

Deep Transform: Time-Domain Audio Error Correction via Probabilistic Re-Synthesis

In the process of recording, storage and transmission of time-domain audio signals, errors may be introduced that are difficult to correct in an unsupervised way. Here, we train a convolutional deep neural network to re-synthesize input…

Sound · Computer Science 2015-03-20 Andrew J. R. Simpson

Layer-aware TDNN: Speaker Recognition Using Multi-Layer Features from Pre-Trained Models

Recent advances in self-supervised learning (SSL) on Transformers have significantly improved speaker verification (SV) by providing domain-general speech representations. However, existing approaches have underutilized the multi-layered…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-16 Jin Sob Kim, Hyun Joon Park, Wooseok Shin, Juan Yun, Sung Won Han

On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments

Speech separation remains an important topic for multi-speaker technology researchers. Convolution augmented transformers (conformers) have performed well for many speech processing tasks but have been under-researched for speech…

Sound · Computer Science 2023-10-11 William Ravenscroft, Stefan Goetze, Thomas Hain

Raw Waveform-based Speech Enhancement by Fully Convolutional Networks

This study proposes a fully convolutional network (FCN) model for raw waveform-based speech enhancement. The proposed system performs speech enhancement in an end-to-end (i.e., waveform-in and waveform-out) manner, which differs from most…

Machine Learning · Statistics 2017-06-16 Szu-Wei Fu, Yu Tsao, Xugang Lu, Hisashi Kawai

TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation

We propose TF-GridNet for speech separation. The model is a novel deep neural network (DNN) integrating full- and sub-band modeling in the time-frequency (T-F) domain. It stacks several blocks, each consisting of an intra-frame full-band…

Sound · Computer Science 2023-08-07 Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

Consistency-aware multi-channel speech enhancement using deep neural networks

This paper proposes a deep neural network (DNN)-based multi-channel speech enhancement system in which a DNN is trained to maximize the quality of the enhanced time-domain signal. DNN-based multi-channel speech enhancement is often…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-17 Yoshiki Masuyama, Masahito Togami, Tatsuya Komatsu

A Fully Time-domain Neural Model for Subband-based Speech Synthesizer

This paper introduces a deep neural network model for subband-based speech synthesizer. The model benefits from the short bandwidth of the subband signals to reduce the complexity of the time-domain speech generator. We employed the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-28 Azam Rabiee, Geonmin Kim, Tae-Ho Kim, Soo-Young Lee

Multi-Utterance Speech Separation and Association Trained on Short Segments

Current deep neural network (DNN) based speech separation faces a fundamental challenge -- while the models need to be trained on short segments due to computational constraints, real-world applications typically require processing…

Audio and Speech Processing · Electrical Eng. & Systems 2025-07-04 Yuzhu Wang, Archontis Politis, Konstantinos Drossos, Tuomas Virtanen

Deep Convolutional Neural Network-based Inverse Filtering Approach for Speech De-reverberation

In this paper, we introduce a spectral-domain inverse filtering approach for single-channel speech de-reverberation using deep convolutional neural network (CNN). The main goal is to better handle realistic reverberant conditions where the…

Sound · Computer Science 2020-10-16 Hanwook Chung, Vikrant Singh Tomar, Benoit Champagne

TSTNN: Two-stage Transformer based Neural Network for Speech Enhancement in the Time Domain

In this paper, we propose a transformer-based architecture, called two-stage transformer neural network (TSTNN) for end-to-end speech denoising in the time domain. The proposed model is composed of an encoder, a two-stage transformer module…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-19 Kai Wang, Bengbeng He, Wei-Ping Zhu

DS-TDNN: Dual-stream Time-delay Neural Network with Global-aware Filter for Speaker Verification

Conventional time-delay neural networks (TDNNs) struggle to handle long-range context; their ability to represent speaker information is therefore limited in long utterances. Existing solutions either depend on increasing model complexity…

Sound · Computer Science 2023-08-02 Yangfu Li, Jiapan Gan, Xiaodan Lin