Related papers: Reconfigurable Multitask Audio Dynamics Processing…

200 papers

With the surge of online meetings, it has become more critical than ever to provide high-quality speech audio and live captioning under various noise conditions. However, most monaural speech enhancement (SE) models introduce processing…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-08 Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript still can be challenging to read due to grammatical errors, disfluency, and other…

Computation and Language · Computer Science 2020-04-10 Junwei Liao, Sefik Emre Eskimez, Liyang Lu, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, Michael Zeng

This paper proposes an efficient reconfigurable hardware design for speech enhancement based on multi band spectral subtraction algorithm and involving both magnitude and phase components. Our proposed design is novel as it estimates…

Sound · Computer Science 2015-08-26 Tanmay Biswas, Sudhindu Bikash Mandal, Debasree Saha, Amlan Chakrabarti

With the advances in deep learning, the performance of end-to-end (E2E) single-task models for speech and audio processing has been constantly improving. However, it is still challenging to build a general-purpose model with high…

Audio and Speech Processing · Electrical Eng. & Systems 2025-02-21 Xiaoyu Yang, Qiujia Li, Chao Zhang, Phil Woodland

This paper proposes a flexible multichannel speech enhancement system with the main goal of improving robustness of automatic speech recognition (ASR) in noisy conditions. The proposed system combines a flexible neural mask estimator…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-10 Ante Jukić, Jagadeesh Balam, Boris Ginsburg

Multi-speaker automatic speech recognition (ASR) aims to transcribe conversational speech involving multiple speakers, requiring the model to capture not only what was said, but also who said it and sometimes when it was spoken. Recent…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-27 Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, Ming Li

The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data. However, this trend has made model training prohibitively costly and imposed computational demands. While data…

Automatic speech recognition (ASR) in multichannel, multi-speaker scenarios remains challenging due to ambient noise, reverberation and overlapping speakers. In this paper, we propose a beamforming approach that processes specific angular…

Sound · Computer Science 2025-09-15 Can Cui, Paul Magron, Mostafa Sadeghi, Emmanuel Vincent

In this paper, we propose a novel approach for the transcription of speech conversations with natural speaker overlap, from single channel speech recordings. The proposed model is a combination of a speaker diarization system and a hybrid…

Audio and Speech Processing · Electrical Eng. & Systems 2022-08-30 Srikanth Raj Chetupalli, Sriram Ganapathy

Automatic speech recognition (ASR) systems struggle with dysarthric speech due to high inter-speaker variability and slow speaking rates. To address this, we explore dysarthric-to-healthy speech conversion for improved ASR performance. Our…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-03 Karl El Hajal, Enno Hermann, Sevada Hovsepyan, Mathew Magimai.-Doss

The main motivation for Automatic Speech Recognition (ASR) is efficient interfaces to computers, and for the interfaces to be natural and truly useful, it should provide coverage for a large group of users. The purpose of these tasks is to…

Computation and Language · Computer Science 2013-03-25 Urmila Shrawankar, VM Thakare

An increasingly common training paradigm for multi-talker automatic speech recognition (ASR) is to use speaker activity signals to adapt single-speaker ASR models for overlapping speech. Although effective, these systems require running the…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-07 Xiluo He, Alexander Polok, Jesús Villalba, Thomas Thebaud, Matthew Maciejewski

Meetings are a valuable yet challenging scenario for speech applications due to complex acoustic conditions. This paper summarizes the outcomes of the MISP 2025 Challenge, hosted at Interspeech 2025, which focuses on multi-modal,…

Speech distortions are a long-standing problem that degrades the performance of speech processing models trained with supervision. It is high time that we enhance the robustness of speech processing models to obtain good performance when…

Sound · Computer Science 2022-07-26 Kuan Po Huang, Yu-Kuan Fu, Yu Zhang, Hung-yi Lee

The Multi-modal Information based Speech Processing (MISP) challenge aims to extend the application of signal processing technology in specific scenarios by promoting the research into wake-up words, speaker diarization, speech recognition,…

Under noisy conditions, automatic speech recognition (ASR) can greatly benefit from the addition of visual signals coming from a video of the speaker's face. However, when multiple candidate speakers are visible this traditionally requires…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-12 Otavio Braga, Olivier Siohan

This paper enhances dysarthric and dysphonic speech recognition by fine-tuning pretrained automatic speech recognition (ASR) models on the 2023-10-05 data package of the Speech Accessibility Project (SAP), which contains the speech of 253…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-01 Xiuwen Zheng, Bornali Phukon, Mark Hasegawa-Johnson

This technical report introduces innovative optimizations for Kaldi-based Automatic Speech Recognition (ASR) systems, focusing on acoustic model enhancement, hyperparameter tuning, and language model efficiency. We developed a custom…

Sound · Computer Science 2025-06-10 Mengze Hong, Di Jiang

Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation. With technical advances in systems dealing with speech separation, speaker diarization, and…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-05 Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey

Self-supervised learning models have revolutionized the field of speech processing. However, the process of fine-tuning these models on downstream tasks requires substantial computational resources, particularly when dealing with multiple…

Computation and Language · Computer Science 2024-06-24 Varsha Suresh, Salah Aït-Mokhtar, Caroline Brun, Ioan Calapodescu