Related papers: Reconfigurable Multitask Audio Dynamics Processing…

Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement

With the surge of online meetings, it has become more critical than ever to provide high-quality speech audio and live captioning under various noise conditions. However, most monaural speech enhancement (SE) models introduce processing…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-08 Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka

Improving Readability for Automatic Speech Recognition Transcription

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript still can be challenging to read due to grammatical errors, disfluency, and other…

Computation and Language · Computer Science 2020-04-10 Junwei Liao, Sefik Emre Eskimez, Liyang Lu, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, Michael Zeng

A Novel Reconfigurable Hardware Design for Speech Enhancement Based on Multi-Band Spectral Subtraction Involving Magnitude and Phase Components

This paper proposes an efficient reconfigurable hardware design for speech enhancement based on multi band spectral subtraction algorithm and involving both magnitude and phase components. Our proposed design is novel as it estimates…

Sound · Computer Science 2015-08-26 Tanmay Biswas, Sudhindu Bikash Mandal, Debasree Saha, Amlan Chakrabarti

MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events

With the advances in deep learning, the performance of end-to-end (E2E) single-task models for speech and audio processing has been constantly improving. However, it is still challenging to build a general-purpose model with high…

Audio and Speech Processing · Electrical Eng. & Systems 2025-02-21 Xiaoyu Yang, Qiujia Li, Chao Zhang, Phil Woodland

Flexible Multichannel Speech Enhancement for Noise-Robust Frontend

This paper proposes a flexible multichannel speech enhancement system with the main goal of improving robustness of automatic speech recognition (ASR) in noisy conditions. The proposed system combines a flexible neural mask estimator…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-10 Ante Jukić, Jagadeesh Balam, Boris Ginsburg

DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

Multi-speaker automatic speech recognition (ASR) aims to transcribe conversational speech involving multiple speakers, requiring the model to capture not only what was said, but also who said it and sometimes when it was spoken. Recent…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-27 Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, Ming Li

Dynamic Data Pruning for Automatic Speech Recognition

The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data. However, this trend has made model training prohibitively costly and imposed computational demands. While data…

Computation and Language · Computer Science 2024-06-27 Qiao Xiao, Pingchuan Ma, Adriana Fernandez-Lopez, Boqian Wu, Lu Yin, Stavros Petridis, Mykola Pechenizkiy, Maja Pantic, Decebal Constantin Mocanu, Shiwei Liu

Data-independent Beamforming for End-to-end Multichannel Multi-speaker ASR

Automatic speech recognition (ASR) in multichannel, multi-speaker scenarios remains challenging due to ambient noise, reverberation and overlapping speakers. In this paper, we propose a beamforming approach that processes specific angular…

Sound · Computer Science 2025-09-15 Can Cui, Paul Magron, Mostafa Sadeghi, Emmanuel Vincent

Speaker conditioned acoustic modeling for multi-speaker conversational ASR

In this paper, we propose a novel approach for the transcription of speech conversations with natural speaker overlap, from single channel speech recordings. The proposed model is a combination of a speaker diarization system and a hybrid…

Audio and Speech Processing · Electrical Eng. & Systems 2022-08-30 Srikanth Raj Chetupalli, Sriram Ganapathy

Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech

Automatic speech recognition (ASR) systems struggle with dysarthric speech due to high inter-speaker variability and slow speaking rates. To address this, we explore dysarthric-to-healthy speech conversion for improved ASR performance. Our…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-03 Karl El Hajal, Enno Hermann, Sevada Hovsepyan, Mathew Magimai.-Doss

Adverse Conditions and ASR Techniques for Robust Speech User Interface

The main motivation for Automatic Speech Recognition (ASR) is efficient interfaces to computers, and for the interfaces to be natural and truly useful, it should provide coverage for a large group of users. The purpose of these tasks is to…

Computation and Language · Computer Science 2013-03-25 Urmila Shrawankar, VM Thakare

Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams

An increasingly common training paradigm for multi-talker automatic speech recognition (ASR) is to use speaker activity signals to adapt single-speaker ASR models for overlapping speech. Although effective, these systems require running the…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-07 Xiluo He, Alexander Polok, Jesús Villalba, Thomas Thebaud, Matthew Maciejewski

The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition

Meetings are a valuable yet challenging scenario for speech applications due to complex acoustic conditions. This paper summarizes the outcomes of the MISP 2025 Challenge, hosted at Interspeech 2025, which focuses on multi-modal,…

Sound · Computer Science 2025-05-28 Ming Gao, Shilong Wu, Hang Chen, Jun Du, Chin-Hui Lee, Shinji Watanabe, Jingdong Chen, Siniscalchi Sabato Marco, Odette Scharenborg

Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation

Speech distortions are a long-standing problem that degrades the performance of supervisely trained speech processing models. It is high time that we enhance the robustness of speech processing models to obtain good performance when…

Sound · Computer Science 2022-07-26 Kuan Po Huang, Yu-Kuan Fu, Yu Zhang, Hung-yi Lee

The Multimodal Information based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition

The Multi-modal Information based Speech Processing (MISP) challenge aims to extend the application of signal processing technology in specific scenarios by promoting the research into wake-up words, speaker diarization, speech recognition,…

Multimedia · Computer Science 2023-03-14 Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu

Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection

Under noisy conditions, automatic speech recognition (ASR) can greatly benefit from the addition of visual signals coming from a video of the speaker's face. However, when multiple candidate speakers are visible this traditionally requires…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-12 Otavio Braga, Olivier Siohan

Fine-Tuning Automatic Speech Recognition for People with Parkinson's: An Effective Strategy for Enhancing Speech Technology Accessibility

This paper enhances dysarthric and dysphonic speech recognition by fine-tuning pretrained automatic speech recognition (ASR) models on the 2023-10-05 data package of the Speech Accessibility Project (SAP), which contains the speech of 253…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-01 Xiuwen Zheng, Bornali Phukon, Mark Hasegawa-Johnson

Technical Report: A Practical Guide to Kaldi ASR Optimization

This technical report introduces innovative optimizations for Kaldi-based Automatic Speech Recognition (ASR) systems, focusing on acoustic model enhancement, hyperparameter tuning, and language model efficiency. We developed a custom…

Sound · Computer Science 2025-06-10 Mengze Hong, Di Jiang

Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation. With technical advances in systems dealing with speech separation, speaker diarization, and…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-05 Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey

An Adapter-Based Unified Model for Multiple Spoken Language Processing Tasks

Self-supervised learning models have revolutionized the field of speech processing. However, the process of fine-tuning these models on downstream tasks requires substantial computational resources, particularly when dealing with multiple…

Computation and Language · Computer Science 2024-06-24 Varsha Suresh, Salah Aït-Mokhtar, Caroline Brun, Ioan Calapodescu