Related papers: Self-Supervised Learning-Based Source Separation f…

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

Neural speech separation has made remarkable progress and its integration with automatic speech recognition (ASR) is an important direction towards realizing multi-speaker ASR. This work provides an insightful investigation of speech…

Sound · Computer Science · 2023-07-25 · Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe

Speech separation with large-scale self-supervised learning

Self-supervised learning (SSL) methods such as WavLM have shown promising speech separation (SS) results in small-scale simulation-based experiments. In this work, we extend the exploration of the SSL-based SS by massively scaling up both…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-11-29 · Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

Speech separation has been successfully applied as a frontend processing module of conversation transcription systems thanks to its ability to handle overlapped speech and its flexibility to combine with downstream tasks such as automatic…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-07-06 · Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li

Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often have…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-11-02 · Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur

Progressive Multi-Scale Self-Supervised Learning for Speech Recognition

Self-supervised learning (SSL) models have achieved considerable improvements in automatic speech recognition (ASR). In addition, ASR performance could be further improved if the model is dedicated to audio content information learning…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-12-08 · Genshun Wan, Tan Liu, Hang Chen, Jia Pan, Cong Liu, Zhongfu Ye

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition

In recent years, speech-based self-supervised learning (SSL) has made significant progress in various tasks, including automatic speech recognition (ASR). An ASR model with decent performance can be realized by fine-tuning an SSL model with…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-08-30 · Zhisheng Zheng, Ziyang Ma, Yu Wang, Xie Chen

Multi-Variant Consistency based Self-supervised Learning for Robust Automatic Speech Recognition

Automatic speech recognition (ASR) has shown rapid advances in recent years but still degrades significantly in far-field and noisy environments. The recent development of self-supervised learning (SSL) technology can improve the ASR…

Sound · Computer Science · 2022-05-05 · Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone

Transcribing meetings containing overlapped speech with only a single distant microphone (SDM) has been one of the most challenging problems for automatic speech recognition (ASR). While various approaches have been proposed, all previous…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-04-14 · Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study

Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings. In this context, it has been demonstrated that larger self-supervised feature extractors are crucial…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-03-14 · Salah Zaiem, Robin Algayres, Titouan Parcollet, Slim Essid, Mirco Ravanelli

How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario

The utilization of speech Self-Supervised Learning (SSL) models achieves impressive performance on Automatic Speech Recognition (ASR). However, in low-resource language ASR, they encounter the domain mismatch problem between pre-trained and…

Sound · Computer Science · 2025-01-07 · Shih-Heng Wang, Zih-Ching Chen, Jiatong Shi, Ming-To Chuang, Guan-Ting Lin, Kuan-Po Huang, David Harwath, Shang-Wen Li, Hung-yi Lee

Better Pseudo-labeling with Multi-ASR Fusion and Error Correction by SpeechLLM

Automatic speech recognition (ASR) models rely on high-quality transcribed data for effective training. Generating pseudo-labels for large unlabeled audio datasets often relies on complex pipelines that combine multiple ASR outputs through…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-10-06 · Jeena Prakash, Blessingh Kumar, Kadri Hacioglu, Bidisha Sharma, Sindhuja Gopalan, Malolan Chetlur, Shankar Venkatesan, Andreas Stolcke

Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition

Self-supervised learning (SSL) based speech pre-training has attracted much attention for its capability of extracting rich representations learned from massive unlabeled data. On the other hand, the use of weakly-supervised data is less…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-06-30 · Wangyou Zhang, Yanmin Qian

CMT-LLM: Contextual Multi-Talker ASR Utilizing Large Language Models

In real-world applications, automatic speech recognition (ASR) systems must handle overlapping speech from multiple speakers and recognize rare words like technical terms. Traditional methods address multi-talker ASR and contextual biasing…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-17 · Jiajun He, Naoki Sawada, Koichi Miyazaki, Tomoki Toda

Self-Supervised Learning from Automatically Separated Sound Scenes

Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings. The association of these constituent sound events with their mixture and…

Sound · Computer Science · 2021-09-16 · Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra

Deploying self-supervised learning in the wild for hybrid automatic speech recognition

Self-supervised learning (SSL) methods have proven to be very successful in automatic speech recognition (ASR). These great improvements have been reported mostly based on highly curated datasets such as LibriSpeech for non-streaming…

Sound · Computer Science · 2022-05-19 · Mostafa Karimi, Changliang Liu, Kenichi Kumatani, Yao Qian, Tianyu Wu, Jian Wu

LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks

Self-supervised learning (SSL)-based speech models are extensively used for full-stack speech processing. However, it has been observed that improving SSL-based speech representations using unlabeled speech for content-related tasks is…

Computation and Language · Computer Science · 2024-06-14 · Amit Meghanani, Thomas Hain

Multilingual Speech Recognition Using Discrete Tokens with a Two-step Training Strategy

Pre-trained models, especially self-supervised learning (SSL) models, have demonstrated impressive results in automatic speech recognition (ASR) task. While most applications of SSL models focus on leveraging continuous representations as…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-09-03 · Zehan Li, Yan Yang, Xueqing Li, Jian Kang, Xiao-Lei Zhang, Jie Li

Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models

Recent advancements in Self-Supervised Learning (SSL) have shown promising results in Speaker Verification (SV). However, narrowing the performance gap with supervised systems remains an ongoing challenge. Several studies have observed that…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-25 · Victor Miara, Theo Lepage, Reda Dehak

Audio-visual Multi-channel Recognition of Overlapped Speech

Automatic speech recognition (ASR) of overlapped speech remains a highly challenging task to date. To this end, multi-channel microphone array data are widely used in state-of-the-art ASR systems. Motivated by the invariance of visual…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-11-19 · Jianwei Yu, Bo Wu, Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu, Xunying Liu, Helen Meng

Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations, it is typically unknown. To cope with this, we extend an iterative…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-12-22 · Thilo von Neumann, Christoph Boeddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach