Related papers: Interface Design for Self-Supervised Speech Models

200 papers

Self-supervised learning (SSL) based models have been shown to generate powerful representations that can be used to improve the performance of downstream speech tasks. Several state-of-the-art SSL models are available, and each of these…

Computation and Language · Computer Science 2023-02-21 A Arunkumar, Vrunda N Sukhadia, S. Umesh

Self-supervised learning (SSL) has achieved great success in various areas including speech processing. Recently, it is proven that speech based SSL models are able to extract superior universal representations on a range of downstream…

Sound · Computer Science 2022-12-21 Changli Tang, Yujin Wang, Xie Chen, Wei-Qiang Zhang

Self-Supervised Learning (SSL) has gained traction for its ability to learn rich representations with low labeling costs, applicable across diverse downstream tasks. However, assessing the downstream-task performance remains challenging due…

Sound · Computer Science 2025-10-07 Takashi Maekaku, Keita Goto, Jinchuan Tian, Yusuke Shinohara, Shinji Watanabe

Speech Foundation Models have gained significant attention recently. Prior works have shown that the fusion of representations from multiple layers of the same model or the fusion of multiple models can improve performance on downstream…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-12 Yi-Jen Shih, David Harwath

In this study, we aim to explore efficient tuning methods for speech self-supervised learning. Recent studies show that self-supervised learning (SSL) can learn powerful representations for different speech tasks. However, fine-tuning…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-31 Zih-Ching Chen, Chin-Lun Fu, Chih-Ying Liu, Shang-Wen Li, Hung-yi Lee

Self-Supervised Learning (SSL) models have been successfully applied in various deep learning-based speech tasks, particularly those with a limited amount of data. However, the quality of SSL representations depends highly on the…

Computation and Language · Computer Science 2022-04-20 Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel Lopez-Francisco, Jonathan D. Amith, Shinji Watanabe

Enhancing explainability in speech self-supervised learning (SSL) is important for developing reliable SSL-based speech processing systems. This study probes how speech SSL models encode speaker-specific information via a large-scale…

Audio and Speech Processing · Electrical Eng. & Systems 2026-03-06 Aemon Yat Fei Chiu, Kei Ching Fung, Roger Tsz Yeung Li, Jingyu Li, Tan Lee

Self-supervised learning (SSL) has attracted increased attention for learning meaningful speech representations. Speech SSL models, such as WavLM, employ masked prediction training to encode general-purpose representations. In contrast,…

Computation and Language · Computer Science 2024-02-01 Takanori Ashihara, Marc Delcroix, Takafumi Moriya, Kohei Matsuura, Taichi Asami, Yusuke Ijima

Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often have…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-02 Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur

Integrating Federated Learning (FL) with self-supervised learning (SSL) enables privacy-preserving fine-tuning for speech tasks. However, federated environments exhibit significant heterogeneity: clients differ in computational capacity,…

Audio and Speech Processing · Electrical Eng. & Systems 2026-03-26 Xin Guo, Chunrui Zhao, Hong Jia, Ting Dang, Gongping Huang, Xianrui Zheng, Yan Gao

Recent speech enhancement (SE) models increasingly leverage self-supervised learning (SSL) representations for their rich semantic information. Typically, intermediate features are aggregated into a single representation via a lightweight…

Sound · Computer Science 2026-02-02 Seungu Han, Sungho Lee, Kyogu Lee

Recently, pioneer work finds that speech pre-trained models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-17 Chengyi Wang, Yu Wu, Sanyuan Chen, Shujie Liu, Jinyu Li, Yao Qian, Zhenglu Yang

Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data. The high number of proposed approaches…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-02 Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli

Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data. The high number of proposed approaches fostered the emergence of comprehensive benchmarks…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-22 Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli

Self-supervised learning (SSL) speech models, which can serve as powerful upstream models to extract meaningful speech representations, have achieved unprecedented success in speech representation learning. However, their effectiveness on…

Sound · Computer Science 2023-02-01 Tung-Yu Wu, Chen-An Li, Tzu-Han Lin, Tsu-Yuan Hsu, Hung-Yi Lee

Self-supervised speech representation learning (SSL) has shown to be effective in various downstream tasks, but SSL models are usually large and slow. Model compression techniques such as pruning aim to reduce the model size and computation…

Computation and Language · Computer Science 2023-03-01 Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe

Although large-scale self-supervised learning (SSL) models like WavLM have achieved state-of-the-art performance in speech processing, their significant size impedes deployment on resource-constrained devices. While structured pruning is a…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-11 Junyi Peng, Lin Zhang, Jiangyu Han, Oldřich Plchot, Johan Rohdin, Themos Stafylakis, Shuai Wang, Jan Černocký

Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data. Transformer based models such as HuBERT, which consist a feature extractor and transformer layers, are leading the field in the speech…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-23 Zih-Ching Chen, Yu-Shun Sung, Hung-yi Lee

Self-Supervised Learning (SSL) is an increasingly popular ML paradigm that trains models to transform complex inputs into representations without relying on explicit labels. These representations encode similarity structures that enable…

Machine Learning · Computer Science 2022-06-30 Adam Dziedzic, Nikita Dhawan, Muhammad Ahmad Kaleem, Jonas Guan, Nicolas Papernot

Voice conversion (VC) systems are widely used for several applications, from speaker anonymisation to personalised speech synthesis. Supervised approaches learn a mapping between different speakers using parallel data, which is expensive to…
