Related papers: Interface Design for Self-Supervised Speech Models

Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition

Self-supervised learning (SSL) based models have been shown to generate powerful representations that can be used to improve the performance of downstream speech tasks. Several state-of-the-art SSL models are available, and each of these…

Computation and Language · Computer Science 2023-02-21 A Arunkumar , Vrunda N Sukhadia , S. Umesh

Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models

Self-supervised learning (SSL) has achieved great success in various areas including speech processing. Recently, it is proven that speech based SSL models are able to extract superior universal representations on a range of downstream…

Sound · Computer Science 2022-12-21 Changli Tang , Yujin Wang , Xie Chen , Wei-Qiang Zhang

Evaluating Self-Supervised Speech Models via Text-Based LLMS

Self-Supervised Learning (SSL) has gained traction for its ability to learn rich representations with low labeling costs, applicable across diverse downstream tasks. However, assessing the downstream-task performance remains challenging due…

Sound · Computer Science 2025-10-07 Takashi Maekaku , Keita Goto , Jinchuan Tian , Yusuke Shinohara , Shinji Watanabe

Unifying Model and Layer Fusion for Speech Foundation Models

Speech Foundation Models have gained significant attention recently. Prior works have shown that the fusion of representations from multiple layers of the same model or the fusion of multiple models can improve performance on downstream…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-12 Yi-Jen Shih , David Harwath

Exploring Efficient-tuning Methods in Self-supervised Speech Models

In this study, we aim to explore efficient tuning methods for speech self-supervised learning. Recent studies show that self-supervised learning (SSL) can learn powerful representations for different speech tasks. However, fine-tuning…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-31 Zih-Ching Chen , Chin-Lun Fu , Chih-Ying Liu , Shang-Wen Li , Hung-yi Lee

Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation

Self-Supervised Learning (SSL) models have been successfully applied in various deep learning-based speech tasks, particularly those with a limited amount of data. However, the quality of SSL representations depends highly on the…

Computation and Language · Computer Science 2022-04-20 Dan Berrebbi , Jiatong Shi , Brian Yan , Osbel Lopez-Francisco , Jonathan D. Amith , Shinji Watanabe

A Large-Scale Probing Analysis of Speaker-Specific Attributes in Self-Supervised Speech Representations

Enhancing explainability in speech self-supervised learning (SSL) is important for developing reliable SSL-based speech processing systems. This study probes how speech SSL models encode speaker-specific information via a large-scale…

Audio and Speech Processing · Electrical Eng. & Systems 2026-03-06 Aemon Yat Fei Chiu , Kei Ching Fung , Roger Tsz Yeung Li , Jingyu Li , Tan Lee

What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis

Self-supervised learning (SSL) has attracted increased attention for learning meaningful speech representations. Speech SSL models, such as WavLM, employ masked prediction training to encode general-purpose representations. In contrast,…

Computation and Language · Computer Science 2024-02-01 Takanori Ashihara , Marc Delcroix , Takafumi Moriya , Kohei Matsuura , Taichi Asami , Yusuke Ijima

Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often have…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-02 Zili Huang , Desh Raj , Paola García , Sanjeev Khudanpur

Adaptive Federated Fine-Tuning of Self-Supervised Speech Representations

Integrating Federated Learning (FL) with self-supervised learning (SSL) enables privacy-preserving fine-tuning for speech tasks. However, federated environments exhibit significant heterogeneity: clients differ in computational capacity,…

Audio and Speech Processing · Electrical Eng. & Systems 2026-03-26 Xin Guo , Chunrui Zhao , Hong Jia , Ting Dang , Gongping Huang , Xianrui Zheng , Yan Gao

Rethinking Speech Representation Aggregation in Speech Enhancement: A Phonetic Mutual Information Perspective

Recent speech enhancement (SE) models increasingly leverage self-supervised learning (SSL) representations for their rich semantic information. Typically, intermediate features are aggregated into a single representation via a lightweight…

Sound · Computer Science 2026-02-02 Seungu Han , Sungho Lee , Kyogu Lee

Self-Supervised Learning for speech recognition with Intermediate layer supervision

Recently, pioneer work finds that speech pre-trained models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-17 Chengyi Wang , Yu Wu , Sanyuan Chen , Shujie Liu , Jinyu Li , Yao Qian , Zhenglu Yang

Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?

Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data. The high number of proposed approaches…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-02 Salah Zaiem , Youcef Kemiche , Titouan Parcollet , Slim Essid , Mirco Ravanelli

Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads

Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data. The high number of proposed approaches fostered the emergence of comprehensive benchmarks…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-22 Salah Zaiem , Youcef Kemiche , Titouan Parcollet , Slim Essid , Mirco Ravanelli

The Efficacy of Self-Supervised Speech Models for Audio Representations

Self-supervised learning (SSL) speech models, which can serve as powerful upstream models to extract meaningful speech representations, have achieved unprecedented success in speech representation learning. However, their effectiveness on…

Sound · Computer Science 2023-02-01 Tung-Yu Wu , Chen-An Li , Tzu-Han Lin , Tsu-Yuan Hsu , Hung-Yi Lee

Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding

Self-supervised speech representation learning (SSL) has shown to be effective in various downstream tasks, but SSL models are usually large and slow. Model compression techniques such as pruning aim to reduce the model size and computation…

Computation and Language · Computer Science 2023-03-01 Yifan Peng , Kwangyoun Kim , Felix Wu , Prashant Sridhar , Shinji Watanabe

Hybrid Pruning: In-Situ Compression of Self-Supervised Speech Models for Speaker Verification and Anti-Spoofing

Although large-scale self-supervised learning (SSL) models like WavLM have achieved state-of-the-art performance in speech processing, their significant size impedes deployment on resource-constrained devices. While structured pruning is a…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-11 Junyi Peng , Lin Zhang , Jiangyu Han , Oldřich Plchot , Johan Rohdin , Themos Stafylakis , Shuai Wang , Jan Černocký

CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models

Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data. Transformer based models such as HuBERT, which consist a feature extractor and transformer layers, are leading the field in the speech…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-23 Zih-Ching Chen , Yu-Shun Sung , Hung-yi Lee

On the Difficulty of Defending Self-Supervised Learning against Model Extraction

Self-Supervised Learning (SSL) is an increasingly popular ML paradigm that trains models to transform complex inputs into representations without relying on explicit labels. These representations encode similarity structures that enable…

Machine Learning · Computer Science 2022-06-30 Adam Dziedzic , Nikita Dhawan , Muhammad Ahmad Kaleem , Jonas Guan , Nicolas Papernot

Investigating self-supervised features for expressive, multilingual voice conversion

Voice conversion (VC) systems are widely used for several applications, from speaker anonymisation to personalised speech synthesis. Supervised approaches learn a mapping between different speakers using parallel data, which is expensive to…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-14 Álvaro Martín-Cortinas , Daniel Sáez-Trigueros , Grzegorz Beringer , Iván Vallés-Pérez , Roberto Barra-Chicote , Biel Tura-Vecino , Adam Gabryś , Piotr Bilinski , Thomas Merritt , Jaime Lorenzo-Trueba