Related papers: Refining Self-Supervised Learnt Speech Representat…

Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data. As such, they have…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-16 Hejung Yang, Hong-Goo Kang

Layer-wise Analysis of a Self-supervised Speech Representation Model

Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the…

Computation and Language · Computer Science 2022-12-06 Ankita Pasad, Ju-Chieh Chou, Karen Livescu

Brain-tuned Speech Models Better Reflect Speech Processing Stages in the Brain

Pretrained self-supervised speech models excel in speech tasks but do not reflect the hierarchy of human speech processing, as they encode rich semantics in middle layers and poor semantics in late layers. Recent work showed that…

Computation and Language · Computer Science 2025-06-05 Omer Moussa, Mariya Toneva

Improving Semantic Understanding in Speech Language Models via Brain-tuning

Speech language models align with human brain responses to natural language to an impressive degree. However, current models rely heavily on low-level speech features, indicating they lack brain-relevant semantics which limits their utility…

Computation and Language · Computer Science 2025-03-05 Omer Moussa, Dietrich Klakow, Mariya Toneva

Don't stop the training: continuously-updating self-supervised algorithms best account for auditory responses in the cortex

Over the last decade, numerous studies have shown that deep neural networks exhibit sensory representations similar to those of the mammalian brain, in that their activations linearly map onto cortical responses to the same sensory inputs.…

Neurons and Cognition · Quantitative Biology 2022-02-16 Pierre Orhan, Yves Boubenec, Jean-Rémi King

Toward a realistic model of speech processing in the brain with self-supervised learning

Several deep neural networks have recently been shown to generate activations similar to those of the brain in response to the same input. These algorithms, however, remain largely implausible: they require (1) extraordinarily large amounts…

Neurons and Cognition · Quantitative Biology 2023-03-21 Juliette Millet, Charlotte Caucheteux, Pierre Orhan, Yves Boubenec, Alexandre Gramfort, Ewan Dunbar, Christophe Pallier, Jean-Rémi King

Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks

Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, very few works have…

Sound · Computer Science 2022-01-10 Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf

Inducing brain-relevant bias in natural language processing models

Progress in natural language processing (NLP) models that estimate representations of word sequences has recently been leveraged to improve the understanding of language processing in the brain. However, these models have not been…

Neurons and Cognition · Quantitative Biology 2019-11-11 Dan Schwartz, Mariya Toneva, Leila Wehbe

Task-Agnostic Structured Pruning of Speech Representation Models

Self-supervised pre-trained models such as Wav2vec2, Hubert, and WavLM have been shown to significantly improve many speech tasks. However, their large memory and strong computational requirements hinder their industrial applicability.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-08 Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Hongbin Suo, Yulong Wan

Self-supervised speech representation learning has recently been a prosperous research topic. Many algorithms have been proposed for learning useful representations from large-scale unlabeled data, and their applications to a wide range of…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-03 Yu-An Chung, Yonatan Belinkov, James Glass

Multi-task Voice Activated Framework using Self-supervised Learning

Self-supervised learning methods such as wav2vec 2.0 have shown promising results in learning speech representations from unlabelled and untranscribed speech data that are useful for speech recognition. Since these representations are…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-22 Shehzeen Hussain, Van Nguyen, Shuhua Zhang, Erik Visser

Do self-supervised speech and language models extract similar representations as human brain?

Speech and language models trained through self-supervised learning (SSL) demonstrate strong alignment with brain activity during speech and language perception. However, given their distinct training modalities, it remains unclear whether…

Neurons and Cognition · Quantitative Biology 2024-02-01 Peili Chen, Linyang He, Li Fu, Lu Fan, Edward F. Chang, Yuanning Li

An Adapter-Based Unified Model for Multiple Spoken Language Processing Tasks

Self-supervised learning models have revolutionized the field of speech processing. However, the process of fine-tuning these models on downstream tasks requires substantial computational resources, particularly when dealing with multiple…

Computation and Language · Computer Science 2024-06-24 Varsha Suresh, Salah Aït-Mokhtar, Caroline Brun, Ioan Calapodescu

Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster Fine-tuning with Less Labels in Speech Processing

Pre-trained speech Transformers have facilitated great success across various speech processing tasks. However, fine-tuning these encoders for downstream tasks requires sufficiently large training data to converge or to achieve…

Computation and Language · Computer Science 2022-10-25 Hao Yang, Jinming Zhao, Gholamreza Haffari, Ehsan Shareghi

Comparing Self-Supervised Learning Models Pre-Trained on Human Speech and Animal Vocalizations for Bioacoustics Processing

Self-supervised learning (SSL) foundation models have emerged as powerful, domain-agnostic, general-purpose feature extractors applicable to a wide range of tasks. Such models pre-trained on human speech have demonstrated high…

Machine Learning · Computer Science 2025-01-22 Eklavya Sarkar, Mathew Magimai.-Doss

Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR

Self-supervised pre-training could effectively improve the performance of low-resource automatic speech recognition (ASR). However, existing self-supervised pre-training methods are task-agnostic, i.e., could be applied to various downstream tasks.…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-20 Han Zhu, Li Wang, Jindong Wang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

An Adapter Based Pre-Training for Efficient and Scalable Self-Supervised Speech Representation Learning

We present a method for transferring pre-trained self-supervised (SSL) speech representations to multiple languages. There is an abundance of unannotated speech, so creating self-supervised representations from raw audio and fine-tuning on…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-08 Samuel Kessler, Bethan Thomas, Salah Karout

Self-supervised models of audio effectively explain human cortical responses to speech

Self-supervised language models are very effective at predicting high-level cortical responses during language comprehension. However, the best current models of lower-level auditory processing in the human brain rely on either…

Computation and Language · Computer Science 2022-05-31 Aditya R. Vaidya, Shailee Jain, Alexander G. Huth

On the Use of Self-Supervised Representation Learning for Speaker Diarization and Separation

Self-supervised speech models such as wav2vec2.0 and WavLM have been shown to significantly improve the performance of many downstream speech tasks, especially in low-resource settings, over the past few years. Despite this, evaluations on…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-18 Séverin Baroudi, Hervé Bredin, Joseph Razik, Ricard Marxer

An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition

Self-supervised pretraining on speech data has achieved a lot of progress. High-fidelity representation of the speech signal is learned from a lot of untranscribed data and shows promising performance. Recently, there are several works…

Computation and Language · Computer Science 2021-10-12 Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-wen Yang, Yu Tsao, Hung-yi Lee, Shinji Watanabe