Related papers: Do self-supervised speech models develop human-lik…

Speech Representation Analysis based on Inter- and Intra-Model Similarities

Self-supervised models have revolutionized speech processing, achieving new levels of performance in a wide variety of tasks with limited resources. However, the inner workings of these models are still opaque. In this paper, we aim to…

Sound · Computer Science 2024-06-25 Yassine El Kheir, Ahmed Ali, Shammur Absar Chowdhury

Spatial HuBERT: Self-supervised Spatial Speech Representation Learning for a Single Talker from Multi-channel Audio

Self-supervised learning has been used to leverage unlabelled data, improving accuracy and generalisation of speech systems through the training of representation models. While many recent works have sought to produce effective…

Computation and Language · Computer Science 2023-10-18 Antoni Dimitriadis, Siqi Pan, Vidhyasaharan Sethu, Beena Ahmed

Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning

Child speech recognition is still an underdeveloped area of research due to the lack of data (especially on non-English languages) and the specific difficulties of this task. Having explored various architectures for child speech…

Sound · Computer Science 2025-03-07 Lucas Block Medin, Thomas Pellegrini, Lucile Gelin

Probing self-supervised speech models for phonetic and phonemic information: a case study in aspiration

Textless self-supervised speech models have grown in capabilities in recent years, but the nature of the linguistic information they encode has not yet been thoroughly examined. We evaluate the extent to which these models' learned…

Computation and Language · Computer Science 2023-06-13 Kinan Martin, Jon Gauthier, Canaan Breiss, Roger Levy

Self-supervised models of audio effectively explain human cortical responses to speech

Self-supervised language models are very effective at predicting high-level cortical responses during language comprehension. However, the best current models of lower-level auditory processing in the human brain rely on either…

Computation and Language · Computer Science 2022-05-31 Aditya R. Vaidya, Shailee Jain, Alexander G. Huth

Iterative refinement, not training objective, makes HuBERT behave differently from wav2vec 2.0

Self-supervised models for speech representation learning now see widespread use for their versatility and performance on downstream tasks, but the effect of model architecture on the linguistic information learned in their representations…

Computation and Language · Computer Science 2025-08-12 Robin Huo, Ewan Dunbar

Perceptimatic: A human speech perception benchmark for unsupervised subword modelling

In this paper, we present a data set and methods to compare speech processing models and human behaviour on a phone discrimination task. We provide Perceptimatic, an open data set which consists of French and English speech stimuli, as well…

Computation and Language · Computer Science 2020-10-14 Juliette Millet, Ewan Dunbar

Evaluating Speaker Identity Coding in Self-supervised Models and Humans

Speaker identity plays a significant role in human communication and is being increasingly used in societal applications, many through advances in machine learning. Speaker identity perception is an essential cognitive phenomenon that can…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-18 Gasser Elbanna

Predicting non-native speech perception using the Perceptual Assimilation Model and state-of-the-art acoustic models

Our native language influences the way we perceive speech sounds, affecting our ability to discriminate non-native sounds. We compare two ideas about the influence of the native language on speech perception: the Perceptual Assimilation…

Computation and Language · Computer Science 2022-06-01 Juliette Millet, Ioana Chitoran, Ewan Dunbar

Self-Supervised Speech Representation Learning: A Review

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and…

Computation and Language · Computer Science 2022-11-23 Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

Probing phoneme, language and speaker information in unsupervised speech representations

Unsupervised models of representations based on Contrastive Predictive Coding (CPC)[1] are primarily used in spoken language modelling in that they encode phonetic information. In this study, we ask what other types of information are…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-02 Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski

Analyzing the relationships between pretraining language, phonetic, tonal, and speaker information in self-supervised speech models

Analyses of self-supervised speech models have begun to reveal where and how they represent different types of information. However, almost all analyses have focused on English. Here, we examine how wav2vec2 models trained on four different…

Computation and Language · Computer Science 2025-06-13 Michele Gubian, Ioana Krehan, Oli Liu, James Kirby, Sharon Goldwater

Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech

Self-supervised pre-trained speech models were shown effective for various downstream speech processing tasks. Since they are mainly pre-trained to map input speech to pseudo-labels, the resulting representations are only effective for the…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-09 Jingru Lin, Meng Ge, Wupeng Wang, Haizhou Li, Mengling Feng

Efficiency-oriented approaches for self-supervised speech representation learning

Self-supervised learning enables the training of large neural models without the need for large, labeled datasets. It has been generating breakthroughs in several fields, including computer vision, natural language processing, biology, and…

Computation and Language · Computer Science 2023-12-19 Luis Lugo, Valentin Vielzeuf

Tracking the emergence of linguistic structure in self-supervised models learning from speech

Self-supervised speech models learn effective representations of spoken language, which have been shown to reflect various aspects of linguistic structure. But when does such structure emerge in model training? We study the encoding of a…

Computation and Language · Computer Science 2026-04-03 Marianne de Heer Kloots, Martijn Bentum, Hosein Mohebbi, Charlotte Pouw, Gaofei Shen, Willem Zuidema

Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks

Human language can be expressed in either written or spoken form, i.e. text or speech. Humans can acquire knowledge from text to improve speaking and listening. However, the quest for speech pre-trained models to leverage unpaired text has…

Audio and Speech Processing · Electrical Eng. & Systems 2024-08-06 Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase,…

Computation and Language · Computer Science 2021-06-15 Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed

Variable-rate hierarchical CPC leads to acoustic unit discovery in speech

The success of deep learning comes from its ability to capture the hierarchical structure of data by learning high-level representations defined in terms of low-level ones. In this paper we explore self-supervised learning of hierarchical…

Sound · Computer Science 2022-12-06 Santiago Cuervo, Adrian Łańcucki, Ricard Marxer, Paweł Rychlikowski, Jan Chorowski

Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora

Self-supervised speech models have grown fast during the past few years and have proven feasible for use in various downstream tasks. Some recent work has started to look at the characteristics of these models, yet many concerns have not…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-13 Yuanchao Li, Yumnah Mohamied, Peter Bell, Catherine Lai

Do self-supervised speech and language models extract similar representations as human brain?

Speech and language models trained through self-supervised learning (SSL) demonstrate strong alignment with brain activity during speech and language perception. However, given their distinct training modalities, it remains unclear whether…

Neurons and Cognition · Quantitative Biology 2024-02-01 Peili Chen, Linyang He, Li Fu, Lu Fan, Edward F. Chang, Yuanning Li