Related papers: Learning Problem-agnostic Speech Representations f…

Multi-task self-supervised learning for Robust Speech Recognition

Despite the growing interest in unsupervised learning, extracting meaningful knowledge from unlabelled audio remains an open challenge. To take a step in this direction, we recently proposed a problem-agnostic speech encoder (PASE), that…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-04-21 · Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio

Speech representation learning: Learning bidirectional encoders with single-view, multi-view, and multi-task methods

This thesis focuses on representation learning for sequence data over time or space, aiming to improve downstream sequence prediction tasks by using the learned representations. Supervised learning has been the most dominant approach for…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-08-02 · Qingming Tang

Word-level Embeddings for Cross-Task Transfer Learning in Speech Processing

Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer. In recent years, unsupervised and self-supervised techniques for learning speech representation were developed to foster automatic speech…

Computation and Language · Computer Science · 2021-12-15 · Pierre Beckmann, Mikolaj Kegler, Milos Cernak

Efficiency-oriented approaches for self-supervised speech representation learning

Self-supervised learning enables the training of large neural models without the need for large, labeled datasets. It has been generating breakthroughs in several fields, including computer vision, natural language processing, biology, and…

Computation and Language · Computer Science · 2023-12-19 · Luis Lugo, Valentin Vielzeuf

Self-Supervised Speech Representation Learning: A Review

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and…

Computation and Language · Computer Science · 2022-11-23 · Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

Contrastive Separative Coding for Self-supervised Representation Learning

To extract robust deep representations from long sequential modeling of speech data, we propose a self-supervised learning approach, namely Contrastive Separative Coding (CSC). Our key finding is to learn such representations by separating…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-03-02 · Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu

Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery

Unsupervised spoken term discovery consists of two tasks: finding the acoustic segment boundaries and labeling acoustically similar segments with the same labels. We perform segmentation based on the assumption that the frame feature…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-07-28 · Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Najim Dehak

Visually Guided Self Supervised Learning of Speech Representations

Self supervised representation learning has recently attracted a lot of research interest for both the audio and visual modalities. However, most works typically focus on a particular modality or feature alone and there has been very…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-02-21 · Abhinav Shukla, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic

Unsupervised Learning of Sentence Representations Using Sequence Consistency

Computing universal distributed representations of sentences is a fundamental task in natural language processing. We propose ConsSent, a simple yet surprisingly powerful unsupervised method to learn such representations by enforcing…

Computation and Language · Computer Science · 2019-01-25 · Siddhartha Brahma

Embodied Self-supervised Learning by Coordinated Sampling and Training

Self-supervised learning can significantly improve the performance of downstream tasks, however, the dimensions of learned representations normally lack explicit physical meanings. In this work, we propose a novel self-supervised approach…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-01-19 · Yifan Sun, Xihong Wu

Towards the Next Frontier in Speech Representation Learning Using Disentanglement

The popular frameworks for self-supervised learning of speech representations have largely focused on frame-level masked prediction of speech regions. While this has shown promising downstream task performance for speech recognition and…

Computation and Language · Computer Science · 2025-07-22 · Varun Krishna, Sriram Ganapathy

Exploiting Invertible Decoders for Unsupervised Sentence Representation Learning

The encoder-decoder models for unsupervised sentence representation learning tend to discard the decoder after being trained on a large unlabelled corpus, since only the encoder is needed to map the input sentence into a vector…

Neural and Evolutionary Computing · Computer Science · 2019-06-03 · Shuai Tang, Virginia R. de Sa

Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction

Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the…

Sound · Computer Science · 2024-08-27 · Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang

A Brief Overview of Unsupervised Neural Speech Representation Learning

Unsupervised representation learning for speech processing has matured greatly in the last few years. Work in computer vision and natural language processing has paved the way, but speech data offers unique challenges. As a result, methods…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-03-04 · Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, Lars Maaløe, Christian Igel

Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data

Human speakers encode information into raw speech which is then decoded by the listeners. This complex relationship between encoding (production) and decoding (perception) is often modeled separately. Here, we test how encoding and decoding…

Computation and Language · Computer Science · 2022-09-20 · Gašper Beguš, Alan Zhou

Learning Speaker Representations with Mutual Information

Learning good representations is of crucial importance in deep learning. Mutual Information (MI) or similar measures of statistical dependence are promising tools for learning these representations in an unsupervised way. Even though the…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-04-09 · Mirco Ravanelli, Yoshua Bengio

An Unsupervised Autoregressive Model for Speech Representation Learning

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations. In contrast to other speech representation learning methods that aim to remove noise or speaker variabilities, ours is…

Computation and Language · Computer Science · 2019-06-20 · Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass

Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer

Supervised learning methods have shown effectiveness in estimating spatial acoustic parameters such as time difference of arrival, direct-to-reverberant ratio and reverberation time. However, they still suffer from the simulation-to-reality…

Sound · Computer Science · 2024-09-10 · Bing Yang, Xiaofei Li

Learning An Invariant Speech Representation

Recognition of speech, and in particular the ability to generalize and learn from small sets of labelled examples like humans do, depends on an appropriate representation of the acoustic input. We formulate the problem of finding robust…

Sound · Computer Science · 2014-06-17 · Georgios Evangelopoulos, Stephen Voinea, Chiyuan Zhang, Lorenzo Rosasco, Tomaso Poggio

Multi-Task Self-Supervised Pre-Training for Music Classification

Deep learning is very data hungry, and supervised learning especially requires massive labeled data to work well. Machine listening research often suffers from limited labeled data problem, as human annotations are costly to acquire, and…

Sound · Computer Science · 2021-02-08 · Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang