Related papers: Learning Problem-agnostic Speech Representations f…

200 papers

Despite the growing interest in unsupervised learning, extracting meaningful knowledge from unlabelled audio remains an open challenge. To take a step in this direction, we recently proposed a problem-agnostic speech encoder (PASE), that…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-21 Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio

This thesis focuses on representation learning for sequence data over time or space, aiming to improve downstream sequence prediction tasks by using the learned representations. Supervised learning has been the most dominant approach for…

Audio and Speech Processing · Electrical Eng. & Systems 2023-08-02 Qingming Tang

Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer. In recent years, unsupervised and self-supervised techniques for learning speech representation were developed to foster automatic speech…

Computation and Language · Computer Science 2021-12-15 Pierre Beckmann, Mikolaj Kegler, Milos Cernak

Self-supervised learning enables the training of large neural models without the need for large, labeled datasets. It has been generating breakthroughs in several fields, including computer vision, natural language processing, biology, and…

Computation and Language · Computer Science 2023-12-19 Luis Lugo, Valentin Vielzeuf

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and…

To extract robust deep representations from long sequential modeling of speech data, we propose a self-supervised learning approach, namely Contrastive Separative Coding (CSC). Our key finding is to learn such representations by separating…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-02 Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu

Unsupervised spoken term discovery consists of two tasks: finding the acoustic segment boundaries and labeling acoustically similar segments with the same labels. We perform segmentation based on the assumption that the frame feature…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-28 Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Najim Dehak

Self-supervised representation learning has recently attracted a lot of research interest for both the audio and visual modalities. However, most works typically focus on a particular modality or feature alone and there has been very…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-21 Abhinav Shukla, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic

Computing universal distributed representations of sentences is a fundamental task in natural language processing. We propose ConsSent, a simple yet surprisingly powerful unsupervised method to learn such representations by enforcing…

Computation and Language · Computer Science 2019-01-25 Siddhartha Brahma

Self-supervised learning can significantly improve the performance of downstream tasks; however, the dimensions of learned representations normally lack explicit physical meanings. In this work, we propose a novel self-supervised approach…

Audio and Speech Processing · Electrical Eng. & Systems 2022-01-19 Yifan Sun, Xihong Wu

The popular frameworks for self-supervised learning of speech representations have largely focused on frame-level masked prediction of speech regions. While this has shown promising downstream task performance for speech recognition and…

Computation and Language · Computer Science 2025-07-22 Varun Krishna, Sriram Ganapathy

The encoder-decoder models for unsupervised sentence representation learning tend to discard the decoder after being trained on a large unlabelled corpus, since only the encoder is needed to map the input sentence into a vector…

Neural and Evolutionary Computing · Computer Science 2019-06-03 Shuai Tang, Virginia R. de Sa

Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the…

Sound · Computer Science 2024-08-27 Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang

Unsupervised representation learning for speech processing has matured greatly in the last few years. Work in computer vision and natural language processing has paved the way, but speech data offers unique challenges. As a result, methods…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-04 Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, Lars Maaløe, Christian Igel

Human speakers encode information into raw speech which is then decoded by the listeners. This complex relationship between encoding (production) and decoding (perception) is often modeled separately. Here, we test how encoding and decoding…

Computation and Language · Computer Science 2022-09-20 Gašper Beguš, Alan Zhou

Learning good representations is of crucial importance in deep learning. Mutual Information (MI) or similar measures of statistical dependence are promising tools for learning these representations in an unsupervised way. Even though the…

Audio and Speech Processing · Electrical Eng. & Systems 2019-04-09 Mirco Ravanelli, Yoshua Bengio

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations. In contrast to other speech representation learning methods that aim to remove noise or speaker variabilities, ours is…

Computation and Language · Computer Science 2019-06-20 Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass

Supervised learning methods have shown effectiveness in estimating spatial acoustic parameters such as time difference of arrival, direct-to-reverberant ratio and reverberation time. However, they still suffer from the simulation-to-reality…

Sound · Computer Science 2024-09-10 Bing Yang, Xiaofei Li

Recognition of speech, and in particular the ability to generalize and learn from small sets of labelled examples like humans do, depends on an appropriate representation of the acoustic input. We formulate the problem of finding robust…

Deep learning is very data hungry, and supervised learning especially requires massive labeled data to work well. Machine listening research often suffers from the limited labeled data problem, as human annotations are costly to acquire, and…

Sound · Computer Science 2021-02-08 Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang