Related papers: General-Purpose Speech Representation Learning thr…

Multi-Granularity Framework for Unsupervised Representation Learning of Time Series

Representation learning plays a critical role in the analysis of time series data and has high practical value across a wide range of applications, including trend analysis, time series data retrieval and forecasting. In practice, data…

Machine Learning · Computer Science 2023-12-13 Chengyang Ye, Qiang Ma

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

A lot of the recent success in natural language processing (NLP) has been driven by distributed vector representations of words trained on large amounts of text in an unsupervised manner. These representations are typically used as general…

Computation and Language · Computer Science 2018-04-03 Sandeep Subramanian, Adam Trischler, Yoshua Bengio, Christopher J Pal

An Unsupervised Autoregressive Model for Speech Representation Learning

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations. In contrast to other speech representation learning methods that aim to remove noise or speaker variabilities, ours is…

Computation and Language · Computer Science 2019-06-20 Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass

Self-supervised learning method using multiple sampling strategies for general-purpose audio representation

We propose a self-supervised learning method using multiple sampling strategies to obtain general-purpose audio representation. Multiple sampling strategies are used in the proposed method to construct contrastive losses from different…

Sound · Computer Science 2025-05-27 Ibuki Kuroyanagi, Tatsuya Komatsu

Generic Speech Enhancement with Self-Supervised Representation Space Loss

Single-channel speech enhancement is utilized in various tasks to mitigate the effect of interfering signals. Conventionally, to ensure the speech enhancement performs optimally, the speech enhancement has needed to be tuned for each task.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-07-11 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Ryo Masumura

Learning Multiple Utterance-Level Attribute Representations with a Unified Speech Encoder

Speech foundation models trained with self-supervised learning produce generic speech representations that support a wide range of speech processing tasks. When further adapted with supervised learning, these models can achieve strong…

Computation and Language · Computer Science 2026-03-10 Maryem Bouziane, Salima Mdhaffar, Yannick Estève

SSHR: Leveraging Self-supervised Hierarchical Representations for Multilingual Automatic Speech Recognition

Multilingual automatic speech recognition (ASR) systems have garnered attention for their potential to extend language coverage globally. While self-supervised learning (SSL) models, like MMS, have demonstrated their effectiveness in…

Computation and Language · Computer Science 2024-04-30 Hongfei Xue, Qijie Shao, Kaixun Huang, Peikun Chen, Jie Liu, Lei Xie

On the Importance of Super-Gaussian Speech Priors for Machine-Learning Based Speech Enhancement

For enhancing noisy signals, machine-learning based single-channel speech enhancement schemes exploit prior knowledge about typical speech spectral structures. To ensure a good generalization and to meet requirements in terms of…

Sound · Computer Science 2018-01-17 Robert Rehr, Timo Gerkmann

Efficiency-oriented approaches for self-supervised speech representation learning

Self-supervised learning enables the training of large neural models without the need for large, labeled datasets. It has been generating breakthroughs in several fields, including computer vision, natural language processing, biology, and…

Computation and Language · Computer Science 2023-12-19 Luis Lugo, Valentin Vielzeuf

Learning Multiscale Transformer Models for Sequence Generation

Multiscale feature hierarchies have proven successful in computer vision. This further motivates researchers to design multiscale Transformers for natural language processing, mostly based on the self-attention mechanism.…

Computation and Language · Computer Science 2022-06-22 Bei Li, Tong Zheng, Yi Jing, Chengbo Jiao, Tong Xiao, Jingbo Zhu

Self-Supervised Speech Representation Learning: A Review

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and…

Computation and Language · Computer Science 2022-11-23 Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

Progressive Multi-Scale Self-Supervised Learning for Speech Recognition

Self-supervised learning (SSL) models have achieved considerable improvements in automatic speech recognition (ASR). In addition, ASR performance could be further improved if the model is dedicated to audio content information learning…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-08 Genshun Wan, Tan Liu, Hang Chen, Jia Pan, Cong Liu, Zhongfu Ye

A Hybrid Discriminative and Generative System for Universal Speech Enhancement

Universal speech enhancement aims at handling inputs with various speech distortions and recording conditions. In this work, we propose a novel hybrid architecture that synergizes the signal fidelity of discriminative modeling with the…

Sound · Computer Science 2026-01-28 Yinghao Liu, Chengwei Liu, Xiaotao Liang, Haoyin Yan, Shaofei Xue, Zheng Xue

Learning Representation for Multitask learning through Self Supervised Auxiliary learning

Multi-task learning is a popular machine learning approach that enables simultaneous learning of multiple related tasks, improving algorithmic efficiency and effectiveness. In the hard parameter sharing approach, an encoder shared through…

Machine Learning · Statistics 2024-09-26 Seokwon Shin, Hyungrok Do, Youngdoo Son

Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective

Existing studies on self-supervised speech representation learning have focused on developing new training methods and applying pre-trained models for different applications. However, the quality of these models is often measured by the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-18 Alexander H. Liu, Sung-Lin Yeh, James Glass

Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach

Recovering masked speech frames is widely applied in speech representation learning. However, most of these models use random masking in pre-training. In this work, we propose two kinds of masking approaches: (1) speech-level…

Sound · Computer Science 2022-10-26 Xulong Zhang, Jianzong Wang, Ning Cheng, Kexin Zhu, Jing Xiao

Sustainable self-supervised learning for speech representations

Sustainable artificial intelligence focuses on data, hardware, and algorithms to make machine learning models more environmentally responsible. In particular, machine learning models for speech representations are computationally expensive,…

Computation and Language · Computer Science 2024-06-13 Luis Lugo, Valentin Vielzeuf

Speech Recognition Front End Without Information Loss

Speech representation and modelling in high-dimensional spaces of acoustic waveforms, or a linear transformation thereof, is investigated with the aim of improving the robustness of automatic speech recognition to additive noise. The…

Computation and Language · Computer Science 2015-03-31 Matthew Ager, Zoran Cvetkovic, Peter Sollich

Generative Modeling for Multi-task Visual Learning

Generative modeling has recently shown great promise in computer vision, but it has mostly focused on synthesizing visually realistic images. In this paper, motivated by multi-task learning of shareable feature representations, we consider…

Computer Vision and Pattern Recognition · Computer Science 2021-06-28 Zhipeng Bao, Martial Hebert, Yu-Xiong Wang

A Generative Self-Supervised Framework using Functional Connectivity in fMRI Data

Deep neural networks trained on Functional Connectivity (FC) networks extracted from functional Magnetic Resonance Imaging (fMRI) data have gained popularity due to the increasing availability of data and advances in model architectures,…

Machine Learning · Computer Science 2023-12-05 Jungwon Choi, Seongho Keum, EungGu Yun, Byung-Hoon Kim, Juho Lee