Related papers: Linear-Complexity Self-Supervised Learning for Spe…

200 papers

Self-Supervised Learning (SSL) has proven to be effective in various domains, including speech processing. However, SSL is computationally and memory expensive. This is in part due to the quadratic complexity of multi-head self-attention…

Machine Learning · Computer Science 2024-09-05 Ryan Whetten, Titouan Parcollet, Adel Moumen, Marco Dinarelli, Yannick Estève

Automatic speech recognition (ASR) with an encoder equipped with self-attention, whether streaming or non-streaming, takes quadratic time in the length of the speech utterance. This slows down training and decoding, increases their cost, and…

Sound · Computer Science 2024-09-12 Titouan Parcollet, Rogier van Dalen, Shucong Zhang, Sourav Bhattacharya

Self-supervised learning (SSL) has advanced speech processing but suffers from quadratic complexity due to self-attention. To address this, SummaryMixing (SM) has been proposed as a linear-time alternative that summarizes entire utterances…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-11 Aditya Srinivas Menon, Kumud Tripathi, Raj Gohil, Pankaj Wasnik

Self-supervised learning (SSL) has proven vital in speech and audio-related applications. The paradigm trains a general model on unlabeled data that can later be used to solve specific downstream tasks. This type of model is costly to train…

Modern speech processing systems rely on self-attention. Unfortunately, token mixing with self-attention takes quadratic time in the length of the speech utterance, slowing down inference and training and increasing memory consumption.…

Computation and Language · Computer Science 2024-07-12 Titouan Parcollet, Rogier van Dalen, Shucong Zhang, Sourav Bhattacharya

Self-supervised learning (SSL) models have achieved considerable improvements in automatic speech recognition (ASR). In addition, ASR performance could be further improved if the model is dedicated to audio content information learning…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-08 Genshun Wan, Tan Liu, Hang Chen, Jia Pan, Cong Liu, Zhongfu Ye

Multilingual self-supervised learning (SSL) has often lagged behind state-of-the-art (SOTA) methods due to the expenses and complexity required to handle many languages. This further harms the reproducibility of SSL, which is already…

Computation and Language · Computer Science 2023-09-29 William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe

Self-supervised learning (SSL) has grown in interest within the speech processing community, since it produces representations that are useful for many downstream tasks. SSL uses global and contextual methods to produce robust…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-08 Subrina Sultana, Donald S. Williamson

Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks. Nonetheless, a significant challenge remains in reducing the reliance on vast amounts of speech data for pre-training. This paper…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-05 Po-chun Hsu, Ali Elkahky, Wei-Ning Hsu, Yossi Adi, Tu Anh Nguyen, Jade Copet, Emmanuel Dupoux, Hung-yi Lee, Abdelrahman Mohamed

Self-supervised learning (SSL)-based speech models are extensively used for full-stack speech processing. However, it has been observed that improving SSL-based speech representations using unlabeled speech for content-related tasks is…

Computation and Language · Computer Science 2024-06-14 Amit Meghanani, Thomas Hain

Large scale machine learning (ML) systems such as the Alexa automatic speech recognition (ASR) system continue to improve with increasing amounts of manually transcribed training data. Instead of scaling manual transcription to impractical…

Large Language Models (LLMs) have achieved remarkable performance across a wide range of Natural Language Processing (NLP) tasks. However, in long-context scenarios, they face two challenges: high computational cost and information…

Computation and Language · Computer Science 2026-02-10 Jiwei Tang, Zhicheng Zhang, Shunlong Wu, Jingheng Ye, Lichen Bai, Zitai Wang, Tingwei Lu, Lin Hai, Yiming Zhao, Hai-Tao Zheng, Hong-Gee Kim

Self-Supervised Learning (SSL) has gained traction for its ability to learn rich representations with low labeling costs, applicable across diverse downstream tasks. However, assessing the downstream-task performance remains challenging due…

Sound · Computer Science 2025-10-07 Takashi Maekaku, Keita Goto, Jinchuan Tian, Yusuke Shinohara, Shinji Watanabe

In Self-Supervised Learning (SSL), pre-training and evaluation are resource intensive. In the speech domain, current indicators of the quality of SSL models during pre-training, such as the loss, do not correlate well with downstream…

Sound · Computer Science 2025-06-03 Ryan Whetten, Lucas Maison, Titouan Parcollet, Marco Dinarelli, Yannick Estève

Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks. As speech signal contains multi-faceted information including speaker identity,…

Speech representation learning with self-supervised algorithms has resulted in notable performance boosts in many downstream tasks. Recent work combined self-supervised learning (SSL) and visually grounded speech (VGS) processing mechanisms…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-08 Khazar Khorrami, María Andrea Cruz Blandón, Tuomas Virtanen, Okko Räsänen

Recent years have witnessed significant advancements in self-supervised learning (SSL) methods for speech-processing tasks. Various speech-based SSL models have been developed and present promising performance on a range of downstream tasks…

Computation and Language · Computer Science 2023-10-02 Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen

The evolution of large language models (LLMs) towards applications with ultra-long contexts faces challenges posed by the high computational and memory costs of the Transformer architecture. While existing sparse and linear attention…

Self-supervised sentence representation learning is the task of constructing an embedding space for sentences without relying on human annotation efforts. One straightforward approach is to finetune a pretrained language model (PLM) with a…

Visual Self-Supervised Learning (SSL) currently underperforms Contrastive Language-Image Pretraining (CLIP) in multimodal settings such as Visual Question Answering (VQA). This multimodal gap is often attributed to the semantics introduced…

Computer Vision and Pattern Recognition · Computer Science 2025-04-02 David Fan, Shengbang Tong, Jiachen Zhu, Koustuv Sinha, Zhuang Liu, Xinlei Chen, Michael Rabbat, Nicolas Ballas, Yann LeCun, Amir Bar, Saining Xie