Related papers: Optimizing Speech Multi-View Feature Fusion throug…

200 papers

Recent advances in synthetic speech have made audio deepfakes increasingly realistic, posing significant security risks. Existing detection methods that rely on a single modality, either raw waveform embeddings or spectral-based features,…

Self-supervised learning (SSL) has achieved great success in various areas including speech processing. Recently, it has been shown that speech-based SSL models are able to extract superior universal representations on a range of downstream…

Sound · Computer Science 2022-12-21 Changli Tang, Yujin Wang, Xie Chen, Wei-Qiang Zhang

Self-Supervised Learning (SSL) models have been successfully applied in various deep learning-based speech tasks, particularly those with a limited amount of data. However, the quality of SSL representations depends highly on the…

Computation and Language · Computer Science 2022-04-20 Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel Lopez-Francisco, Jonathan D. Amith, Shinji Watanabe

Fake speech detection systems have become a necessity to combat speech deepfakes. Current systems exhibit poor generalizability on out-of-domain speech samples due to a lack of diverse training data. In this paper, we attempt to…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-27 Rishith Sadashiv T N, Abhishek Bedge, Saisha Suresh Bore, Jagabandhu Mishra, Mrinmoy Bhattacharjee, S R Mahadeva Prasanna

Integrating Federated Learning (FL) with self-supervised learning (SSL) enables privacy-preserving fine-tuning for speech tasks. However, federated environments exhibit significant heterogeneity: clients differ in computational capacity,…

Audio and Speech Processing · Electrical Eng. & Systems 2026-03-26 Xin Guo, Chunrui Zhao, Hong Jia, Ting Dang, Gongping Huang, Xianrui Zheng, Yan Gao

Self-Supervised Learning (SSL) models have demonstrated exceptional performance in various speech tasks, particularly in low-resource and multilingual domains. Recent works show that fusing diverse SSL models could achieve superior…

Sound · Computer Science 2024-06-07 Tejes Srivastava, Jiatong Shi, William Chen, Shinji Watanabe

This study investigates fine-tuning self-supervised learning (SSL) models using multi-task learning (MTL) to enhance speech emotion recognition (SER). The framework simultaneously handles four related tasks: emotion recognition, gender…

Sound · Computer Science 2025-08-26 Honghong Wang, Jing Deng, Fanqin Meng, Rong Zheng

Self-supervised learning (SSL) models have shown exceptional capabilities across various speech-processing tasks. Continuous SSL representations are effective but suffer from high computational and storage demands. On the other hand,…

Sound · Computer Science 2024-11-28 Shih-heng Wang, Jiatong Shi, Chien-yu Huang, Shinji Watanabe, Hung-yi Lee

Recent speech enhancement (SE) models increasingly leverage self-supervised learning (SSL) representations for their rich semantic information. Typically, intermediate features are aggregated into a single representation via a lightweight…

Sound · Computer Science 2026-02-02 Seungu Han, Sungho Lee, Kyogu Lee

The integration of Federated Learning (FL) and Self-supervised Learning (SSL) offers a unique and synergetic combination to exploit the audio data for general-purpose audio understanding, without compromising user data privacy. However,…

Sound · Computer Science 2024-02-07 Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen

Automatic methods to predict Mean Opinion Score (MOS) of listeners have been researched to assure the quality of Text-to-Speech systems. Many previous studies focus on architectural advances (e.g. MBNet, LDNet, etc.) to capture relations…

Sound · Computer Science 2022-06-29 Aki Kunikoshi, Jaebok Kim, Wonsuk Jun, Kåre Sjölander

Multimodal Large Language Models (MLLMs) have achieved notable success in enhancing translation performance by integrating multimodal information. However, existing research primarily focuses on image-guided methods, whose applicability is…

Computation and Language · Computer Science 2026-03-04 Yexing Du, Youcheng Pan, Zekun Wang, Zheng Chu, Yichong Huang, Kaiyuan Liu, Bo Yang, Yang Xiang, Ming Liu, Bing Qin

Multimodal emotion recognition from speech is an important area in affective computing. Fusing multiple data modalities and learning representations with limited amounts of labeled data is a challenging task. In this paper, we explore the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-08 Shamane Siriwardhana, Andrew Reis, Rivindu Weerasekera, Suranga Nanayakkara

Recommender systems are widely deployed in various web environments, and self-supervised learning (SSL) has recently attracted significant attention in this field. Contrastive learning (CL) stands out as a major SSL paradigm due to its…

Information Retrieval · Computer Science 2025-01-17 Yu Zhang, Lei Sang, Yi Zhang, Yiwen Zhang, Yun Yang

Using self-supervised learning (SSL) models has significantly improved performance for downstream speech tasks, surpassing the capabilities of traditional hand-crafted features. This study investigates the amalgamation of SSL models, with…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-27 Szu-Jui Chen, John H. L. Hansen

In this study, we aim to explore efficient tuning methods for speech self-supervised learning. Recent studies show that self-supervised learning (SSL) can learn powerful representations for different speech tasks. However, fine-tuning…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-31 Zih-Ching Chen, Chin-Lun Fu, Chih-Ying Liu, Shang-Wen Li, Hung-yi Lee

The ubiquity of microphone-enabled devices has led to large amounts of unlabelled audio data being produced at the edge. The integration of self-supervised learning (SSL) and federated learning (FL) into one coherent system can potentially…

Group conversations over videoconferencing are a complex social behavior. However, the subjective moments of negative experience, where the conversation loses fluidity or enjoyment, remain understudied. These moments are infrequent in…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-20 Andrew Chang, Chenkai Hu, Ji Qi, Zhuojian Wei, Kexin Zhang, Viswadruth Akkaraju, David Poeppel, Dustin Freeman

Designing a speech quality assessment (SQA) system for estimating mean-opinion-score (MOS) of multi-rate speech with varying sampling frequency (16-48 kHz) is a challenging task. The challenge arises due to the limited availability of a…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-17 Fengyuan Cao, Xinyu Liang, Fredrik Cumlin, Victor Ungureanu, Chandan K. A. Reddy, Christian Schuldt, Saikat Chatterjee

Self-supervised speech (SSL) models have recently become widely adopted for many downstream speech processing tasks. The general usage pattern is to employ SSL models as feature extractors, and then train a downstream prediction head to…

Sound · Computer Science 2024-06-19 Yi-Jen Shih, David Harwath