Related papers: Comparing supervised and self-supervised embedding…

200 papers

This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion given vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge ExVo-MultiTask track. The method of choice utilized a…

Sound · Computer Science 2022-06-28 Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj

We present an emotion recognition system for nonverbal vocalizations (NVs) submitted to the ExVo Few-Shot track of the ICML Expressive Vocalizations Competition 2022. The proposed method uses self-supervised learning (SSL) models to extract…

Sound · Computer Science 2022-06-23 Detai Xin, Shinnosuke Takamichi, Hiroshi Saruwatari

Depression, a prevalent mental health disorder impacting millions globally, demands reliable assessment systems. Unlike previous studies that focus solely on either detecting depression or predicting its severity, our work identifies…

This technical report presents the modeling approaches used in our submission to the ICML Expressive Vocalizations Workshop & Competition multitask track (ExVo-MultiTask). We first applied image classification models of various sizes on…

Sound · Computer Science 2022-06-28 Josh Belanich, Krishna Somandepalli, Brian Eoff, Brendan Jou

The ICML Expressive Vocalization (ExVo) Competition is focused on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to emotional expression and communication. ExVo 2022,…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-13 Alice Baird, Panagiotis Tzirakis, Gauthier Gidel, Marco Jiralerspong, Eilif B. Muller, Kory Mathewson, Björn Schuller, Erik Cambria, Dacher Keltner, Alan Cowen

In recent years, self-supervised learning (SSL) frameworks have been extensively applied to sensor-based Human Activity Recognition (HAR) in order to learn deep representations without data annotations. While SSL frameworks reach…

Machine Learning · Computer Science 2023-08-01 Bulat Khaertdinov, Stylianos Asteriadis

We present Burst2Vec, our multi-task learning approach to predict emotion, age, and origin (i.e., native country/language) from vocal bursts. Burst2Vec utilises pre-trained speech representations to capture acoustic information from raw…

Sound · Computer Science 2022-10-19 Atijit Anuchitanukul, Lucia Specia

Different self-supervised tasks (SSL) reveal different features from the data. The learned feature representations can exhibit different performance for each downstream task. In this light, this work aims to combine Multiple SSL tasks…

Computer Vision and Pattern Recognition · Computer Science 2022-01-05 Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

In this study, we investigate self-supervised representation learning for speaker verification (SV). First, we examine a simple contrastive learning approach (SimCLR) with a momentum contrastive (MoCo) learning framework, where the MoCo…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-16 Wei Xia, Chunlei Zhang, Chao Weng, Meng Yu, Dong Yu

Self-Supervised Learning (SSL) has gained traction for its ability to learn rich representations with low labeling costs, applicable across diverse downstream tasks. However, assessing the downstream-task performance remains challenging due…

Sound · Computer Science 2025-10-07 Takashi Maekaku, Keita Goto, Jinchuan Tian, Yusuke Shinohara, Shinji Watanabe

Self-supervised learning methods such as wav2vec 2.0 have shown promising results in learning speech representations from unlabelled and untranscribed speech data that are useful for speech recognition. Since these representations are…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-22 Shehzeen Hussain, Van Nguyen, Shuhua Zhang, Erik Visser

Self-Supervised Learning (SSL) surmises that inputs and pairwise positive relationships are enough to learn meaningful representations. Although SSL has recently reached a milestone: outperforming supervised methods in many modalities…

Machine Learning · Computer Science 2022-06-13 Randall Balestriero, Yann LeCun

In this paper, we present end-to-end and speech embedding based systems trained in a self-supervised fashion to participate in the ACM Multimedia 2022 ComParE Challenge, specifically the stuttering sub-challenge. In particular, we exploit…

Sound · Computer Science 2022-07-25 Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

Automatic singing voice understanding tasks, such as singer identification, singing voice transcription, and singing technique classification, benefit from data-driven approaches that utilize deep learning techniques. These approaches work…

Sound · Computer Science 2023-09-06 Yuya Yamamoto

Self-supervised learning (SSL) representation for speech has achieved state-of-the-art (SOTA) performance on several downstream tasks. However, there remains room for improvement in speech enhancement (SE) tasks. In this study, we used a…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-06 Kuo-Hsuan Hung, Szu-wei Fu, Huan-Hsin Tseng, Hsin-Tien Chiang, Yu Tsao, Chii-Wann Lin

Recent advancements in Deep and Self-Supervised Learning (SSL) have led to substantial improvements in Speech Emotion Recognition (SER) performance, reaching unprecedented levels. However, obtaining sufficient amounts of accurately labeled…

Computation and Language · Computer Science 2025-02-25 Bulat Khaertdinov, Pedro Jeuris, Annanda Sousa, Enrique Hortal

Speech representation learning with self-supervised algorithms has resulted in notable performance boosts in many downstream tasks. Recent work combined self-supervised learning (SSL) and visually grounded speech (VGS) processing mechanisms…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-08 Khazar Khorrami, María Andrea Cruz Blandón, Tuomas Virtanen, Okko Räsänen

We introduce S²VS, a video similarity learning approach with self-supervision. Self-Supervised Learning (SSL) is typically used to train deep models on a proxy task so as to have strong transferability on target tasks after fine-tuning.…

Computer Vision and Pattern Recognition · Computer Science 2023-06-19 Giorgos Kordopatis-Zilos, Giorgos Tolias, Christos Tzelepis, Ioannis Kompatsiaris, Ioannis Patras, Symeon Papadopoulos

This is the Proceedings of the ICML Expressive Vocalization (ExVo) Competition. The ExVo competition focuses on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to…

Self-supervised learned models have been found to be very effective for certain speech tasks such as automatic speech recognition, speaker identification, keyword spotting and others. While the features are undeniably useful in speech…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-05 Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar