Related papers: Comparing supervised and self-supervised embedding…

Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction

This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion given vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge ExVo-MultiTask track. The method of choice utilized a…

Sound · Computer Science 2022-06-28 Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj

Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations

We present an emotion recognition system for nonverbal vocalizations (NVs) submitted to the ExVo Few-Shot track of the ICML Expressive Vocalizations Competition 2022. The proposed method uses self-supervised learning (SSL) models to extract…

Sound · Computer Science 2022-06-23 Detai Xin, Shinnosuke Takamichi, Hiroshi Saruwatari

Self-Supervised Embeddings for Detecting Individual Symptoms of Depression

Depression, a prevalent mental health disorder impacting millions globally, demands reliable assessment systems. Unlike previous studies that focus solely on either detecting depression or predicting its severity, our work identifies…

Sound · Computer Science 2024-06-26 Sri Harsha Dumpala, Katerina Dikaios, Abraham Nunes, Frank Rudzicz, Rudolf Uher, Sageev Oore

Multitask vocal burst modeling with ResNets and pre-trained paralinguistic Conformers

This technical report presents the modeling approaches used in our submission to the ICML Expressive Vocalizations Workshop & Competition multitask track (ExVo-MultiTask). We first applied image classification models of various sizes on…

Sound · Computer Science 2022-06-28 Josh Belanich, Krishna Somandepalli, Brian Eoff, Brendan Jou

The ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts

The ICML Expressive Vocalization (ExVo) Competition is focused on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to emotional expression and communication. ExVo 2022,…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-13 Alice Baird, Panagiotis Tzirakis, Gauthier Gidel, Marco Jiralerspong, Eilif B. Muller, Kory Mathewson, Björn Schuller, Erik Cambria, Dacher Keltner, Alan Cowen

Explaining, Analyzing, and Probing Representations of Self-Supervised Learning Models for Sensor-based Human Activity Recognition

In recent years, self-supervised learning (SSL) frameworks have been extensively applied to sensor-based Human Activity Recognition (HAR) in order to learn deep representations without data annotations. While SSL frameworks reach…

Machine Learning · Computer Science 2023-08-01 Bulat Khaertdinov, Stylianos Asteriadis

Burst2Vec: An Adversarial Multi-Task Approach for Predicting Emotion, Age, and Origin from Vocal Bursts

We present Burst2Vec, our multi-task learning approach to predict emotion, age, and origin (i.e., native country/language) from vocal bursts. Burst2Vec utilises pre-trained speech representations to capture acoustic information from raw…

Sound · Computer Science 2022-10-19 Atijit Anuchitanukul, Lucia Specia

Sound and Visual Representation Learning with Multiple Pretraining Tasks

Different self-supervised tasks (SSL) reveal different features from the data. The learned feature representations can exhibit different performance for each downstream task. In this light, this work aims to combine Multiple SSL tasks…

Computer Vision and Pattern Recognition · Computer Science 2022-01-05 Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning

In this study, we investigate self-supervised representation learning for speaker verification (SV). First, we examine a simple contrastive learning approach (SimCLR) with a momentum contrastive (MoCo) learning framework, where the MoCo…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-16 Wei Xia, Chunlei Zhang, Chao Weng, Meng Yu, Dong Yu

Evaluating Self-Supervised Speech Models via Text-Based LLMS

Self-Supervised Learning (SSL) has gained traction for its ability to learn rich representations with low labeling costs, applicable across diverse downstream tasks. However, assessing the downstream-task performance remains challenging due…

Sound · Computer Science 2025-10-07 Takashi Maekaku, Keita Goto, Jinchuan Tian, Yusuke Shinohara, Shinji Watanabe

Multi-task Voice Activated Framework using Self-supervised Learning

Self-supervised learning methods such as wav2vec 2.0 have shown promising results in learning speech representations from unlabelled and untranscribed speech data that are useful for speech recognition. Since these representations are…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-22 Shehzeen Hussain, Van Nguyen, Shuhua Zhang, Erik Visser

Contrastive and Non-Contrastive Self-Supervised Learning Recover Global and Local Spectral Embedding Methods

Self-Supervised Learning (SSL) surmises that inputs and pairwise positive relationships are enough to learn meaningful representations. Although SSL has recently reached a milestone: outperforming supervised methods in many modalities…

Machine Learning · Computer Science 2022-06-13 Randall Balestriero, Yann LeCun

End-to-End and Self-Supervised Learning for ComParE 2022 Stuttering Sub-Challenge

In this paper, we present end-to-end and speech embedding based systems trained in a self-supervised fashion to participate in the ACM Multimedia 2022 ComParE Challenge, specifically the stuttering sub-challenge. In particular, we exploit…

Sound · Computer Science 2022-07-25 Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies

Automatic singing voice understanding tasks, such as singer identification, singing voice transcription, and singing technique classification, benefit from data-driven approaches that utilize deep learning techniques. These approaches work…

Sound · Computer Science 2023-09-06 Yuya Yamamoto

Boosting Self-Supervised Embeddings for Speech Enhancement

Self-supervised learning (SSL) representation for speech has achieved state-of-the-art (SOTA) performance on several downstream tasks. However, there remains room for improvement in speech enhancement (SE) tasks. In this study, we used a…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-06 Kuo-Hsuan Hung, Szu-wei Fu, Huan-Hsin Tseng, Hsin-Tien Chiang, Yu Tsao, Chii-Wann Lin

Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations

Recent advancements in Deep and Self-Supervised Learning (SSL) have led to substantial improvements in Speech Emotion Recognition (SER) performance, reaching unprecedented levels. However, obtaining sufficient amounts of accurately labeled…

Computation and Language · Computer Science 2025-02-25 Bulat Khaertdinov, Pedro Jeuris, Annanda Sousa, Enrique Hortal

Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System

Speech representation learning with self-supervised algorithms has resulted in notable performance boosts in many downstream tasks. Recent work combined self-supervised learning (SSL) and visually grounded speech (VGS) processing mechanisms…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-08 Khazar Khorrami, María Andrea Cruz Blandón, Tuomas Virtanen, Okko Räsänen

Self-Supervised Video Similarity Learning

We introduce S$^2$VS, a video similarity learning approach with self-supervision. Self-Supervised Learning (SSL) is typically used to train deep models on a proxy task so as to have strong transferability on target tasks after fine-tuning.…

Computer Vision and Pattern Recognition · Computer Science 2023-06-19 Giorgos Kordopatis-Zilos, Giorgos Tolias, Christos Tzelepis, Ioannis Kompatsiaris, Ioannis Patras, Symeon Papadopoulos

Proceedings of the ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts

This is the Proceedings of the ICML Expressive Vocalization (ExVo) Competition. The ExVo competition focuses on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to…

Sound · Computer Science 2022-08-17 Alice Baird, Panagiotis Tzirakis, Gauthier Gidel, Marco Jiralerspong, Eilif B. Muller, Kory Mathewson, Björn Schuller, Erik Cambria, Dacher Keltner, Alan Cowen

A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement

Self-supervised learned models have been found to be very effective for certain speech tasks such as automatic speech recognition, speaker identification, keyword spotting and others. While the features are undeniably useful in speech…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-05 Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar