Related papers: An efficient supervised dictionary learning method…

Data-driven audio recognition: a supervised dictionary approach

Machine hearing is an emerging area. Motivated by the need for a principled framework across domain applications for machine listening, we propose a generic and data-driven representation learning approach. To this end, a novel and…

Sound · Computer Science 2021-01-01 Imad Rida

Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision

Humans do not acquire perceptual abilities in the way we train machines. While machine learning algorithms typically operate on large collections of randomly-chosen, explicitly-labeled examples, human acquisition relies more heavily on…

Sound · Computer Science 2019-11-15 Aren Jansen, Daniel P. W. Ellis, Shawn Hershey, R. Channing Moore, Manoj Plakal, Ashok C. Popat, Rif A. Saurous

Compositional Audio Representation Learning

Human auditory perception is compositional in nature -- we identify auditory streams from auditory scenes with multiple sound events. However, such auditory scenes are typically represented using clip-level representations that do not…

Sound · Computer Science 2025-03-04 Sripathi Sridhar, Mark Cartwright

Learning audio sequence representations for acoustic event classification

Acoustic Event Classification (AEC) has become a significant task for machines to perceive the surrounding auditory scene. However, extracting effective representations that capture the underlying characteristics of the acoustic events is…

Sound · Computer Science 2021-06-22 Zixing Zhang, Ding Liu, Jing Han, Kun Qian, Björn Schuller

Self-supervised learning method using multiple sampling strategies for general-purpose audio representation

We propose a self-supervised learning method using multiple sampling strategies to obtain general-purpose audio representation. Multiple sampling strategies are used in the proposed method to construct contrastive losses from different…

Sound · Computer Science 2025-05-27 Ibuki Kuroyanagi, Tatsuya Komatsu

Self-Supervised Speech Representation Learning: A Review

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and…

Computation and Language · Computer Science 2022-11-23 Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision

The goal of this work is to train discriminative cross-modal embeddings without access to manually annotated data. Recent advances in self-supervised learning have shown that effective representations can be learnt from natural cross-modal…

Sound · Computer Science 2020-11-05 Soo-Whan Chung, Hong Goo Kang, Joon Son Chung

A Music Classification Model based on Metric Learning and Feature Extraction from MP3 Audio Files

The development of models for learning music similarity and feature extraction from audio media files is an increasingly important task for the entertainment industry. This work proposes a novel music classification model based on metric…

Sound · Computer Science 2019-09-19 Angelo C. Mendes da Silva, Mauricio A. Nunes, Raul Fonseca Neto

Unsupervised Feature Learning for Audio Analysis

Identifying acoustic events from a continuously streaming audio source is of interest for many applications including environmental monitoring for basic research. In this scenario neither different event classes are known nor what…

Computer Vision and Pattern Recognition · Computer Science 2017-12-12 Matthias Meyer, Jan Beutel, Lothar Thiele

A Fully Convolutional Deep Auditory Model for Musical Chord Recognition

Chord recognition systems depend on robust feature extraction pipelines. While these pipelines are traditionally hand-crafted, recent advances in end-to-end machine learning have begun to inspire researchers to explore data-driven methods…

Machine Learning · Computer Science 2016-12-16 Filip Korzeniowski, Gerhard Widmer

Towards Audio Domain Adaptation for Acoustic Scene Classification using Disentanglement Learning

The deployment of machine listening algorithms in real-life applications is often impeded by a domain shift caused for instance by different microphone characteristics. In this paper, we propose a novel domain adaptation strategy based on…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-27 Jakob Abeßer, Meinard Müller

Contrastive Separative Coding for Self-supervised Representation Learning

To extract robust deep representations from long sequential modeling of speech data, we propose a self-supervised learning approach, namely Contrastive Separative Coding (CSC). Our key finding is to learn such representations by separating…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-02 Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu

Self-Supervised Learning from Automatically Separated Sound Scenes

Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings. The association of these constituent sound events with their mixture and…

Sound · Computer Science 2021-09-16 Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra

Concurrent Discrimination and Alignment for Self-Supervised Feature Learning

Existing self-supervised learning methods learn representation by means of pretext tasks which are either (1) discriminating that explicitly specify which features should be separated or (2) aligning that precisely indicate which features…

Computer Vision and Pattern Recognition · Computer Science 2021-08-20 Anjan Dutta, Massimiliano Mancini, Zeynep Akata

Multi-Representation Knowledge Distillation For Audio Classification

As an important component of multimedia analysis tasks, audio classification aims to discriminate between different audio signal types and has received intensive attention due to its wide applications. Generally speaking, the raw signal can…

Multimedia · Computer Science 2020-02-25 Liang Gao, Kele Xu, Huaimin Wang, Yuxing Peng

Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation

Existing audio analysis methods generally first transform the audio stream to spectrogram, and then feed it into CNN for further analysis. A standard CNN recognizes specific visual patterns over feature map, then pools for high-level…

Sound · Computer Science 2023-03-16 Yulin Pan, Xiangteng He, Biao Gong, Yuxin Peng, Yiliang Lv

Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer

Supervised learning methods have shown effectiveness in estimating spatial acoustic parameters such as time difference of arrival, direct-to-reverberant ratio and reverberation time. However, they still suffer from the simulation-to-reality…

Sound · Computer Science 2024-09-10 Bing Yang, Xiaofei Li

Low-rank Dictionary Learning for Unsupervised Feature Selection

There exist many high-dimensional data in real-world applications such as biology, computer vision, and social networks. Feature selection approaches are devised to confront high-dimensional data challenges with the aim of efficient…

Machine Learning · Computer Science 2021-06-22 Mohsen Ghassemi Parsa, Hadi Zare, Mehdi Ghatee

Learning Normal Patterns in Musical Loops

This paper introduces an unsupervised framework for detecting audio patterns in musical samples (loops) through anomaly detection techniques, addressing challenges in music information retrieval (MIR). Existing methods are often constrained…

Sound · Computer Science 2025-06-02 Shayan Dadman, Bernt Arild Bremdal, Børre Bang, Rune Dalmo

Supervised Dictionary Learning with Auxiliary Covariates

Supervised dictionary learning (SDL) is a classical machine learning method that simultaneously seeks feature extraction and classification tasks, which are not necessarily a priori aligned objectives. The goal of SDL is to learn a…

Machine Learning · Statistics 2022-06-15 Joowon Lee, Hanbaek Lyu, Weixin Yao