Related papers: An efficient supervised dictionary learning method…
Machine hearing is an emerging area. Motivated by the need of a principled framework across domain applications for machine listening, we propose a generic and data-driven representation learning approach. For this sake, a novel and…
Humans do not acquire perceptual abilities in the way we train machines. While machine learning algorithms typically operate on large collections of randomly-chosen, explicitly-labeled examples, human acquisition relies more heavily on…
Human auditory perception is compositional in nature -- we identify auditory streams from auditory scenes with multiple sound events. However, such auditory scenes are typically represented using clip-level representations that do not…
Acoustic Event Classification (AEC) has become a significant task for machines to perceive the surrounding auditory scene. However, extracting effective representations that capture the underlying characteristics of the acoustic events is…
We propose a self-supervised learning method using multiple sampling strategies to obtain general-purpose audio representation. Multiple sampling strategies are used in the proposed method to construct contrastive losses from different…
Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and…
The goal of this work is to train discriminative cross-modal embeddings without access to manually annotated data. Recent advances in self-supervised learning have shown that effective representations can be learnt from natural cross-modal…
The development of models for learning music similarity and feature extraction from audio media files is an increasingly important task for the entertainment industry. This work proposes a novel music classification model based on metric…
Identifying acoustic events from a continuously streaming audio source is of interest for many applications including environmental monitoring for basic research. In this scenario neither different event classes are known nor what…
Chord recognition systems depend on robust feature extraction pipelines. While these pipelines are traditionally hand-crafted, recent advances in end-to-end machine learning have begun to inspire researchers to explore data-driven methods…
The deployment of machine listening algorithms in real-life applications is often impeded by a domain shift caused for instance by different microphone characteristics. In this paper, we propose a novel domain adaptation strategy based on…
To extract robust deep representations from long sequential modeling of speech data, we propose a self-supervised learning approach, namely Contrastive Separative Coding (CSC). Our key finding is to learn such representations by separating…
Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings. The association of these constituent sound events with their mixture and…
Existing self-supervised learning methods learn representation by means of pretext tasks which are either (1) discriminating that explicitly specify which features should be separated or (2) aligning that precisely indicate which features…
As an important component of multimedia analysis tasks, audio classification aims to discriminate between different audio signal types and has received intensive attention due to its wide applications. Generally speaking, the raw signal can…
Existing audio analysis methods generally first transform the audio stream to spectrogram, and then feed it into CNN for further analysis. A standard CNN recognizes specific visual patterns over feature map, then pools for high-level…
Supervised learning methods have shown effectiveness in estimating spatial acoustic parameters such as time difference of arrival, direct-to-reverberant ratio and reverberation time. However, they still suffer from the simulation-to-reality…
There exist many high-dimensional data in real-world applications such as biology, computer vision, and social networks. Feature selection approaches are devised to confront with high-dimensional data challenges with the aim of efficient…
This paper introduces an unsupervised framework for detecting audio patterns in musical samples (loops) through anomaly detection techniques, addressing challenges in music information retrieval (MIR). Existing methods are often constrained…
Supervised dictionary learning (SDL) is a classical machine learning method that simultaneously seeks feature extraction and classification tasks, which are not necessarily a priori aligned objectives. The goal of SDL is to learn a…