Related papers: Data-driven audio recognition: a supervised dictio…

An efficient supervised dictionary learning method for audio signal recognition

Machine hearing or listening represents an emerging area. Conventional approaches rely on the design of handcrafted features specialized to a specific audio task that can hardly be generalized to other audio fields. For example,…

Computer Vision and Pattern Recognition · Computer Science · 2018-12-13 · Imad Rida, Romain Hérault, Gilles Gasso

Self-Supervised Speech Representation Learning: A Review

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and…

Computation and Language · Computer Science · 2022-11-23 · Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

A Model You Can Hear: Audio Identification with Playable Prototypes

Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by…

Sound · Computer Science · 2022-08-08 · Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, Loic Landrieu

Compositional Audio Representation Learning

Human auditory perception is compositional in nature: we identify auditory streams from auditory scenes with multiple sound events. However, such auditory scenes are typically represented using clip-level representations that do not…

Sound · Computer Science · 2025-03-04 · Sripathi Sridhar, Mark Cartwright

A Fully Convolutional Deep Auditory Model for Musical Chord Recognition

Chord recognition systems depend on robust feature extraction pipelines. While these pipelines are traditionally hand-crafted, recent advances in end-to-end machine learning have begun to inspire researchers to explore data-driven methods…

Machine Learning · Computer Science · 2016-12-16 · Filip Korzeniowski, Gerhard Widmer

Synergy between human and machine approaches to sound/scene recognition and processing: An overview of ICASSP special session

Machine Listening, as usually formalized, attempts to perform a task that is, from our perspective, fundamentally human-performable, and performed by humans. Current automated models of Machine Listening vary from purely data-driven…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-02-27 · Laurie M. Heller, Benjamin Elizalde, Bhiksha Raj, Soham Deshmukh

Audio Self-supervised Learning: A Survey

Inspired by the human cognitive ability to generalise knowledge and skills, Self-Supervised Learning (SSL) aims at discovering general representations from large-scale data without requiring human annotations, which is an expensive and…

Sound · Computer Science · 2022-03-03 · Shuo Liu, Adria Mallol-Ragolta, Emilia Parada-Cabeleiro, Kun Qian, Xin Jing, Alexander Kathan, Bin Hu, Bjoern W. Schuller

Supervised and Unsupervised Learning of Audio Representations for Music Understanding

In this work, we provide a broad comparative analysis of strategies for pre-training audio understanding models for several tasks in the music domain, including labelling of genre, era, origin, mood, instrumentation, key, pitch, vocal…

Sound · Computer Science · 2022-10-11 · Matthew C. McCallum, Filip Korzeniowski, Sergio Oramas, Fabien Gouyon, Andreas F. Ehmann

Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision

Humans do not acquire perceptual abilities in the way we train machines. While machine learning algorithms typically operate on large collections of randomly-chosen, explicitly-labeled examples, human acquisition relies more heavily on…

Sound · Computer Science · 2019-11-15 · Aren Jansen, Daniel P. W. Ellis, Shawn Hershey, R. Channing Moore, Manoj Plakal, Ashok C. Popat, Rif A. Saurous

Unsupervised Learning of Audio Perception for Robotics Applications: Learning to Project Data to T-SNE/UMAP space

Audio perception is key to solving a variety of problems ranging from acoustic scene analysis and music metadata extraction to recommendation, synthesis, and analysis. It can potentially also augment computers in doing tasks that humans do…

Sound · Computer Science · 2020-02-12 · Prateek Verma, Kenneth Salisbury

Unsupervised Composable Representations for Audio

Current generative models are able to generate high-quality artefacts but have been shown to struggle with compositional reasoning, which can be defined as the ability to generate complex structures from simpler elements. In this paper, we…

Machine Learning · Computer Science · 2024-08-20 · Giovanni Bindi, Philippe Esling

Self-supervised Graphs for Audio Representation Learning with Limited Labeled Data

Large-scale databases with high-quality manual annotations are scarce in the audio domain. We thus explore a self-supervised graph approach to learning audio representations from highly limited labelled data. Considering each audio sample as a…

Machine Learning · Computer Science · 2022-11-23 · Amir Shirian, Krishna Somandepalli, Tanaya Guha

Task-Driven Dictionary Learning

Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience and signal processing. For signals such as natural images that admit such sparse…

Machine Learning · Statistics · 2013-09-10 · Julien Mairal, Francis Bach, Jean Ponce

Efficiency-oriented approaches for self-supervised speech representation learning

Self-supervised learning enables the training of large neural models without the need for large, labeled datasets. It has been generating breakthroughs in several fields, including computer vision, natural language processing, biology, and…

Computation and Language · Computer Science · 2023-12-19 · Luis Lugo, Valentin Vielzeuf

Learning Representations for New Sound Classes With Continual Self-Supervised Learning

In this paper, we work on a sound recognition system that continually incorporates new sound classes. Our main goal is to develop a framework where the model can be updated without relying on labeled data. For this purpose, we propose…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-01-11 · Zhepei Wang, Cem Subakan, Xilin Jiang, Junkai Wu, Efthymios Tzinis, Mirco Ravanelli, Paris Smaragdis

Towards Improved Speech Recognition through Optimized Synthetic Data Generation

Supervised training of speech recognition models requires access to transcribed audio data, which is often not possible due to confidentiality issues. Our approach to this problem is to generate synthetic audio from a text-only corpus using…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-09-01 · Yanis Perrin, Gilles Boulianne

Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge. Data-driven supervised approaches, including ones based on…

Sound · Computer Science · 2018-09-24 · Zixing Zhang, Jürgen Geiger, Jouni Pohjalainen, Amr El-Desoky Mousa, Wenyu Jin, Björn Schuller

Supervised Dictionary Learning

It is now well established that sparse signal models are well suited to restoration tasks and can effectively be learned from audio, image, and video data. Recent research has been aimed at learning discriminative sparse models instead of…

Computer Vision and Pattern Recognition · Computer Science · 2009-09-29 · Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro, Andrew Zisserman

Low-rank Dictionary Learning for Unsupervised Feature Selection

High-dimensional data arise in many real-world applications such as biology, computer vision, and social networks. Feature selection approaches are devised to confront high-dimensional data challenges with the aim of efficient…

Machine Learning · Computer Science · 2021-06-22 · Mohsen Ghassemi Parsa, Hadi Zare, Mehdi Ghatee

Visually Guided Self Supervised Learning of Speech Representations

Self-supervised representation learning has recently attracted a lot of research interest for both the audio and visual modalities. However, most works typically focus on a particular modality or feature alone and there has been very…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-02-21 · Abhinav Shukla, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic