Related papers: Audio Impairment Recognition Using a Correlation-B…

Fantastic Features and Where to Find Them: Detecting Cognitive Impairment with a Subsequence Classification Guided Approach

Despite the widely reported success of embedding-based machine learning methods on natural language processing tasks, the use of more easily interpreted engineered features remains common in fields such as cognitive impairment (CI)…

Machine Learning · Computer Science 2020-10-14 Benjamin Eyre, Aparna Balagopalan, Jekaterina Novikova

Automatic Instrument Recognition in Polyphonic Music Using Convolutional Neural Networks

Traditional methods to tackle many music information retrieval tasks typically follow a two-step architecture: feature engineering followed by a simple learning algorithm. In these "shallow" architectures, feature engineering and learning…

Sound · Computer Science 2015-11-18 Peter Li, Jiyuan Qian, Tian Wang

A Methodology for Exploring Deep Convolutional Features in Relation to Hand-Crafted Features with an Application to Music Audio Modeling

Understanding the features learned by deep models is important from a model trust perspective, especially as deep systems are deployed in the real world. Most recent approaches for deep feature understanding or model explanation focus on…

Sound · Computer Science 2021-10-12 Anna K. Yanchenko, Mohammadreza Soltani, Robert J. Ravier, Sayan Mukherjee, Vahid Tarokh

Audio Classification of Low Feature Spectrograms Utilizing Convolutional Neural Networks

Modern day audio signal classification techniques lack the ability to classify low feature audio signals in the form of spectrographic temporal frequency data representations. Additionally, currently utilized techniques rely on full diverse…

Sound · Computer Science 2024-10-30 Noel Elias

Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection

Approximately 1.2% of the world's population has impaired voice production. As a result, automatic dysphonic voice detection has attracted considerable academic and clinical interest. However, existing methods for automated voice assessment…

Sound · Computer Science 2023-01-27 Jianwei Zhang, Julie Liss, Suren Jayasuriya, Visar Berisha

Diverse Audio Embeddings -- Bringing Features Back Outperforms CLAP!

With the advent of modern AI architectures, a shift has happened towards end-to-end architectures. This pivot has led to neural architectures being trained without domain-specific biases/knowledge, optimized according to the task. We in…

Sound · Computer Science 2025-05-08 Prateek Verma

Audio representations for deep learning in sound synthesis: A review

The rise of deep learning algorithms has led many researchers to withdraw from using classic signal processing methods for sound generation. Deep learning models have achieved expressive voice synthesis, realistic sound textures, and…

Sound · Computer Science 2022-01-10 Anastasia Natsiou, Sean O'Leary

Degradation-Invariant Music Indexing

For music indexing robust to sound degradations and scalable for big music catalogs, this scientific report presents an approach based on audio descriptors relevant to the music content and invariant to sound transformations (noise…

Signal Processing · Electrical Eng. & Systems 2024-03-04 Rémi Mignot, Geoffroy Peeters

Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks

The identification of structural differences between a music performance and the score is a challenging yet integral step of audio-to-score alignment, an important subtask of music information retrieval. We present a novel method to detect…

Sound · Computer Science 2021-02-16 Ruchit Agrawal, Daniel Wolff, Simon Dixon

COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations

Audio representation learning based on deep neural networks (DNNs) emerged as an alternative approach to hand-crafted features. For achieving high performance, DNNs often need a large amount of annotated data which can be difficult and…

Machine Learning · Computer Science 2020-07-09 Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra

Compositional Audio Representation Learning

Human auditory perception is compositional in nature -- we identify auditory streams from auditory scenes with multiple sound events. However, such auditory scenes are typically represented using clip-level representations that do not…

Sound · Computer Science 2025-03-04 Sripathi Sridhar, Mark Cartwright

Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model

An emerging trend in audio processing is capturing low-level speech representations from raw waveforms. These representations have shown promising results on a variety of tasks, such as speech recognition and speech separation. Compared to…

Sound · Computer Science 2021-09-08 Zhongwei Teng, Quchen Fu, Jules White, Maria Powell, Douglas C. Schmidt

An efficient supervised dictionary learning method for audio signal recognition

Machine hearing or listening represents an emerging area. Conventional approaches rely on the design of handcrafted features that are specialized to a specific audio task and can hardly be generalized to other audio fields. For example,…

Computer Vision and Pattern Recognition · Computer Science 2018-12-13 Imad Rida, Romain Hérault, Gilles Gasso

Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification

Audio fingerprinting (AFP) allows the identification of unknown audio content by extracting compact representations, termed audio fingerprints, that are designed to remain robust against common audio degradations. Neural AFP methods often…

Sound · Computer Science 2025-07-01 R. Oguz Araz, Guillem Cortès-Sebastià, Emilio Molina, Joan Serrà, Xavier Serra, Yuki Mitsufuji, Dmitry Bogdanov

Embedded Emotions -- A Data Driven Approach to Learn Transferable Feature Representations from Raw Speech Input for Emotion Recognition

Traditional approaches to automatic emotion recognition rely on the application of handcrafted features. More recently, however, the advent of deep learning has enabled algorithms to learn meaningful representations of input data…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-01 Dominik Schiller, Silvan Mertes, Elisabeth André

Towards Improved Objective Perceptual Audio Quality Assessment -- Part 1: A Novel Data-Driven Cognitive Model

Efficient audio quality assessment is vital for streamlining audio codec development. Objective assessment tools have been developed over time to algorithmically predict quality ratings from subjective assessments, the gold standard for…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-28 Pablo M. Delgado, Jürgen Herre

Plagiarism Detection in Polyphonic Music using Monaural Signal Separation

Given the large number of new musical tracks released each year, automated approaches to plagiarism detection are essential to help us track potential violations of copyright. Most current approaches to plagiarism detection are based on…

Sound · Computer Science 2016-06-08 Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj

Towards Context-Aware Neural Performance-Score Synchronisation

Music can be represented in multiple forms, such as in the audio form as a recording of a performance, in the symbolic form as a computer readable score, or in the image form as a scan of the sheet music. Music synchronisation provides a…

Sound · Computer Science 2022-06-02 Ruchit Agrawal

Machine Learning Framework for Audio-Based Equipment Condition Monitoring: A Comparative Study of Classification Algorithms

Audio-based equipment condition monitoring suffers from a lack of standardized methodologies for algorithm selection, hindering reproducible research. This paper addresses this gap by introducing a comprehensive framework for the systematic…

Machine Learning · Computer Science 2026-03-20 Srijesh Pillai, Yodhin Agarwal, Zaheeruddin Ahmed

Audio Recording Device Identification Based on Deep Learning

In this paper we present research on the identification of audio recording devices from background noise, thus providing a method for forensics. The audio signal is the sum of a speech signal and a noise signal. Usually, people pay more attention…

Sound · Computer Science 2016-04-28 Simeng Qi, Zheng Huang, Yan Li, Shaopei Shi