Related papers: Feature-informed Embedding Space Regularization Fo…

Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data. As such, they have…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-16 Hejung Yang, Hong-Goo Kang

Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model

Many application studies rely on audio DNN models pre-trained on a large-scale dataset as essential feature extractors, and they extract features from the last layers. In this study, we focus on our finding that the middle layer features of…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-18 Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

Refining Language Models with Compositional Explanations

Pre-trained language models have been successful on text classification tasks, but are prone to learning spurious correlations from biased datasets, and are thus vulnerable when making inferences in a new domain. Prior work reveals such…

Computation and Language · Computer Science 2022-01-03 Huihan Yao, Ying Chen, Qinyuan Ye, Xisen Jin, Xiang Ren

Feature-informed Latent Space Regularization for Music Source Separation

The integration of additional side information to improve music source separation has been investigated numerous times, e.g., by adding features to the input or by adding learning targets in a multi-task learning scenario. These approaches,…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-28 Yun-Ning Hung, Alexander Lerch

Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models

Generalization is a main issue for current audio deepfake detectors, which struggle to provide reliable results on out-of-distribution data. Given the speed at which more and more accurate synthesis methods are developed, it is very…

Sound · Computer Science 2024-07-02 Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva

Transfer Learning and Bias Correction with Pre-trained Audio Embeddings

Deep neural network models have become the dominant approach to a large variety of tasks within music information retrieval (MIR). These models generally require large amounts of (annotated) training data to achieve high accuracy. Because…

Audio and Speech Processing · Electrical Eng. & Systems 2023-07-21 Changhong Wang, Gaël Richard, Brian McFee

Enhancing Fine-Grained Visual Recognition in the Low-Data Regime Through Feature Magnitude Regularization

Training a fine-grained image recognition model with limited data presents a significant challenge, as the subtle differences between categories may not be easily discernible amidst distracting noise patterns. One commonly employed strategy…

Computer Vision and Pattern Recognition · Computer Science 2024-11-27 Avraham Chapman, Haiming Xu, Lingqiao Liu

Regularizing Learnable Feature Extraction for Automatic Speech Recognition

Neural front-ends are an appealing alternative to traditional, fixed feature extraction pipelines for automatic speech recognition (ASR) systems since they can be directly trained to fit the acoustic model. However, their performance often…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-01 Peter Vieting, Maximilian Kannen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney

Investigating the Sensitivity of Pre-trained Audio Embeddings to Common Effects

In recent years, foundation models have significantly advanced data-driven systems across various domains. Yet, their underlying properties, especially when functioning as feature extractors, remain under-explored. In this paper, we…

Machine Learning · Computer Science 2025-01-28 Victor Deng, Changhong Wang, Gaël Richard, Brian McFee

Audiovisual transfer learning for audio tagging and sound event detection

We study the merit of transfer learning for two sound recognition problems, i.e., audio tagging and sound event detection. Employing feature fusion, we adapt a baseline system utilizing only spectral acoustic inputs to also make use of…

Audio and Speech Processing · Electrical Eng. & Systems 2022-09-27 Wim Boes, Hugo Van hamme

Phantom Embeddings: Using Embedding Space for Model Regularization in Deep Neural Networks

The strength of machine learning models stems from their ability to learn complex function approximations from data; however, this strength also makes training deep neural networks challenging. Notably, the complex models tend to memorize…

Computer Vision and Pattern Recognition · Computer Science 2023-04-17 Mofassir ul Islam Arif, Mohsan Jameel, Josif Grabocka, Lars Schmidt-Thieme

The Surprising Effectiveness of Noise Pretraining for Implicit Neural Representations

The approximation and convergence properties of implicit neural representations (INRs) are known to be highly sensitive to parameter initialization strategies. While several data-driven initialization methods demonstrate significant…

Computer Vision and Pattern Recognition · Computer Science 2026-04-01 Kushal Vyas, Alper Kayabasi, Daniel Kim, Vishwanath Saragadam, Ashok Veeraraghavan, Guha Balakrishnan

Analysis of Feature Representations for Anomalous Sound Detection

In this work, we thoroughly evaluate the efficacy of pretrained neural networks as feature extractors for anomalous sound detection. In doing so, we leverage the knowledge that is contained in these neural networks to extract semantically…

Sound · Computer Science 2021-02-19 Robert Müller, Steffen Illium, Fabian Ritz, Kyrill Schmid

Supervised and Unsupervised Learning of Audio Representations for Music Understanding

In this work, we provide a broad comparative analysis of strategies for pre-training audio understanding models for several tasks in the music domain, including labelling of genre, era, origin, mood, instrumentation, key, pitch, vocal…

Sound · Computer Science 2022-10-11 Matthew C. McCallum, Filip Korzeniowski, Sergio Oramas, Fabien Gouyon, Andreas F. Ehmann

Adding noise to the input of a model trained with a regularized objective

Regularization is a well-studied problem in the context of neural networks. It is usually used to improve generalization performance when the number of input samples is relatively small or the samples are heavily contaminated with noise. The…

Artificial Intelligence · Computer Science 2011-04-19 Salah Rifai, Xavier Glorot, Yoshua Bengio, Pascal Vincent

Feature Normalisation for Robust Speech Recognition

Speech recognition system performance degrades in noisy environments. If the acoustic models are built using features of clean utterances, the features of a noisy test utterance would be acoustically mismatched with the trained model. This…

Computation and Language · Computer Science 2015-07-16 D. S. Pavan Kumar

Robust Speech Representation Learning via Flow-based Embedding Regularization

In recent years, various deep learning-based methods have been proposed for extracting a fixed-dimensional embedding vector from speech signals. Although the deep learning-based embedding extraction methods have shown good performance in…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-08 Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan

AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks

The large capacity of neural networks enables them to learn complex functions. To avoid overfitting, however, networks require a lot of training data, which can be expensive and time-consuming to collect. A common practical approach to…

Machine Learning · Computer Science 2020-03-10 Majed El Helou, Frederike Dümbgen, Sabine Süsstrunk

Online incremental learning for audio classification using a pretrained audio model

Incremental learning aims to learn new tasks sequentially without forgetting the previously learned ones. Most of the existing incremental learning methods for audio focus on training the model from scratch on the initial task, and the same…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-29 Manjunath Mulimani, Annamaria Mesaros

Synthetic Feature Augmentation Improves Generalization Performance of Language Models

Training and fine-tuning deep learning models, especially large language models (LLMs), on limited and imbalanced datasets poses substantial challenges. These issues often result in poor generalization, where models overfit to dominant…

Computation and Language · Computer Science 2025-01-14 Ashok Choudhary, Cornelius Thiels, Hojjat Salehinejad