Related papers: Listenable Maps for Audio Classifiers

200 papers

Neural networks are typically black-boxes that remain opaque with regards to their decision mechanisms. Several works in the literature have proposed post-hoc explanation methods to alleviate this issue. This paper proposes LMAC-TD, a…

Sound · Computer Science 2024-09-16 Eleonora Mancini, Francesco Paissan, Mirco Ravanelli, Cem Subakan

This paper tackles two major problem settings for interpretability of audio processing networks, post-hoc and by-design interpretation. For post-hoc interpretation, we aim to interpret decisions of a network in terms of high-level audio…

Music has a unique and complex structure which is challenging for both expert humans and existing AI systems to understand, and presents unique challenges relative to other forms of audio. We present LLark, an instruction-tuned multimodal…

Sound · Computer Science 2024-06-04 Josh Gardner, Simon Durand, Daniel Stoller, Rachel M. Bittner

This paper tackles post-hoc interpretability for audio processing networks. Our goal is to interpret decisions of a network in terms of high-level audio objects that are also listenable for the end-user. To this end, we propose a novel…

Interpretability is essential for user trust in real-world anomaly detection applications. However, deep learning models, despite their strong performance, often lack transparency. In this work, we study the interpretability of…

This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-supervised representation learning from audio spectrograms. Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio…

Audio agents extend large audio-language models (LALMs) by decomposing audio questions into tool calls, intermediate evidence, and iterative reasoning steps. However, as LALMs become stronger, the key challenge shifts from enabling tool use…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-28 Yucheng Wang, Jing Peng, Hanqi Li, Chenghao Wang, Wenming Tu, Yu Xi, Zhaokai Sun, Kai Yu, Shuai Wang

Interpretability is highly desired for deep neural network-based classifiers, especially when addressing high-stake decisions in medical imaging. Commonly used post-hoc interpretability methods have the limitation that they can produce…

Image and Video Processing · Electrical Eng. & Systems 2024-01-04 Sourya Sengupta, Mark A. Anastasio

Advancements in audio neural networks have established state-of-the-art results on downstream audio tasks. However, the black-box structure of these models makes it difficult to interpret the information encoded in their internal audio…

Sound · Computer Science 2025-04-22 Alice Zhang, Edison Thomaz, Lie Lu

Today, there have been many achievements in learning the association between voice and face. However, most previous work models rely on cosine similarity or L2 distance to evaluate the likeness of voices and faces following contrastive…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Chong Peng, Liqiang He, Dan Su

While pre-trained multimodal representations (e.g., CLIP) have shown impressive capabilities, they exhibit significant compositional vulnerabilities leading to counterintuitive judgments. We introduce Multimodal Adversarial Compositionality…

Computation and Language · Computer Science 2025-05-30 Jaewoo Ahn, Heeseung Yun, Dayoon Ko, Gunhee Kim

Recently, self-supervised learning methods based on masked latent prediction have proven to encode input data into powerful representations. However, during training, the learned latent space can be further transformed to extract…

Sound · Computer Science 2025-06-05 Aurian Quelennec, Pierre Chouteau, Geoffroy Peeters, Slim Essid

Explaining recommendations enables users to understand whether recommended items are relevant to their needs and has been shown to increase their trust in the system. More generally, if designing explainable machine learning models is key…

Machine Learning · Computer Science 2020-08-27 Darius Afchar, Romain Hennequin

Respiratory diseases remain major global health challenges, and traditional auscultation is often limited by subjectivity, environmental noise, and inter-clinician variability. This study presents an explainable multimodal deep learning…

Sound · Computer Science 2025-12-02 S M Asiful Islam Saky, Md Rashidul Islam, Md Saiful Arefin, Shahaba Alam

Mel-filterbanks are fixed, engineered audio features which emulate human perception and have been used through the history of audio understanding up to today. However, their undeniable qualities are counterbalanced by the fundamental…

Sound · Computer Science 2021-01-22 Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by…

Understanding the decision-making processes of large language models (LLMs) is essential for their trustworthy development and deployment. However, current interpretability methods often face challenges such as low resolution and high…

Computation and Language · Computer Science 2025-10-14 Tian Lan, Jinyuan Xu, Xue He, Jenq-Neng Hwang, Lei Li

While models in audio and speech processing are becoming deeper and more end-to-end, they as a consequence need expensive training on large data, and are often brittle. We build on a classical model of human hearing and make it…

Sound · Computer Science 2024-09-16 Ruolan Leslie Famularo, Dmitry N. Zotkin, Shihab A. Shamma, Ramani Duraiswami

The increasing success of deep neural networks has raised concerns about their inherent black-box nature, posing challenges related to interpretability and trust. While there has been extensive exploration of interpretation techniques in…

Sound · Computer Science 2024-02-07 Luca Della Libera, Cem Subakan, Mirco Ravanelli

The lack of interpretability of the Vision Transformer may hinder its use in critical real-world applications despite its effectiveness. To overcome this issue, we propose a post-hoc interpretability method called VISION DIFFMASK, which…

Computer Vision and Pattern Recognition · Computer Science 2023-04-14 Angelos Nalmpantis, Apostolos Panagiotopoulos, John Gkountouras, Konstantinos Papakostas, Wilker Aziz