Related papers: An Explainable Proxy Model for Multilabel Audio Seg…

200 papers

Audio segmentation is a key task for many speech technologies, most of which are based on high-performing neural networks that are usually regarded as black boxes. However, in many domains, such as health or forensics, there is…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-21 Martin Lebourdais, Théo Mariotte, Antonio Almudévar, Marie Tahon, Alfonso Ortega

Explainable AI (XAI) is commonly applied to anomalous sound detection (ASD) models to identify which time-frequency regions of an audio signal contribute to an anomaly decision. However, most audio explanations rely on qualitative…

Sound · Computer Science 2026-01-28 Alexander Buck, Georgina Cosma, Iain Phillips, Paul Conway, Patrick Baker

Voice Activity Detection (VAD) aims at detecting speech segments in an audio signal, which is a necessary first step for many of today's speech-based applications. Current state-of-the-art methods focus on training a neural network exploiting…

Audio and Speech Processing · Electrical Eng. & Systems 2022-09-23 Sina Alisamir, Fabien Ringeval, Francois Portet

Explainable Artificial Intelligence (XAI) is targeted at understanding how models perform feature selection and derive their classification decisions. This paper explores post-hoc explanations for deep neural networks in the audio domain.…

Anomalous sound detection (ASD) typically involves self-supervised proxy tasks to learn feature representations from normal sound data, owing to the scarcity of anomalous samples. In ASD research, proxy tasks such as AutoEncoders operate…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-14 Seunghyeon Shin, Seokjin Lee

Speaker segmentation consists in partitioning a conversation between one or more speakers into speaker turns. Usually addressed as the late combination of three sub-tasks (voice activity detection, speaker change detection, and overlapped…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-11 Hervé Bredin, Antoine Laurent

Explainable Artificial Intelligence (XAI) has emerged as a critical tool for interpreting the predictions of complex deep learning models. While XAI has been increasingly applied in various domains within acoustics, its use in bioacoustics,…

Sound · Computer Science 2025-09-11 Zubair Faruqui, Mackenzie S. McIntire, Rahul Dubey, Jay McEntee

The range of potential applications of acoustic analysis is wide. Classification of sounds, in particular, is a typical machine learning task that received a lot of attention in recent years. The most common approaches to sound…

Voice activity and overlapped speech detection (respectively VAD and OSD) are key pre-processing tasks for speaker diarization. The final segmentation performance highly relies on the robustness of these sub-tasks. Recent studies have shown…

While models in audio and speech processing are becoming deeper and more end-to-end, they as a consequence need expensive training on large data, and are often brittle. We build on a classical model of human hearing and make it…

Sound · Computer Science 2024-09-16 Ruolan Leslie Famularo, Dmitry N. Zotkin, Shihab A. Shamma, Ramani Duraiswami

eXplainable Artificial Intelligence (XAI) has emerged as an essential requirement when dealing with mission-critical applications, ensuring transparency and interpretability of the employed black box AI models. The significance of XAI spans…

Computer Vision and Pattern Recognition · Computer Science 2023-11-28 Hossein Shreim, Abdul Karim Gizzini, Ali J. Ghandour

Overlapping Speech Detection (OSD) aims to identify regions where multiple speakers overlap in a conversation, a critical challenge in multi-party speech processing. This work proposes a speaker-aware progressive OSD model that leverages a…

Sound · Computer Science 2025-05-30 Zhaokai Sun, Li Zhang, Qing Wang, Pan Zhou, Lei Xie

Music segmentation refers to the dual problem of identifying boundaries between, and labeling, distinct music segments, e.g., the chorus, verse, bridge etc. in popular music. The performance of a range of music segmentation algorithms has…

Sound · Computer Science 2021-08-31 Matthew C. McCallum

Spoken language recognition (SLR) is the task of automatically identifying the language present in a speech signal. Existing SLR models are either too computationally expensive or too large to run effectively on devices with limited…

Computation and Language · Computer Science 2023-06-06 Oriol Nieto, Zeyu Jin, Franck Dernoncourt, Justin Salamon

Overlapping speech diarization has been traditionally treated as a multi-label classification problem. In this paper, we reformulate this task as a single-label prediction problem by encoding multiple binary labels into a single label with…

Sound · Computer Science 2022-04-01 Zhihao Du, Shiliang Zhang, Siqi Zheng, Zhijie Yan

Recent advances in eXplainable AI (XAI) have provided new insights into how models for vision, language, and tabular data operate. However, few approaches exist for understanding speech models. Existing work focuses on a few spoken language…

Computation and Language · Computer Science 2023-09-15 Eliana Pastor, Alkis Koudounas, Giuseppe Attanasio, Dirk Hovy, Elena Baralis

General audio source separation is a key capability for multimodal AI systems that can perceive and reason about sound. Despite substantial progress in recent years, existing separation models are either domain-specific, designed for fixed…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-24 Bowen Shi, Andros Tjandra, John Hoffman, Helin Wang, Yi-Chiao Wu, Luya Gao, Julius Richter, Matt Le, Apoorv Vyas, Sanyuan Chen, Christoph Feichtenhofer, Piotr Dollár, Wei-Ning Hsu, Ann Lee

Modulations are a critical part of sound design and music production, enabling the creation of complex and evolving audio. Modern synthesizers provide envelopes, low frequency oscillators (LFOs), and more parameter automation tools that…

Sound · Computer Science 2025-10-08 Christopher Mitcheltree, Hao Hao Tan, Joshua D. Reiss

Speech Emotion Recognition (SER) is typically trained and evaluated on majority-voted labels, which simplifies benchmarking but masks subjectivity and provides little transparency into why predictions are made. This neglects valid minority…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-06 Bo-Hao Su, Hui-Ying Shih, Jinchuan Tian, Jiatong Shi, Chi-Chun Lee, Carlos Busso, Shinji Watanabe

We take a formal approach to the explainability problem of machine learning systems. We argue against the practice of interpreting black-box models via attributing scores to input components due to inherently conflicting goals of…

Machine Learning · Computer Science 2023-06-13 Kai Jia, Pasapol Saowakon, Limor Appelbaum, Martin Rinard