Related papers: Multi-agent Auditory Scene Analysis

DeepASA: An Object-Oriented Multi-Purpose Network for Auditory Scene Analysis

We propose DeepASA, a multi-purpose model for auditory scene analysis that performs multi-input multi-output (MIMO) source separation, dereverberation, sound event detection (SED), audio classification, and direction-of-arrival estimation…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-16 Dongheon Lee, Younghoo Kwon, Jung-Woo Choi

Joint Analysis of Sound Events and Acoustic Scenes Using Multitask Learning

Sound event detection (SED) and acoustic scene classification (ASC) are important research topics in environmental sound analysis. Many research groups have addressed SED and ASC using neural-network-based methods, such as the convolutional…

Sound · Computer Science 2021-02-24 Noriyuki Tonami, Keisuke Imoto, Ryosuke Yamanishi, Yoichi Yamashita

Joint Analysis of Acoustic Events and Scenes Based on Multitask Learning

Acoustic event detection and scene classification are major research tasks in environmental sound analysis, and many methods based on neural networks have been proposed. Conventional methods have addressed these tasks separately; however,…

Sound · Computer Science 2019-07-22 Noriyuki Tonami, Keisuke Imoto, Masahiro Niitsuma, Ryosuke Yamanishi, Yoichi Yamashita

Acoustic Scene Classification

In this article we present an account of the state-of-the-art in acoustic scene classification (ASC), the task of classifying environments from the sounds they produce. Starting from a historical review of previous research in this area, we…

Sound · Computer Science 2015-04-08 Daniele Barchiesi, Dimitrios Giannoulis, Dan Stowell, Mark D. Plumbley

An Outlook on the Opportunities and Challenges of Multi-Agent AI Systems

A multi-agent AI system (MAS) is composed of multiple autonomous agents that interact, exchange information, and make decisions based on internal generative models. Recent advances in large language models and tool-using agents have made…

Multiagent Systems · Computer Science 2025-08-26 Fangqiao Tian, An Luo, Jin Du, Xun Xian, Robert Specht, Ganghua Wang, Xuan Bi, Jiawei Zhou, Ashish Kundu, Jayanth Srinivasa, Charles Fleming, Rui Zhang, Zirui Liu, Mingyi Hong, Jie Ding

Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection

Under noisy conditions, automatic speech recognition (ASR) can greatly benefit from the addition of visual signals coming from a video of the speaker's face. However, when multiple candidate speakers are visible this traditionally requires…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-12 Otavio Braga, Olivier Siohan

Move2Hear: Active Audio-Visual Source Separation

We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment. The agent hears multiple audio sources…

Computer Vision and Pattern Recognition · Computer Science 2021-08-27 Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

Spatial Aware Multi-Task Learning Based Speech Separation

During the Covid, online meetings have become an indispensable part of our lives. This trend is likely to continue due to their convenience and broad reach. However, background noise from other family members, roommates, office-mates not…

Sound · Computer Science 2022-07-22 Wei Sun, Mei Wang, Lili Qiu

Joining Sound Event Detection and Localization Through Spatial Segregation

Identification and localization of sounds are both integral parts of computational auditory scene analysis. Although each can be solved separately, the goal of forming coherent auditory objects and achieving a comprehensive spatial scene…

Sound · Computer Science 2019-12-24 Ivo Trowitzsch, Christopher Schymura, Dorothea Kolossa, Klaus Obermayer

An Active Machine Hearing System for Auditory Stream Segregation

This study describes a binaural machine hearing system that is capable of performing auditory stream segregation in scenarios where multiple sound sources are present. The process of stream segregation refers to the capability of human…

Sound · Computer Science 2016-06-27 Christopher Schymura, Thomas Walther, Dorothea Kolossa

Acoustic scene analysis with multi-head attention networks

Acoustic Scene Classification (ASC) is a challenging task, as a single scene may involve multiple events that contain complex sound patterns. For example, a cooking scene may contain several sound sources including silverware clinking,…

Audio and Speech Processing · Electrical Eng. & Systems 2019-09-20 Weimin Wang, Weiran Wang, Ming Sun, Chao Wang

Sound Context Classification Basing on Join Learning Model and Multi-Spectrogram Features

In this paper, we present a deep learning framework applied for Acoustic Scene Classification (ASC), the task of classifying scene contexts from environmental input sounds. An ASC system generally comprises of two main steps, referred to as…

Sound · Computer Science 2020-05-27 Dat Ngo, Hao Hoang, Anh Nguyen, Tien Ly, Lam Pham

Separate Anything You Describe

Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA). LASS aims to separate a target sound from an audio mixture given a natural language query, which provides a natural and…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-03 Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

MAAS: Multi-modal Assignation for Active Speaker Detection

Active speaker detection requires a solid integration of multi-modal cues. While individual modalities can approximate a solution, accurate predictions can only be achieved by explicitly fusing the audio and visual features and modeling…

Computer Vision and Pattern Recognition · Computer Science 2021-10-06 Juan León-Alcázar, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem

Auditory Intelligence: Understanding the World Through Sound

Recent progress in auditory intelligence has yielded high-performing systems for sound event detection (SED), acoustic scene classification (ASC), automated audio captioning (AAC), and audio question answering (AQA). Yet these tasks remain…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-12 Hyeonuk Nam

Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events

In this paper, we propose a new strategy for acoustic scene classification (ASC), namely recognizing acoustic scenes through identifying distinct sound events. This differs from existing strategies, which focus on characterizing global…

Sound · Computer Science 2019-10-23 Hongwei Song, Jiqing Han, Shiwen Deng, Zhihao Du

Adverse Conditions and ASR Techniques for Robust Speech User Interface

The main motivation for Automatic Speech Recognition (ASR) is efficient interfaces to computers, and for the interfaces to be natural and truly useful, it should provide coverage for a large group of users. The purpose of these tasks is to…

Computation and Language · Computer Science 2013-03-25 Urmila Shrawankar, VM Thakare

Selective Masking Adversarial Attack on Automatic Speech Recognition Systems

Extensive research has shown that Automatic Speech Recognition (ASR) systems are vulnerable to audio adversarial attacks. Current attacks mainly focus on single-source scenarios, ignoring dual-source scenarios where two people are speaking…

Cryptography and Security · Computer Science 2025-04-08 Zheng Fang, Shenyi Zhang, Tao Wang, Bowen Li, Lingchen Zhao, Zhangyi Wang

Sound Source Localization for Spatial Mapping of Surgical Actions in Dynamic Scenes

Purpose: Surgical scene understanding is key to advancing computer-aided and intelligent surgical systems. Current approaches predominantly rely on visual data or end-to-end learning, which limits fine-grained contextual modeling. This work…

Sound · Computer Science 2026-05-05 Jonas Hein, Lazaros Vlachopoulos, Maurits Geert Laurent Olthof, Bastian Sigrist, Philipp Fürnstahl, Matthias Seibold

On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition

New-age conversational agent systems perform both speech emotion recognition (SER) and automatic speech recognition (ASR) using two separate and often independent approaches for real-world application in noisy environments. In this paper,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-29 Lokesh Bansal, S. Pavankumar Dubagunta, Malolan Chetlur, Pushpak Jagtap, Aravind Ganapathiraju