Related papers: Multi-agent Auditory Scene Analysis

200 papers

We propose DeepASA, a multi-purpose model for auditory scene analysis that performs multi-input multi-output (MIMO) source separation, dereverberation, sound event detection (SED), audio classification, and direction-of-arrival estimation…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-16 Dongheon Lee, Younghoo Kwon, Jung-Woo Choi

Sound event detection (SED) and acoustic scene classification (ASC) are important research topics in environmental sound analysis. Many research groups have addressed SED and ASC using neural-network-based methods, such as the convolutional…

Sound · Computer Science 2021-02-24 Noriyuki Tonami, Keisuke Imoto, Ryosuke Yamanishi, Yoichi Yamashita

Acoustic event detection and scene classification are major research tasks in environmental sound analysis, and many methods based on neural networks have been proposed. Conventional methods have addressed these tasks separately; however,…

In this article we present an account of the state-of-the-art in acoustic scene classification (ASC), the task of classifying environments from the sounds they produce. Starting from a historical review of previous research in this area, we…

Sound · Computer Science 2015-04-08 Daniele Barchiesi, Dimitrios Giannoulis, Dan Stowell, Mark D. Plumbley

A multi-agent AI system (MAS) is composed of multiple autonomous agents that interact, exchange information, and make decisions based on internal generative models. Recent advances in large language models and tool-using agents have made…

Under noisy conditions, automatic speech recognition (ASR) can greatly benefit from the addition of visual signals coming from a video of the speaker's face. However, when multiple candidate speakers are visible this traditionally requires…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-12 Otavio Braga, Olivier Siohan

We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment. The agent hears multiple audio sources…

Computer Vision and Pattern Recognition · Computer Science 2021-08-27 Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

During the COVID-19 pandemic, online meetings have become an indispensable part of our lives. This trend is likely to continue due to their convenience and broad reach. However, background noise from other family members, roommates, office-mates not…

Sound · Computer Science 2022-07-22 Wei Sun, Mei Wang, Lili Qiu

Identification and localization of sounds are both integral parts of computational auditory scene analysis. Although each can be solved separately, the goal of forming coherent auditory objects and achieving a comprehensive spatial scene…

Sound · Computer Science 2019-12-24 Ivo Trowitzsch, Christopher Schymura, Dorothea Kolossa, Klaus Obermayer

This study describes a binaural machine hearing system that is capable of performing auditory stream segregation in scenarios where multiple sound sources are present. The process of stream segregation refers to the capability of human…

Sound · Computer Science 2016-06-27 Christopher Schymura, Thomas Walther, Dorothea Kolossa

Acoustic Scene Classification (ASC) is a challenging task, as a single scene may involve multiple events that contain complex sound patterns. For example, a cooking scene may contain several sound sources including silverware clinking,…

Audio and Speech Processing · Electrical Eng. & Systems 2019-09-20 Weimin Wang, Weiran Wang, Ming Sun, Chao Wang

In this paper, we present a deep learning framework applied to Acoustic Scene Classification (ASC), the task of classifying scene contexts from environmental input sounds. An ASC system generally comprises two main steps, referred to as…

Sound · Computer Science 2020-05-27 Dat Ngo, Hao Hoang, Anh Nguyen, Tien Ly, Lam Pham

Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA). LASS aims to separate a target sound from an audio mixture given a natural language query, which provides a natural and…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-03 Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

Active speaker detection requires a solid integration of multi-modal cues. While individual modalities can approximate a solution, accurate predictions can only be achieved by explicitly fusing the audio and visual features and modeling…

Computer Vision and Pattern Recognition · Computer Science 2021-10-06 Juan León-Alcázar, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem

Recent progress in auditory intelligence has yielded high-performing systems for sound event detection (SED), acoustic scene classification (ASC), automated audio captioning (AAC), and audio question answering (AQA). Yet these tasks remain…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-12 Hyeonuk Nam

In this paper, we propose a new strategy for acoustic scene classification (ASC), namely recognizing acoustic scenes through identifying distinct sound events. This differs from existing strategies, which focus on characterizing global…

Sound · Computer Science 2019-10-23 Hongwei Song, Jiqing Han, Shiwen Deng, Zhihao Du

The main motivation for Automatic Speech Recognition (ASR) is efficient interfaces to computers, and for the interfaces to be natural and truly useful, they should provide coverage for a large group of users. The purpose of these tasks is to…

Computation and Language · Computer Science 2013-03-25 Urmila Shrawankar, VM Thakare

Extensive research has shown that Automatic Speech Recognition (ASR) systems are vulnerable to audio adversarial attacks. Current attacks mainly focus on single-source scenarios, ignoring dual-source scenarios where two people are speaking…

Cryptography and Security · Computer Science 2025-04-08 Zheng Fang, Shenyi Zhang, Tao Wang, Bowen Li, Lingchen Zhao, Zhangyi Wang

Purpose: Surgical scene understanding is key to advancing computer-aided and intelligent surgical systems. Current approaches predominantly rely on visual data or end-to-end learning, which limits fine-grained contextual modeling. This work…

New-age conversational agent systems perform both speech emotion recognition (SER) and automatic speech recognition (ASR) using two separate and often independent approaches for real-world application in noisy environments. In this paper,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-29 Lokesh Bansal, S. Pavankumar Dubagunta, Malolan Chetlur, Pushpak Jagtap, Aravind Ganapathiraju