Related papers: Microphone Array Based Surveillance Audio Classifi…
Due to the growing demand for improving surveillance capabilities in smart cities, systems need to be developed to provide better monitoring capabilities to competent authorities, agencies responsible for strategic resource management, and…
This paper proposes sound event localization and detection methods for multichannel recordings. The proposed system is based on two Convolutional Recurrent Neural Networks (CRNNs) to perform sound event detection (SED) and time difference…
Deep learning-based sound event localization and classification is an emerging research area within wireless acoustic sensor networks. However, current methods for sound event localization and classification typically rely on a single…
Asynchronous microphone array calibration is a prerequisite for many audition robot applications. A popular solution to the above calibration problem is the batch form of Simultaneous Localisation and Mapping (SLAM), using the time…
Sensor nodes in a wireless sensor network (WSN) for security surveillance applications should preferably be small, energy-efficient, and inexpensive with in-sensor computational abilities. An appropriate data processing scheme in the sensor…
Growing interest in automatic speaker verification (ASV) systems has led to significant quality improvement of spoofing attacks on them. Many research works confirm that despite the low equal error rate (EER) ASV systems are still…
Voice activity detection (VAD), used as the front end of speech enhancement, speech and speaker recognition algorithms, determines the overall accuracy and efficiency of the algorithms. Therefore, a VAD with low complexity and high accuracy…
Sleep-disordered breathing (SDB) is a serious and prevalent condition, and acoustic analysis via consumer devices (e.g. smartphones) offers a low-cost solution to screening for it. We present a novel approach for the acoustic identification…
Environmental sound detection is a challenging application of machine learning because of the noisy nature of the signal, and the small amount of (labeled) data that is typically available. This work thus presents a comparison of several…
The frequent breakdowns and malfunctions of industrial equipment have driven increasing interest in utilizing cost-effective and easy-to-deploy sensors, such as microphones, for effective condition monitoring of machinery. Microphones offer…
We study device-addressed speech detection under pre-ASR edge deployment constraints, where systems must decide whether to forward audio before transcription under strict latency and compute limits. We show that, in multi-speaker…
Sound event detection (SED) is the task of identifying sound events along with their onset and offset times. A recent SED method based on convolutional neural networks proposed the use of depthwise separable (DWS) and time-dilated…
Deepfake audio detection has progressed rapidly with strong pre-trained encoders (e.g., WavLM, Wav2Vec2, MMS). However, performance in realistic capture conditions - background noise (domestic/office/transport), room reverberation, and…
Phased microphone arrays are widely used for acoustic source localization. Deconvolution approaches such as DAMAS successfully overcome the spatial resolution limit of the conventional delay-and-sum (DAS) beamforming…
Because it can exploit spatial information, microphone array beamforming is often used to enhance speech quality by suppressing directional disturbance. However, as the number of microphones increases, the complexity would…
This paper presents a speech intelligibility model based on automatic speech recognition (ASR), combining phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these…
Delay-and-sum (DAS) algorithms are widely used for beamforming in linear array photoacoustic imaging systems and are characterized by fast execution. However, these algorithms suffer from various drawbacks like low resolution, low contrast,…
Sound event detection (SED) is an interesting but challenging task due to the scarcity of data and diverse sound events in real life. This paper presents a multi-grained based attention network (MGA-Net) for semi-supervised sound event…
An approach to the estimation of the Direction of Arrival (DOA) of wide-band signals with a planar microphone array is presented. Our algorithm estimates an unambiguous DOA using a single planar array in which the microphones are placed…
In this paper, we focus on audio violence detection (AVD). AVD is necessary for several reasons, especially in the context of maintaining safety, preventing harm, and ensuring security in various environments. This calls for accurate AVD…