Related papers: Audio Spotforming Using Nonnegative Tensor Factori…

Nonnegative Tensor Factorization for Directional Blind Audio Source Separation

We augment the nonnegative matrix factorization method for audio source separation with cues about directionality of sound propagation. This improves separation quality greatly and removes the need for training data, with only a twofold…

Machine Learning · Statistics 2017-05-30 Noah D. Stein

Neural Directional Filtering Using a Compact Microphone Array

Beamforming with desired directivity patterns using compact microphone arrays is essential in many audio applications. Directivity patterns achievable using traditional beamformers depend on the number of microphones and the array aperture.…

Audio and Speech Processing · Electrical Eng. & Systems 2026-03-24 Weilong Huang, Srikanth Raj Chetupalli, Mhd Modar Halimeh, Oliver Thiergart, Emanuël A. P. Habets

Nonnegative Matrix Factorization with Transform Learning

Traditional NMF-based signal decomposition relies on the factorization of spectral data, which is typically computed by means of short-time frequency transform. In this paper we propose to relax the choice of a pre-fixed transform and learn…

Machine Learning · Computer Science 2017-12-18 Dylan Fagot, Cédric Févotte, Herwig Wendt

Non-negative matrix factorization-based subband decomposition for acoustic source localization

A novel non-negative matrix factorization (NMF) based subband decomposition in frequency spatial domain for acoustic source localization using a microphone array is introduced. The proposed method decomposes source and noise subband and…

Sound · Computer Science 2016-10-18 Suwon Shon, Seongkyu Mun, David Han, Hanseok Ko

Non-negative Matrix Factorization with Linear Constraints for Single-Channel Speech Enhancement

This paper investigates a non-negative matrix factorization (NMF)-based approach to the semi-supervised single-channel speech enhancement problem where only non-stationary additive noise signals are given. The proposed method relies on…

Sound · Computer Science 2013-09-25 Nikolay Lyubimov, Mikhail Kotov

AvaTr: One-Shot Speaker Extraction with Transformers

To extract the voice of a target speaker when mixed with a variety of other sounds, such as white and ambient noises or the voices of interfering speakers, we extend the Transformer network to attend the most relevant information with…

Sound · Computer Science 2021-05-04 Shell Xu Hu, Md Rifat Arefin, Viet-Nhat Nguyen, Alish Dipani, Xaq Pitkow, Andreas Savas Tolias

Non-negative tensor factorization for vibration-based local damage detection

In this study, a novel non-negative tensor factorization (NTF)-based method for vibration-based local damage detection in rolling element bearings is proposed. As the diagnostic signal registered from a faulty machine is non-stationary, the…

Signal Processing · Electrical Eng. & Systems 2024-03-20 Mateusz Gabor, Rafal Zdunek, Radoslaw Zimroz, Jacek Wodecki, Agnieszka Wylomanska

Deep Tensor Factorization for Spatially-Aware Scene Decomposition

We propose a completely unsupervised method to understand audio scenes observed with random microphone arrangements by decomposing the scene into its constituent sources and their relative presence in each microphone. To this end, we…

Sound · Computer Science 2019-09-30 Jonah Casebeer, Michael Colomb, Paris Smaragdis

A Complex Matrix Factorization approach to Joint Modeling of Magnitude and Phase for Source Separation

Conventional NMF methods for source separation factorize the matrix of spectral magnitudes. Spectral Phase is not included in the decomposition process of these methods. However, phase of the speech mixture is generally used in…

Sound · Computer Science 2014-11-26 Chaitanya Ahuja, Karan Nathwani, Rajesh M. Hegde

Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations

Non-negative Matrix Factorization (NMF) is a powerful technique for analyzing regularly-sampled data, i.e., data that can be stored in a matrix. For audio, this has led to numerous applications using time-frequency (TF) representations like…

Audio and Speech Processing · Electrical Eng. & Systems 2025-07-10 Krishna Subramani, Paris Smaragdis, Takuya Higuchi, Mehrez Souden

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

Target speech extraction is a technique to extract the target speaker's voice from mixture signals using a pre-recorded enrollment utterance that characterizes the voice of the target speaker. One major difficulty of target…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-17 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura

Time-Domain Based Embeddings for Spoofed Audio Representation

Anti-spoofing is the task of speech authentication. That is, identifying genuine human speech compared to spoofed speech. The main focus of this paper is to suggest new representations for genuine and spoofed speech, based on the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-28 Matan Karo, Arie Yeredor, Itshak Lapidot

Binaural Target Speaker Extraction using Individualized HRTF

In this work, we address the problem of binaural target-speaker extraction in the presence of multiple simultaneous talkers. We propose a novel approach that leverages the individual listener's Head-Related Transfer Function (HRTF) to…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-25 Yoav Ellinson, Sharon Gannot

Regularizing Learnable Feature Extraction for Automatic Speech Recognition

Neural front-ends are an appealing alternative to traditional, fixed feature extraction pipelines for automatic speech recognition (ASR) systems since they can be directly trained to fit the acoustic model. However, their performance often…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-01 Peter Vieting, Maximilian Kannen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney

Stratified Non-Negative Tensor Factorization

Non-negative matrix factorization (NMF) and non-negative tensor factorization (NTF) decompose non-negative high-dimensional data into non-negative low-rank components. NMF and NTF methods are popular for their intrinsic interpretability and…

Machine Learning · Computer Science 2024-12-02 Alexander Sietsema, Zerrin Vural, James Chapman, Yotam Yaniv, Deanna Needell

Low-power SNN-based audio source localisation using a Hilbert Transform spike encoding scheme

Sound source localisation is used in many consumer devices, to isolate audio from individual speakers and reject noise. Localization is frequently accomplished by "beamforming", which combines phase-shifted audio streams to increase power…

Sound · Computer Science 2025-02-13 Saeid Haghighatshoar, Dylan R Muir

Combination of Subtractive Clustering and Radial Basis Function in Speaker Identification

Speaker identification is the process of determining which registered speaker provides a given utterance. Speaker identification is required to make a claim on the identity of a speaker from the Ns trained speakers in its user database. In this…

Multimedia · Computer Science 2010-04-27 Ibrahim A. Albidewi, Yap Teck Ann

ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings

Speaker Diarization (SD) aims at grouping speech segments that belong to the same speaker. This task is required in many speech-processing applications, such as rich meeting transcription. In this context, distant microphone arrays usually…

Sound · Computer Science 2024-06-06 Theo Mariotte, Anthony Larcher, Silvio Montresor, Jean-Hugh Thomas

Nonnegative tensor factorization with frequency modulation cues for blind audio source separation

We present Vibrato Nonnegative Tensor Factorization, an algorithm for single-channel unsupervised audio source separation with an application to separating instrumental or vocal sources with nonstationary pitch from music recordings. Our…

Sound · Computer Science 2016-06-02 Elliot Creager, Noah D. Stein, Roland Badeau, Philippe Depalle

Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement

Recently, attention-based transformers have become a de facto standard in many deep learning applications including natural language processing, computer vision, signal processing, etc. In this paper, we propose a transformer-based…

Sound · Computer Science 2024-09-04 Tathagata Bandyopadhyay