Sharon Gannot — Scifaro

Linearly Constrained Deep Beamformer for Multi-Speaker Scenarios

We propose a deep beamforming framework for enhancing target speaker(s) in multi-speaker environments. A deep neural network (DNN) is trained to estimate beamforming weights directly from noisy multichannel inputs while satisfying linear…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-05-21 · Ilai Zaidel, Ori Engel, Bar Engel, Sharon Gannot
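For background on the "linearly constrained" part of this title, the classical (non-learned) linearly constrained minimum variance (LCMV) beamformer has the closed-form weights w = R^{-1} C (C^H R^{-1} C)^{-1} g. Here R is a spatial covariance matrix, C holds one constraint vector per speaker (e.g. steering vectors or relative transfer functions), and g gives the desired response to each one (1 to keep a speaker, 0 to null it). The sketch below computes these weights for a single frequency bin. It is the textbook solution, not the DNN-based estimator proposed in the paper. The helper name `lcmv_weights`, the array sizes, and the random data are illustrative assumptions.

```python
import numpy as np

def lcmv_weights(R, C, g):
    """Closed-form LCMV weights for one frequency bin (hypothetical helper).

    R: (M, M) Hermitian positive-definite spatial covariance.
    C: (M, K) constraint matrix, one column per constrained speaker.
    g: (K,) desired response per constraint (1 = keep, 0 = null).
    Returns w minimizing w^H R w subject to C^H w = g.
    """
    Rinv_C = np.linalg.solve(R, C)            # R^{-1} C without forming R^{-1}
    gram = C.conj().T @ Rinv_C                # C^H R^{-1} C, shape (K, K)
    return Rinv_C @ np.linalg.solve(gram, g)  # R^{-1} C (C^H R^{-1} C)^{-1} g

# Illustrative setup: 4 microphones, 2 speakers; keep speaker 0, null speaker 1.
rng = np.random.default_rng(0)
M, K, T = 4, 2, 200
C = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
N = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
R = N @ N.conj().T / T + 1e-3 * np.eye(M)     # sample covariance plus diagonal loading
g = np.array([1.0, 0.0])

w = lcmv_weights(R, C, g)
print(np.round(C.conj().T @ w, 6))            # constraint responses, close to [1, 0]
```

Solving linear systems instead of explicitly inverting R is a common numerical choice. The diagonal loading term keeps R well conditioned when it is estimated from few frames.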

On the Usefulness of Diffusion-Based Room Impulse Response Interpolation to Microphone Array Processing

Room Impulse Response (RIR) estimation is a fundamental problem in spatial audio processing and speech enhancement. In this paper, we build upon our previously introduced diffusion-based inpainting framework for Room Impulse Response…

Sound · Computer Science · 2026-03-31 · Sagi Della Torre, Mirco Pezzoli, Fabio Antonacci, Sharon Gannot

HRTF-guided Binaural Target Speaker Extraction with Real-World Validation

This paper presents a Head-Related Transfer Function (HRTF)-guided framework for binaural Target Speaker Extraction (TSE) from mixtures of concurrent sources. Unlike conventional TSE methods based on Direction of Arrival (DOA) estimation or…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-03-18 · Yoav Ellinson, Sharon Gannot

Speakers Localization Using Batch EM In Unfolding Neural Network

We propose an interpretable Batch-EM Unfolded Network for robust speaker localization. By embedding the iterative EM procedure within an encoder-EM-decoder architecture, the method mitigates initialization sensitivity and improves…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-03-18 · Rina Veler, Sharon Gannot

GDiffuSE: Diffusion-based speech enhancement with noise model guidance

This paper introduces a novel speech enhancement (SE) approach based on a denoising diffusion probabilistic model (DDPM), termed Guided diffusion for speech enhancement (GDiffuSE). In contrast to conventional methods that directly map noisy…

Sound · Computer Science · 2026-03-03 · Efrayim Yanir, David Burshtein, Sharon Gannot

Binaural Target Speaker Extraction using Individualized HRTF

In this work, we address the problem of binaural target-speaker extraction in the presence of multiple simultaneous talkers. We propose a novel approach that leverages the individual listener's Head-Related Transfer Function (HRTF) to…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-02-25 · Yoav Ellinson, Sharon Gannot

Interpretable Binaural Deep Beamforming Guided by Time-Varying Relative Transfer Function

In this work, we propose a deep beamforming framework for speech enhancement in dynamic acoustic environments. The framework learns time-varying beamformer weights from noisy multichannel signals via a deep neural network, guided by a…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-02-18 · Ilai Zaidel, Sharon Gannot

Comparison of Frequency-Fusion Mechanisms for Binaural Direction-of-Arrival Estimation for Multiple Speakers

To estimate the direction of arrival (DOA) of multiple speakers with methods that use prototype transfer functions, frequency-dependent spatial spectra (SPS) are usually constructed. To make the DOA estimation robust, SPS from different…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-02-11 · Daniel Fejgin, Elior Hadad, Sharon Gannot, Zbyněk Koldovský, Simon Doclo

SSNAPS: Audio-Visual Separation of Speech and Background Noise with Diffusion Inverse Sampling

This paper addresses the challenge of audio-visual single-microphone speech separation and enhancement in the presence of real-world environmental noise. Our approach is based on generative inverse sampling, where we model clean speech and…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-02-03 · Yochai Yemini, Yoav Ellinson, Rami Ben-Ari, Sharon Gannot, Ethan Fetaya

AMDM-SE: Attention-based Multichannel Diffusion Model for Speech Enhancement

Diffusion models have recently achieved impressive results in reconstructing images from noisy inputs, and similar ideas have been applied to speech enhancement by treating time-frequency representations as images. With the ubiquity of…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-01-21 · Renana Opochinsky, Sharon Gannot

Spectral or spatial? Leveraging both for speaker extraction in challenging data conditions

This paper presents a robust multi-channel speaker extraction algorithm designed to handle inaccuracies in reference information. While existing approaches often rely solely on either spatial or spectral cues to identify the target speaker,…

Sound · Computer Science · 2025-12-24 · Aviad Eisenberg, Sharon Gannot, Shlomo E. Chazan

Socially Pertinent Robots in Gerontological Healthcare

Despite the many recent achievements in developing and deploying social robotics, there are still many underexplored environments and applications for which systematic evaluation of such systems by end-users is necessary. While several…

Robotics · Computer Science · 2025-09-24 · Xavier Alameda-Pineda, Angus Addlesee, Daniel Hernández García, Chris Reinke, Soraya Arias, Federica Arrigoni, Alex Auternaud, Lauriane Blavette, Cigdem Beyan, Luis Gomez Camara, Ohad Cohen, Alessandro Conti, Sébastien Dacunha, Christian Dondrup, Yoav Ellinson, Francesco Ferro, Sharon Gannot, Florian Gras, Nancie Gunson, Radu Horaud, Moreno D'Incà, Imad Kimouche, Séverin Lemaignan, Oliver Lemon, Cyril Liotard, Luca Marchionni, Mordehay Moradi, Tomas Pajdla, Maribel Pino, Michal Polic, Matthieu Py, Ariel Rado, Bin Ren, Elisa Ricci, Anne-Sophie Rigaud, Paolo Rota, Marta Romeo, Nicu Sebe, Weronika Sieińska, Pinchas Tandeitnik, Francesco Tonini, Nicolas Turro, Timothée Wintz, Yanchao Yu

(SP)²-Net: A Neural Spatial Spectrum Method for DOA Estimation

We consider the problem of estimating the directions of arrival (DOAs) of multiple sources from a single snapshot of an antenna array, a task with many practical applications. In such settings, the classical Bartlett beamformer is commonly…

Signal Processing · Electrical Eng. & Systems · 2025-09-22 · Lioz Berman, Sharon Gannot, Tom Tirer
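For context on the classical baseline named in this abstract, the Bartlett (conventional) beamformer scans a grid of candidate angles θ and evaluates the spatial spectrum P(θ) = |a(θ)^H x|^2 / ||a(θ)||^2 for a single snapshot x. The DOAs are then read off as the largest peaks. The sketch below assumes a uniform linear array with half-wavelength spacing and a noiseless two-source snapshot. The helper names (`ula_steering`, `bartlett_spectrum`), the array size, the source angles, and the 0.5° grid are illustrative assumptions, and the code is not the neural method proposed in the paper.

```python
import numpy as np

def ula_steering(theta_deg, M, d=0.5):
    """Steering vector of an M-element uniform linear array; d is spacing in wavelengths."""
    m = np.arange(M)
    return np.exp(1j * 2 * np.pi * d * m * np.sin(np.deg2rad(theta_deg)))

def bartlett_spectrum(x, grid_deg, d=0.5):
    """Bartlett spatial spectrum |a^H x|^2 / ||a||^2 for one snapshot x."""
    M = x.shape[0]
    A = np.stack([ula_steering(t, M, d) for t in grid_deg], axis=1)  # (M, G)
    return np.abs(A.conj().T @ x) ** 2 / M  # ||a||^2 = M for unit-modulus entries

# Illustrative single snapshot: two sources at -20 and 30 degrees on a 16-element ULA.
M = 16
x = ula_steering(-20.0, M) + 0.8 * ula_steering(30.0, M)
grid = np.arange(-90.0, 90.5, 0.5)
P = bartlett_spectrum(x, grid)

# Keep interior local maxima, then take the two largest as DOA estimates.
peaks = [i for i in range(1, len(P) - 1) if P[i] > P[i - 1] and P[i] >= P[i + 1]]
top2 = sorted(peaks, key=lambda i: P[i], reverse=True)[:2]
print(np.sort(grid[top2]))  # estimates near -20 and 30 degrees
```

The resolution of this estimator is limited by the array's beamwidth (roughly 2/M in sin θ for half-wavelength spacing), so closely spaced sources merge into a single peak. This limitation is one reason learned spatial spectra are of interest.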

Diffusion-Based Unsupervised Audio-Visual Speech Separation in Noisy Environments with Noise Prior

In this paper, we address the problem of single-microphone speech separation in the presence of ambient noise. We propose a generative unsupervised technique that directly models both clean speech and structured noise components, training…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-09-19 · Yochai Yemini, Rami Ben-Ari, Sharon Gannot, Ethan Fetaya

Transient Noise Removal via Diffusion-based Speech Inpainting

In this paper, we present PGDI, a diffusion-based speech inpainting framework for restoring missing or severely corrupted speech segments. Unlike previous methods that struggle with speaker variability or long gap lengths, PGDI can…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-08-13 · Mordehay Moradi, Sharon Gannot

Multi-Microphone and Multi-Modal Emotion Recognition in Reverberant Environment

This paper presents a Multi-modal Emotion Recognition (MER) system designed to enhance emotion recognition accuracy in challenging acoustic conditions. Our approach combines a modified and extended Hierarchical Token-semantic Audio…

Sound · Computer Science · 2025-07-30 · Ohad Cohen, Gershon Hazan, Sharon Gannot

Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes

Recent advancements in Text-to-Speech (TTS) models, particularly in voice cloning, have intensified the demand for adaptable and efficient deepfake detection methods. As TTS systems continue to evolve, detection models must be able to…

Sound · Computer Science · 2025-05-30 · Neta Glazer, David Chernin, Idan Achituve, Sharon Gannot, Ethan Fetaya

Video Editing for Audio-Visual Dubbing

Visual dubbing, the synchronization of facial movements with new speech, is crucial for making content accessible across different languages, enabling broader global reach. However, current methods face significant limitations. Existing…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-30 · Binyamin Manela, Sharon Gannot, Ethan Fetaya

DiffusionRIR: Room Impulse Response Interpolation using Diffusion Models

Room Impulse Responses (RIRs) characterize acoustic environments and are crucial in multiple audio signal processing tasks. High-quality RIR estimates drive applications such as virtual microphones, sound source localization, augmented…

Sound · Computer Science · 2025-04-30 · Sagi Della Torre, Mirco Pezzoli, Fabio Antonacci, Sharon Gannot

End-to-End Multi-Microphone Speaker Extraction Using Relative Transfer Functions

This paper introduces a multi-microphone method for extracting a desired speaker from a mixture involving multiple speakers and directional noise in a reverberant environment. In this work, we propose leveraging the instantaneous relative…

Sound · Computer Science · 2025-02-11 · Aviad Eisenberg, Sharon Gannot, Shlomo E. Chazan