Related papers: VarArray: Array-Geometry-Agnostic Continuous Speec…

UniArray: Unified Spectral-Spatial Modeling for Array-Geometry-Agnostic Speech Separation

Array-geometry-agnostic speech separation (AGA-SS) aims to develop an effective separation method regardless of the microphone array geometry. Conventional methods rely on permutation-free operations, such as summation or attention…

Sound · Computer Science 2025-03-10 Weiguang Chen , Junjie Zhang , Jielong Yang , Eng Siong Chng , Xionghu Zhong

Continuous Speech Separation with Ad Hoc Microphone Arrays

Speech separation has been shown effective for multi-talker speech recognition. Under the ad hoc microphone array setup where the array consists of spatially distributed asynchronous microphones, additional challenges must be overcome as…

Sound · Computer Science 2021-03-04 Dongmei Wang , Takuya Yoshioka , Zhuo Chen , Xiaofei Wang , Tianyan Zhou , Zhong Meng

AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations. We build on our recently introduced directional Automatic Speech Recognition (ASR) for smart…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-22 Ju Lin , Niko Moritz , Yiteng Huang , Ruiming Xie , Ming Sun , Christian Fuegen , Frank Seide

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

This paper presents a novel streaming automatic speech recognition (ASR) framework for multi-talker overlapping speech captured by a distant microphone array with an arbitrary geometry. Our framework, named t-SOT-VA, capitalizes on…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-05 Naoyuki Kanda , Jian Wu , Xiaofei Wang , Zhuo Chen , Jinyu Li , Takuya Yoshioka

Speech Separation Using Partially Asynchronous Microphone Arrays Without Resampling

We consider the problem of separating speech sources captured by multiple spatially separated devices, each of which has multiple microphones and samples its signals at a slightly different rate. Most asynchronous array processing methods…

Audio and Speech Processing · Electrical Eng. & Systems 2019-12-12 Ryan M. Corey , Andrew C. Singer

Microphone Array Generalization for Multichannel Narrowband Deep Speech Enhancement

This paper addresses the problem of microphone array generalization for deep-learning-based end-to-end multichannel speech enhancement. We aim to train a unique deep neural network (DNN) potentially performing well on unseen microphone…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-28 Siyuan Zhang , Xiaofei Li

AmbiDrop: Array-Agnostic Speech Enhancement Using Ambisonics Encoding and Dropout-Based Learning

Multichannel speech enhancement leverages spatial cues to improve intelligibility and quality, but most learning-based methods rely on specific microphone array geometry, unable to account for geometry changes. To mitigate this limitation,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-19 Michael Tatarjitzky , Boaz Rafaely

One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement

With the recent surge of video conferencing tools usage, providing high-quality speech signals and accurate captions have become essential to conduct day-to-day business or connect with friends and families. Single-channel personalized…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-22 Hassan Taherian , Sefik Emre Eskimez , Takuya Yoshioka , Huaming Wang , Zhuo Chen , Xuedong Huang

Advances in Online Audio-Visual Meeting Transcription

This paper describes a system that generates speaker-annotated transcripts of meetings by using a microphone array and a 360-degree camera. The hallmark of the system is its ability to handle overlapped speech, which has been an unsolved…

Audio and Speech Processing · Electrical Eng. & Systems 2019-12-12 Takuya Yoshioka , Igor Abramovski , Cem Aksoylar , Zhuo Chen , Moshe David , Dimitrios Dimitriadis , Yifan Gong , Ilya Gurvich , Xuedong Huang , Yan Huang , Aviv Hurvitz , Li Jiang , Sharon Koubi , Eyal Krupka , Ido Leichter , Changliang Liu , Partha Parthasarathy , Alon Vinnikov , Lingfeng Wu , Xiong Xiao , Wayne Xiong , Huaming Wang , Zhenghao Wang , Jun Zhang , Yong Zhao , Tianyan Zhou

Scene-Agnostic Multi-Microphone Speech Dereverberation

Neural networks (NNs) have been widely applied in speech processing tasks, and, in particular, those employing microphone arrays. Nevertheless, most existing NN architectures can only deal with fixed and position-specific microphone arrays.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-14 Yochai Yemini , Ethan Fetaya , Haggai Maron , Sharon Gannot

Exploring Efficient Directional and Distance Cues for Regional Speech Separation

In this paper, we introduce a neural network-based method for regional speech separation using a microphone array. This approach leverages novel spatial cues to extract the sound source not only from specified direction but also within…

Sound · Computer Science 2025-08-12 Yiheng Jiang , Haoxu Wang , Yafeng Chen , Gang Qiao , Biao Tian

Continuous speech separation: dataset and analysis

This paper describes a dataset and protocols for evaluating continuous speech separation algorithms. Most prior studies on speech separation use pre-segmented signals of artificially mixed speech utterances which are mostly fully…

Sound · Computer Science 2020-05-08 Zhuo Chen , Takuya Yoshioka , Liang Lu , Tianyan Zhou , Zhong Meng , Yi Luo , Jian Wu , Xiong Xiao , Jinyu Li

Multi-channel Conversational Speaker Separation via Neural Diarization

When dealing with overlapped speech, the performance of automatic speech recognition (ASR) systems substantially degrades as they are designed for single-talker speech. To enhance ASR performance in conversational or meeting environments,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-16 Hassan Taherian , DeLiang Wang

Audio-visual Multi-channel Recognition of Overlapped Speech

Automatic speech recognition (ASR) of overlapped speech remains a highly challenging task to date. To this end, multi-channel microphone array data are widely used in state-of-the-art ASR systems. Motivated by the invariance of visual…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-19 Jianwei Yu , Bo Wu , Rongzhi Gu , Shi-Xiong Zhang , Lianwu Chen , Yong Xu , Meng Yu , Dan Su , Dong Yu , Xunying Liu , Helen Meng

Online speaker diarization of meetings guided by speech separation

Overlapped speech is notoriously problematic for speaker diarization systems. Consequently, the use of speech separation has recently been proposed to improve their performance. Although promising, speech separation models struggle with…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-02 Elio Gruttadauria , Mathieu Fontaine , Slim Essid

A robust and passive method for geometric calibration of large arrays

This paper presents a complete strategy for the geometry estimation of large microphone arrays of arbitrary shape. Largeness is intended here in both number of microphones (hundreds) and size (few meters). Such arrays can be used for…

Data Analysis, Statistics and Probability · Physics 2016-03-28 Charles Vanwynsberghe , Pascal Challande , Jacques Marchal , Régis Marchiano , François Ollivier

Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation

Existing multi-channel continuous speech separation (CSS) models are heavily dependent on supervised data - either simulated data which causes data mismatch between the training and real-data testing, or the real transcribed overlapping…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-08 Xiaofei Wang , Dongmei Wang , Naoyuki Kanda , Sefik Emre Eskimez , Takuya Yoshioka

Multi-Geometry Spatial Acoustic Modeling for Distant Speech Recognition

The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-23 Kenichi Kumatani , Minhua Wu , Shiva Sundaram , Nikko Strom , Bjorn Hoffmeister

All-neural beamformer for continuous speech separation

Continuous speech separation (CSS) aims to separate overlapping voices from a continuous influx of conversational audio containing an unknown number of utterances spoken by an unknown number of speakers. A common application scenario is…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-14 Zhuohuang Zhang , Takuya Yoshioka , Naoyuki Kanda , Zhuo Chen , Xiaofei Wang , Dongmei Wang , Sefik Emre Eskimez

Meeting Transcription Using Virtual Microphone Arrays

We describe a system that generates speaker-annotated transcripts of meetings by using a virtual microphone array, a set of spatially distributed asynchronous recording devices such as laptops and mobile phones. The system is composed of…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-09 Takuya Yoshioka , Zhuo Chen , Dimitrios Dimitriadis , William Hinthorn , Xuedong Huang , Andreas Stolcke , Michael Zeng