Related papers: VarArray: Array-Geometry-Agnostic Continuous Speec…

200 papers

Array-geometry-agnostic speech separation (AGA-SS) aims to develop an effective separation method regardless of the microphone array geometry. Conventional methods rely on permutation-free operations, such as summation or attention…

Sound · Computer Science 2025-03-10 Weiguang Chen , Junjie Zhang , Jielong Yang , Eng Siong Chng , Xionghu Zhong

Speech separation has been shown effective for multi-talker speech recognition. Under the ad hoc microphone array setup where the array consists of spatially distributed asynchronous microphones, additional challenges must be overcome as…

Sound · Computer Science 2021-03-04 Dongmei Wang , Takuya Yoshioka , Zhuo Chen , Xiaofei Wang , Tianyan Zhou , Zhong Meng

Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations. We build on our recently introduced directional Automatic Speech Recognition (ASR) for smart…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-22 Ju Lin , Niko Moritz , Yiteng Huang , Ruiming Xie , Ming Sun , Christian Fuegen , Frank Seide

This paper presents a novel streaming automatic speech recognition (ASR) framework for multi-talker overlapping speech captured by a distant microphone array with an arbitrary geometry. Our framework, named t-SOT-VA, capitalizes on…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-05 Naoyuki Kanda , Jian Wu , Xiaofei Wang , Zhuo Chen , Jinyu Li , Takuya Yoshioka

We consider the problem of separating speech sources captured by multiple spatially separated devices, each of which has multiple microphones and samples its signals at a slightly different rate. Most asynchronous array processing methods…

Audio and Speech Processing · Electrical Eng. & Systems 2019-12-12 Ryan M. Corey , Andrew C. Singer

This paper addresses the problem of microphone array generalization for deep-learning-based end-to-end multichannel speech enhancement. We aim to train a unique deep neural network (DNN) potentially performing well on unseen microphone…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-28 Siyuan Zhang , Xiaofei Li

Multichannel speech enhancement leverages spatial cues to improve intelligibility and quality, but most learning-based methods rely on specific microphone array geometry, unable to account for geometry changes. To mitigate this limitation,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-19 Michael Tatarjitzky , Boaz Rafaely

With the recent surge of video conferencing tools usage, providing high-quality speech signals and accurate captions have become essential to conduct day-to-day business or connect with friends and families. Single-channel personalized…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-22 Hassan Taherian , Sefik Emre Eskimez , Takuya Yoshioka , Huaming Wang , Zhuo Chen , Xuedong Huang

This paper describes a system that generates speaker-annotated transcripts of meetings by using a microphone array and a 360-degree camera. The hallmark of the system is its ability to handle overlapped speech, which has been an unsolved…

Neural networks (NNs) have been widely applied in speech processing tasks, and, in particular, those employing microphone arrays. Nevertheless, most existing NN architectures can only deal with fixed and position-specific microphone arrays.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-14 Yochai Yemini , Ethan Fetaya , Haggai Maron , Sharon Gannot

In this paper, we introduce a neural network-based method for regional speech separation using a microphone array. This approach leverages novel spatial cues to extract the sound source not only from specified direction but also within…

Sound · Computer Science 2025-08-12 Yiheng Jiang , Haoxu Wang , Yafeng Chen , Gang Qiao , Biao Tian

This paper describes a dataset and protocols for evaluating continuous speech separation algorithms. Most prior studies on speech separation use pre-segmented signals of artificially mixed speech utterances which are mostly fully…

Sound · Computer Science 2020-05-08 Zhuo Chen , Takuya Yoshioka , Liang Lu , Tianyan Zhou , Zhong Meng , Yi Luo , Jian Wu , Xiong Xiao , Jinyu Li

When dealing with overlapped speech, the performance of automatic speech recognition (ASR) systems substantially degrades as they are designed for single-talker speech. To enhance ASR performance in conversational or meeting environments,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-16 Hassan Taherian , DeLiang Wang

Automatic speech recognition (ASR) of overlapped speech remains a highly challenging task to date. To this end, multi-channel microphone array data are widely used in state-of-the-art ASR systems. Motivated by the invariance of visual…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-19 Jianwei Yu , Bo Wu , Rongzhi Gu , Shi-Xiong Zhang , Lianwu Chen , Yong Xu , Meng Yu , Dan Su , Dong Yu , Xunying Liu , Helen Meng

Overlapped speech is notoriously problematic for speaker diarization systems. Consequently, the use of speech separation has recently been proposed to improve their performance. Although promising, speech separation models struggle with…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-02 Elio Gruttadauria , Mathieu Fontaine , Slim Essid

This paper presents a complete strategy for the geometry estimation of large microphone arrays of arbitrary shape. Largeness is intended here in both number of microphones (hundreds) and size (few meters). Such arrays can be used for…

Data Analysis, Statistics and Probability · Physics 2016-03-28 Charles Vanwynsberghe , Pascal Challande , Jacques Marchal , Régis Marchiano , François Ollivier

Existing multi-channel continuous speech separation (CSS) models are heavily dependent on supervised data - either simulated data which causes data mismatch between the training and real-data testing, or the real transcribed overlapping…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-08 Xiaofei Wang , Dongmei Wang , Naoyuki Kanda , Sefik Emre Eskimez , Takuya Yoshioka

The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-23 Kenichi Kumatani , Minhua Wu , Shiva Sundaram , Nikko Strom , Bjorn Hoffmeister

Continuous speech separation (CSS) aims to separate overlapping voices from a continuous influx of conversational audio containing an unknown number of utterances spoken by an unknown number of speakers. A common application scenario is…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-14 Zhuohuang Zhang , Takuya Yoshioka , Naoyuki Kanda , Zhuo Chen , Xiaofei Wang , Dongmei Wang , Sefik Emre Eskimez

We describe a system that generates speaker-annotated transcripts of meetings by using a virtual microphone array, a set of spatially distributed asynchronous recording devices such as laptops and mobile phones. The system is composed of…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-09 Takuya Yoshioka , Zhuo Chen , Dimitrios Dimitriadis , William Hinthorn , Xuedong Huang , Andreas Stolcke , Michael Zeng