Related papers: Device-directed Utterance Detection

200 papers

User interactions with personal assistants like Alexa, Google Home and Siri are typically initiated by a wake term or wakeword. Several personal assistants feature "follow-up" modes that allow users to make additional interactions without…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-10-06 · Kellen Gillespie, Ioannis C. Konstantakopoulos, Xingzhi Guo, Vishal Thanvantri Vasudevan, Abhinav Sethy

Follow-up conversations with virtual assistants (VAs) enable a user to seamlessly interact with a VA without the need to repeatedly invoke it using a keyword (after the first query). Therefore, accurate Device-directed Speech Detection…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-11-06 · Ognjen Rudovic, Pranay Dighe, Yi Su, Vineet Garg, Sameer Dharur, Xiaochuan Niu, Ahmed H. Abdelaziz, Saurabh Adya, Ahmed Tewfik

Voice controlled virtual assistants (VAs) are now available in smartphones, cars, and standalone devices in homes. In most cases, the user needs to first "wake-up" the VA by saying a particular word/phrase every time he or she wants the VA…

Human-Computer Interaction · Computer Science · 2019-02-05 · Atta Norouzian, Bogdan Mazoure, Dermot Connolly, Daniel Willett

In this paper, we propose a streaming model to distinguish voice queries intended for a smart-home device from background speech. The proposed model consists of multiple CNN layers with residual connections, followed by a stacked LSTM…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-07-21 · Xiaosu Tong, Che-Wei Huang, Sri Harish Mallidi, Shaun Joseph, Sonal Pareek, Chander Chandak, Ariya Rastrow, Roland Maas

The word error rate (WER) of an automatic speech recognition (ASR) system increases when there is a mismatch between training and testing conditions, for example due to noise. In this case, the acoustic information can be less reliable…

Computation and Language · Computer Science · 2020-11-03 · Dominique Fohr, Irina Illina

Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions according to the results obtained for early NIST SRE (Speaker Recognition Evaluation) datasets. From the practical…

We study device-addressed speech detection under pre-ASR edge deployment constraints, where systems must decide whether to forward audio before transcription under strict latency and compute limits. We show that, in multi-speaker…

Sound · Computer Science · 2026-04-10 · David Joohun Kim, Daniyal Anjum, Bonny Banerjee, Omar Abbasi

In this paper, we address the task of determining whether a given utterance is directed towards a voice-enabled smart-assistant device or not. An undirected utterance is termed as a "false trigger" and false trigger mitigation (FTM) is…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-10-22 · Pranay Dighe, Erik Marchi, Srikanth Vishnubhotla, Sachin Kajarekar, Devang Naik

We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model. Two categories of features are explored: features derived directly…

Computation and Language · Computer Science · 2022-07-13 · Aparna Khare, Eunjung Han, Yuguang Yang, Andreas Stolcke

Conversational speech normally has loose syntactic structure at the utterance level but simultaneously exhibits topical coherence relations across consecutive utterances. Prior work has shown that capturing longer context…

Computation and Language · Computer Science · 2022-06-02 · Bi-Cheng Yan, Hsin-Wei Wang, Shih-Hsuan Chiu, Hsuan-Sheng Chiu, Berlin Chen

Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g…

In many speech-enabled human-machine interaction scenarios, user speech can overlap with the device playback audio. In these instances, the performance of tasks such as keyword-spotting (KWS) and device-directed speech detection (DDD) can…

Sound · Computer Science · 2022-10-05 · Samuele Cornell, Thomas Balestri, Thibaud Sénéchal

In this work, we address a novel but potentially emerging problem: discriminating natural human voices from those played back by any kind of audio device in the context of interactions with an in-house voice user interface. The tackled…

Sound · Computer Science · 2019-02-19 · Thanh-Ha Le, Philippe Gilberton, Ngoc Q. K. Duong

The adoption of advanced deep learning architectures in stuttering detection (SD) tasks is challenging due to the limited size of the available datasets. To this end, this work introduces the application of speech embeddings extracted from…

Sound · Computer Science · 2023-06-02 · Shakeel A. Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

This paper addresses the problem of automatic speech recognition (ASR) of a target speaker in background speech. The novelty of our approach is that we focus on a wakeup keyword, which is usually used for activating ASR systems like smart…

Audio and Speech Processing · Electrical Eng. & Systems · 2018-11-08 · Yusuke Kida, Dung Tran, Motoi Omachi, Toru Taniguchi, Yuya Fujita

Interactions with virtual assistants typically start with a trigger phrase followed by a command. In this work, we explore the possibility of making these interactions more natural by eliminating the need for a trigger phrase. Our goal is…

We address the problem of detecting device-directed speech that does not contain a specific wake-word. Specifically, we focus on audio coming from a touch-based invocation. Mitigating virtual assistant (VA) activation due to…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-03-31 · Vineet Garg, Ognjen Rudovic, Pranay Dighe, Ahmed H. Abdelaziz, Erik Marchi, Saurabh Adya, Chandra Dhir, Ahmed Tewfik

This paper presents a speech intelligibility model based on automatic speech recognition (ASR), combining phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these…

We present a comprehensive study of deep bidirectional long short-term memory (LSTM) recurrent neural network (RNN) based acoustic models for automatic speech recognition (ASR). We study the effect of size and depth and train models of up…

Neural and Evolutionary Computing · Computer Science · 2019-08-06 · Albert Zeyer, Patrick Doetsch, Paul Voigtlaender, Ralf Schlüter, Hermann Ney

Device-directed speech detection (DDSD) is a binary classification task that separates the user's queries to a voice assistant (VA) from background speech or side conversations. This is important for achieving naturalistic user experience.…