
Related papers: Knowledge Transfer for Efficient On-device False T…

200 papers

Voice-triggered smart assistants often rely on detection of a trigger-phrase before they start listening for the user request. Mitigation of false triggers is an important aspect of building a privacy-centric non-intrusive smart assistant.…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-01-30 · Pranay Dighe, Saurabh Adya, Nuoyu Li, Srikanth Vishnubhotla, Devang Naik, Adithya Sagar, Ying Ma, Stephen Pulman, Jason Williams

We present a unified and hardware-efficient architecture for two-stage voice trigger detection (VTD) and false trigger mitigation (FTM) tasks. Two-stage VTD systems of voice assistants can get falsely activated to audio segments…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-05-17 · Vineet Garg, Wonil Chang, Siddharth Sigtia, Saurabh Adya, Pramod Simha, Pranay Dighe, Chandra Dhir

We address the problem of detecting speech directed to a device that does not contain a specific wake-word. Specifically, we focus on audio coming from a touch-based invocation. Mitigating virtual assistants (VAs) activation due to…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-03-31 · Vineet Garg, Ognjen Rudovic, Pranay Dighe, Ahmed H. Abdelaziz, Erik Marchi, Saurabh Adya, Chandra Dhir, Ahmed Tewfik

When interacting with smart devices such as mobile phones or wearables, the user typically invokes a virtual assistant (VA) by saying a keyword or by pressing a button on the device. However, in many cases, the VA can accidentally be…

Sound · Computer Science · 2021-10-12 · Ognjen Rudovic, Akanksha Bindal, Vineet Garg, Pramod Simha, Pranay Dighe, Sachin Kajarekar

False triggers in voice assistants are unintended invocations of the assistant, which not only degrade the user experience but may also compromise privacy. False trigger mitigation (FTM) is a process to detect the false trigger events and…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-08-20 · Rishika Agarwal, Xiaochuan Niu, Pranay Dighe, Srikanth Vishnubhotla, Sameer Badaskar, Devang Naik

In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants. Applications include rejection of false wake-ups or unintended interactions as…

Computation and Language · Computer Science · 2018-08-09 · Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Björn Hoffmeister

In this paper, we present a method for correcting automatic speech recognition (ASR) errors using a finite state transducer (FST) intent recognition framework. Intent recognition is a powerful technique for dialog flow management in…

Computation and Language · Computer Science · 2019-08-22 · Piotr Żelasko, Jan Mizgajski, Mikołaj Morzy, Adrian Szymczak, Piotr Szymański, Łukasz Augustyniak, Yishay Carmiel

Interactions with virtual assistants typically start with a predefined trigger phrase followed by the user command. To make interactions with the assistant more intuitive, we explore whether it is feasible to drop the requirement that users…

Computation and Language · Computer Science · 2024-03-27 · Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi

Recent advances in AudioLLMs have enabled spoken dialogue systems to move beyond turn-based interaction toward real-time full-duplex communication, where the agent must decide when to speak, yield, or interrupt while the user is still…

We consider the design of two-pass voice trigger detection systems. We focus on the networks in the second pass that are used to re-score candidate segments obtained from the first pass. Our baseline is an acoustic model (AM), with BiLSTM…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-08-07 · Saurabh Adya, Vineet Garg, Siddharth Sigtia, Pramod Simha, Chandra Dhir

Accurate prediction of the user intent to interact with a voice assistant (VA) on a device (e.g. on the phone) is critical for achieving naturalistic, engaging, and privacy-centric interactions with the VA. To this end, we present a novel…

Computation and Language · Computer Science · 2022-10-24 · Pranay Dighe, Prateeth Nayak, Oggi Rudovic, Erik Marchi, Xiaochuan Niu, Ahmed Tewfik

Current speech-based LLMs are predominantly trained on extensive ASR and TTS datasets, excelling in tasks related to these domains. However, their ability to handle direct speech-to-speech conversations remains notably constrained. These…

Computation and Language · Computer Science · 2024-11-05 · Robin Shing-Hei Yuen, Timothy Tin-Long Tse, Jian Zhu

Performance of spoken language understanding (SLU) can be degraded with automatic speech recognition (ASR) errors. We propose a novel approach to improve SLU robustness by randomly corrupting clean training text with an ASR error simulator,…

Computation and Language · Computer Science · 2022-11-09 · Yik-Cheung Tam, Jiacheng Xu, Jiakai Zou, Zecheng Wang, Tinglong Liao, Shuhan Yuan

Voice trigger detection is an important task, which enables activating a voice assistant when a target user speaks a keyword phrase. A detector is typically trained on speech data independent of speaker information and used for the voice…

Automatic Speech Recognition (ASR) systems typically yield output in lexical form. However, humans prefer a written form output. To bridge this gap, ASR systems usually employ Inverse Text Normalization (ITN). In previous works, Weighted…

Computation and Language · Computer Science · 2022-11-08 · Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphanso, Jinyu Li, Yifan Gong

The perceptual quality of neural text-to-speech (TTS) is highly dependent on the choice of the model during training. Selecting the model using a training-objective metric such as the least mean squared error does not always correlate with…

We present an architecture for voice trigger detection for virtual assistants. The main idea in this work is to exploit information in words that immediately follow the trigger phrase. We first demonstrate that by including more audio…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-03-03 · Siddharth Sigtia, John Bridle, Hywel Richards, Pascal Clark, Erik Marchi, Vineet Garg

Comprehending the overall intent of an utterance helps a listener recognize the individual words spoken. Inspired by this fact, we perform a novel study of the impact of explicitly incorporating intent representations as additional…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-02-22 · Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo

Negative transfer in training of acoustic models for automatic speech recognition has been reported in several contexts such as domain change or speaker characteristics. This paper proposes a novel technique to overcome negative transfer by…

Machine Learning · Computer Science · 2015-09-18 · Mortaza Doulaty, Oscar Saz, Thomas Hain

Agent assistance during human-human customer support spoken interactions requires triggering workflows based on the caller's intent (reason for call). Timeliness of prediction is essential for a good user experience. The goal is for a…

Artificial Intelligence · Computer Science · 2022-08-16 · Mrinal Rawat, Victor Barres