Related papers: Learning Decoupling Features Through Orthogonality…
Keyword Spotting (KWS) is an essential component in a smart device for alerting the system when a user prompts it with a command. As these devices are typically constrained by computational and energy resources, the KWS model should be…
Keyword spotting (KWS) and speaker verification (SV) have been studied independently although it is known that acoustic and speaker domains are complementary. In this paper, we propose a multi-task network that performs KWS and SV…
Keyword spotting (KWS) is experiencing an upswing due to the pervasiveness of small electronic devices that allow interaction with them via speech. Often, KWS systems are speaker-independent, which means that any person (user or not)…
This paper addresses the persistent challenge in Keyword Spotting (KWS), a fundamental component in speech technology, regarding the acquisition of substantial labeled data for training. Given the difficulty in obtaining large quantities of…
Keyword Spotting (KWS) enables speech-based user interaction on smart devices. Always-on and battery-powered application scenarios for smart devices put constraints on hardware resources and power consumption, while also demanding high…
The recognition of rare named entities, such as personal names and terminologies, is challenging for automatic speech recognition (ASR) systems, especially when they are not frequently observed in the training data. In this paper, we…
As advancements in technologies like Internet of Things (IoT), Automatic Speech Recognition (ASR), Speaker Verification (SV), and Text-to-Speech (TTS) lead to increased usage of intelligent voice assistants, the demand for privacy and…
Keyword spotting (KWS) is an important technique for speech applications, which enables users to activate devices by speaking a keyword phrase. Although a phoneme classifier can be used for KWS, exploiting a large amount of transcribed data…
Speech recognition is a sequence prediction problem. Besides employing various deep learning approaches for frame-level classification, sequence-level discriminative training has been proved to be indispensable to achieve the…
Robustness against noise is critical for keyword spotting (KWS) in real-world environments. To improve robustness, a speech enhancement front-end is involved. Instead of treating speech enhancement as a separate preprocessing…
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding…
Open-vocabulary keyword spotting (KWS) refers to the task of detecting words or terms within speech recordings, regardless of whether they were included in the training data. This paper introduces an open-vocabulary keyword spotting model…
In this paper we explore the possibility of maximizing the information represented in spectrograms by making the spectrogram basis functions trainable. We experiment with two different tasks, namely keyword spotting (KWS) and automatic…
In this paper, we investigated a speech-augmentation-based unsupervised learning approach for the keyword spotting (KWS) task. KWS is a useful speech application, yet it depends heavily on labeled data. We designed a CNN-Attention…
Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasing interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training…
User-defined keyword spotting (KWS) is crucial for personalized voice interaction, yet existing methods face several challenges: (1) insufficient discriminability among confusable words, (2) performance inconsistency across speakers with…
Keyword spotting (KWS) aims to discriminate a specific wake-up word from other signals precisely and efficiently for different users. Recent works utilize various deep networks to train KWS models with all users' speech data centralized…
Noise robustness is a key aspect of successful speech applications. Speech enhancement (SE) has been investigated to improve automatic speech recognition accuracy; however, its effectiveness for keyword spotting (KWS) is still…
Keyword Spotting (KWS) systems with small footprint models deployed on edge devices face significant accuracy and robustness challenges due to domain shifts caused by varying noise and recording conditions. To address this, we propose a…
Keyword spotting (KWS) is one of the speech recognition tasks most sensitive to the quality of the feature representation. However, the research on KWS has traditionally focused on new model topologies, putting little emphasis on other…