Related papers: Multi-task Learning for Voice Trigger Detection

200 papers

Automatic speech transcription and speaker recognition are usually treated as separate tasks even though they are interdependent. In this study, we investigate training a single network to perform both tasks jointly. We train the network in…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-21 Siddharth Sigtia , Erik Marchi , Sachin Kajarekar , Devang Naik , John Bridle

Voice trigger detection is an important task, which enables activating a voice assistant when a target user speaks a keyword phrase. A detector is typically trained on speech data independent of speaker information and used for the voice…

We present an architecture for voice trigger detection for virtual assistants. The main idea in this work is to exploit information in words that immediately follow the trigger phrase. We first demonstrate that by including more audio…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-03 Siddharth Sigtia , John Bridle , Hywel Richards , Pascal Clark , Erik Marchi , Vineet Garg

Achieving robust speaker recognition performance in noisy environments is still a challenging task. Motivated by the promising performance of multi-task training in a variety of image processing tasks, we explore the potential…

Sound · Computer Science 2019-05-14 Jianfeng Zhou , Tao Jiang , Lin Li , Qingyang Hong , Zhe Wang , Bingyin Xia

Audio commands are a preferred communication medium to keep inspectors in the loop of civil infrastructure inspection performed by a semi-autonomous drone. To understand job-specific commands from a group of heterogeneous and dynamic…

Sound · Computer Science 2022-11-02 Yu Li , Anisha Parsan , Bill Wang , Penghao Dong , Shanshan Yao , Ruwen Qin

We study multi-task learning for two orthogonal speech technology tasks: speech and speaker recognition. We use wav2vec2 as a base architecture with two task-specific output heads. We experiment with different architectural decisions to mix…

Sound · Computer Science 2023-05-29 Nik Vaessen , David A. van Leeuwen

We consider the design of two-pass voice trigger detection systems. We focus on the networks in the second pass that are used to re-score candidate segments obtained from the first pass. Our baseline is an acoustic model (AM), with BiLSTM…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-07 Saurabh Adya , Vineet Garg , Siddharth Sigtia , Pramod Simha , Chandra Dhir

In this paper, we present the XMUSPEECH system for Task 1 of 2020 Personalized Voice Trigger Challenge (PVTC2020). Task 1 is a joint wake-up word detection with speaker verification on close talking data. The whole system consists of a…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-01 Dexin Liao , Jing Li , Yiming Zhi , Song Li , Qingyang Hong , Lin Li

This study employs deep learning techniques to explore four speaker profiling tasks on the TIMIT dataset, namely gender classification, accent classification, age estimation, and speaker identification, highlighting the potential and…

Sound · Computer Science 2024-04-19 Rong Wang , Kun Sun

Interactions with virtual assistants typically start with a predefined trigger phrase followed by the user command. To make interactions with the assistant more intuitive, we explore whether it is feasible to drop the requirement that users…

Computation and Language · Computer Science 2024-03-27 Dominik Wagner , Alexander Churchill , Siddharth Sigtia , Panayiotis Georgiou , Matt Mirsamadi , Aarshee Mishra , Erik Marchi

In this paper, a novel architecture for speaker recognition is proposed by cascading speech enhancement and speaker processing. Its aim is to improve speaker recognition performance when speech signals are corrupted by noise. Instead of…

Computation and Language · Computer Science 2020-05-25 Yanpei Shi , Qiang Huang , Thomas Hain

In this work, we propose a multi-target backdoor attack against speaker identification using position-independent clicking sounds as triggers. Unlike previous single-target approaches, our method targets up to 50 speakers simultaneously,…

Voice triggering (VT) enables users to activate their devices by just speaking a trigger phrase. A front-end system is typically used to perform speech enhancement and/or separation, and produces multiple enhanced and/or separated signals.…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-15 Takuya Higuchi , Avamarie Brueggeman , Masood Delfarah , Stephen Shum

Building a persona-based conversation agent is challenging owing to the lack of large amounts of speaker-specific conversation data for model training. This paper addresses the problem by proposing a multi-task learning approach to training…

Computation and Language · Computer Science 2017-10-23 Yi Luan , Chris Brockett , Bill Dolan , Jianfeng Gao , Michel Galley

Keyword spotting is often implemented by keyword classifier to the encoder in acoustic models, enabling the classification of predefined or open vocabulary keywords. Although keyword spotting is a crucial task in various applications and…

Sound · Computer Science 2025-01-22 Myeonghoon Ryu , June-Woo Kim , Minseok Oh , Suji Lee , Han Park

In cross-lingual speech synthesis, the speech in various languages can be synthesized for a monoglot speaker. Normally, only the data of monoglot speakers are available for model training, thus the speaker similarity is relatively low…

Sound · Computer Science 2022-01-21 J. Yang , Lei He

Multi-talker speech recognition and target-talker speech recognition, both of which involve transcription in multi-talker contexts, remain significant challenges. However, existing methods rarely attempt to simultaneously address both tasks. In this…

Sound · Computer Science 2024-08-27 Lingwei Meng , Jiawen Kang , Yuejiao Wang , Zengrui Jin , Xixin Wu , Xunying Liu , Helen Meng

Research on multilingual speech recognition remains attractive yet challenging. Recent studies focus on learning shared structures under the multi-task paradigm, in particular a feature sharing structure. This approach has been found…

Computation and Language · Computer Science 2016-09-28 Zhiyuan Tang , Lantian Li , Dong Wang

A main challenge in applying deep learning to music processing is the availability of training data. One potential solution is Multi-task Learning, in which the model also learns to solve related auxiliary tasks on additional datasets to…

Sound · Computer Science 2018-04-06 Daniel Stoller , Sebastian Ewert , Simon Dixon

Active speaker detection and speech enhancement have become two increasingly attractive topics in audio-visual scenario understanding. According to their respective characteristics, the scheme of independently designed architecture has been…

Sound · Computer Science 2022-07-08 Junwen Xiong , Yu Zhou , Peng Zhang , Lei Xie , Wei Huang , Yufei Zha