Related papers: Unsupervised Spoken Utterance Classification

Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents

User interaction with voice-powered agents generates large amounts of unlabeled utterances. In this paper, we explore techniques to efficiently transfer the knowledge from these unlabeled utterances to improve model performance on Spoken…

Computation and Language · Computer Science 2018-11-14 Aditya Siddhant, Anuj Goyal, Angeliki Metallinou

Unsupervised Speech Recognition

Despite rapid progress in the recent past, current speech recognition systems still require labeled training data which limits this technology to a small fraction of the languages spoken around the globe. This paper describes wav2vec-U,…

Computation and Language · Computer Science 2022-05-04 Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli

Continuous Speech Recognition Based on Deterministic Finite Automata Machine using Utterance and Pitch Verification

This paper introduces a set of acoustic modeling techniques for utterance verification (UV) based continuous speech recognition (CSR). Utterance verification in this work implies the ability to determine when portions of a hypothesized word…

Formal Languages and Automata Theory · Computer Science 2014-01-22 M. Tharun Prasath

U-vectors: Generating clusterable speaker embedding from unlabeled data

Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built upon two stages, the first stage extracts low dimensional correlation embeddings from speech, and the second performs the…

Sound · Computer Science 2021-10-25 M. F. Mridha, Abu Quwsar Ohi, Muhammad Mostafa Monowar, Md. Abdul Hamid, Md. Rashedul Islam, Yutaka Watanobe

Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances

Discovering the semantics of multimodal utterances is essential for understanding human language and enhancing human-machine interactions. Existing methods manifest limitations in leveraging nonverbal information for discerning complex…

Multimedia · Computer Science 2024-05-22 Hanlei Zhang, Hua Xu, Fei Long, Xin Wang, Kai Gao

Exploring attention mechanism for acoustic-based classification of speech utterances into system-directed and non-system-directed

Voice controlled virtual assistants (VAs) are now available in smartphones, cars, and standalone devices in homes. In most cases, the user needs to first "wake-up" the VA by saying a particular word/phrase every time he or she wants the VA…

Human-Computer Interaction · Computer Science 2019-02-05 Atta Norouzian, Bogdan Mazoure, Dermot Connolly, Daniel Willett

Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks

In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems to perform the SLU intent…

Computation and Language · Computer Science 2024-02-02 Esaú Villatoro-Tello, Srikanth Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlicek, Alexei V. Ivanov, Aravind Ganapathiraju

Bridging Speech and Textual Pre-trained Models with Unsupervised ASR

Spoken language understanding (SLU) is a task aiming to extract high-level semantics from spoken utterances. Previous works have investigated the use of speech self-supervised models and textual pre-trained models, which have shown…

Computation and Language · Computer Science 2022-11-08 Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, Hung-yi Lee

Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings

Inducing semantic representations directly from speech signals is a highly challenging task but has many useful applications in speech mining and spoken language understanding. This study tackles the unsupervised learning of semantic…

Computation and Language · Computer Science 2022-10-25 Jian Zhu, Zuoyu Tian, Yadong Liu, Cong Zhang, Chia-wen Lo

Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs

A major focus of recent research in spoken language understanding (SLU) has been on the end-to-end approach where a single model can predict intents directly from speech inputs without intermediate transcripts. However, this approach…

Computation and Language · Computer Science 2021-06-15 Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Kuo, Samuel Thomas, Edmilson Morais

Pay Attention to CTC: Fast and Robust Pseudo-Labelling for Unified Speech Recognition

Unified Speech Recognition (USR) has emerged as a semi-supervised framework for training a single model for audio, visual, and audiovisual speech recognition, achieving state-of-the-art results on in-distribution benchmarks. However, its…

Computer Vision and Pattern Recognition · Computer Science 2026-02-24 Alexandros Haliassos, Rodrigo Mira, Stavros Petridis

Intent Classification Using Pre-trained Language Agnostic Embeddings For Low Resource Languages

Building Spoken Language Understanding (SLU) systems that do not rely on language specific Automatic Speech Recognition (ASR) is an important yet less explored problem in language processing. In this paper, we present a comparative study…

Computation and Language · Computer Science 2022-04-19 Hemant Yadav, Akshat Gupta, Sai Krishna Rallabandi, Alan W Black, Rajiv Ratn Shah

Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target

Spoken Language Understanding (SLU) is a task that aims to extract semantic information from spoken utterances. Previous research has made progress in end-to-end SLU by using paired speech-text data, such as pre-trained Automatic Speech…

Computation and Language · Computer Science 2023-07-11 Guan-Wei Wu, Guan-Ting Lin, Shang-Wen Li, Hung-yi Lee

A Data Efficient End-To-End Spoken Language Understanding Architecture

End-to-end architectures have been recently proposed for spoken language understanding (SLU) and semantic parsing. Based on a large amount of data, those models learn jointly acoustic and linguistic-sequential features. Such architectures…

Computation and Language · Computer Science 2020-02-17 Marco Dinarelli, Nikita Kapoor, Bassam Jabaian, Laurent Besacier

Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding

Spoken language understanding (SLU) system usually consists of various pipeline components, where each component heavily relies on the results of its upstream ones. For example, Intent detection (ID), and slot filling (SF) require its…

Computation and Language · Computer Science 2021-04-14 Di Wu, Yiren Chen, Liang Ding, Dacheng Tao

On the Evaluation of Dialogue Systems with Next Utterance Classification

An open challenge in constructing dialogue systems is developing methods for automatically learning dialogue strategies from large amounts of unlabelled data. Recent work has proposed Next-Utterance-Classification (NUC) as a surrogate task…

Computation and Language · Computer Science 2016-07-26 Ryan Lowe, Iulian V. Serban, Mike Noseworthy, Laurent Charlin, Joelle Pineau

On Building Spoken Language Understanding Systems for Low Resourced Languages

Spoken dialog systems are slowly becoming an integral part of the human experience due to their various advantages over textual interfaces. Spoken language understanding (SLU) systems are fundamental building blocks of spoken dialog…

Computation and Language · Computer Science 2022-05-26 Akshat Gupta

Towards End-to-end Unsupervised Speech Recognition

Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR) systems accessible to every language. However, existing methods still heavily rely on hand-crafted pre-processing. Similar to the trend of…

Computation and Language · Computer Science 2022-06-16 Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski

Robust Spoken Language Understanding via Paraphrasing

Learning intents and slot labels from user utterances is a fundamental step in all spoken language understanding (SLU) and dialog systems. State-of-the-art neural network based methods, after deployment, often suffer from performance…

Computation and Language · Computer Science 2018-09-19 Avik Ray, Yilin Shen, Hongxia Jin

Speaker-Utterance Dual Attention for Speaker and Utterance Verification

In this paper, we study a novel technique that exploits the interaction between speaker traits and linguistic content to improve both speaker verification and utterance verification performance. We implement an idea of speaker-utterance…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-21 Tianchi Liu, Rohan Kumar Das, Maulik Madhavi, Shengmei Shen, Haizhou Li