Related papers: Spoken Language Intent Detection using Confusion2V…

Confusion2vec 2.0: Enriching Ambiguous Spoken Language Representations with Subwords

Word vector representations enable machines to encode human language for spoken language understanding and processing. Confusion2vec, motivated from human speech production and perception, is a word vector representation which encodes…

Computation and Language · Computer Science 2022-05-04 Prashanth Gurunath Shivakumar, Panayiotis Georgiou, Shrikanth Narayanan

Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks

In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems to perform the SLU intent…

Computation and Language · Computer Science 2024-02-02 Esaú Villatoro-Tello, Srikanth Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlicek, Alexei V. Ivanov, Aravind Ganapathiraju

Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks

In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text. In…

Computation and Language · Computer Science 2024-01-08 Kevin Everson, Yile Gu, Huck Yang, Prashanth Gurunath Shivakumar, Guan-Ting Lin, Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Hung-yi Lee, Ariya Rastrow, Andreas Stolcke

Learning ASR-Robust Contextualized Embeddings for Spoken Language Understanding

Employing pre-trained language models (LM) to extract contextualized word representations has achieved state-of-the-art performance on various NLP tasks. However, applying this technique to noisy transcripts generated by automatic speech…

Computation and Language · Computer Science 2020-11-03 Chao-Wei Huang, Yun-Nung Chen

Confusion2Vec: Towards Enriching Vector Space Word Representations with Representational Ambiguities

Word vector representations are a crucial part of Natural Language Processing (NLP) and Human Computer Interaction. In this paper, we propose a novel word vector representation, Confusion2Vec, motivated from the human speech production and…

Computation and Language · Computer Science 2019-07-01 Prashanth Gurunath Shivakumar, Panayiotis Georgiou

ASR error management for improving spoken language understanding

This paper addresses the problem of automatic speech recognition (ASR) error detection and their use for improving spoken language understanding (SLU) systems. In this study, the SLU task consists in automatically extracting, from ASR…

Computation and Language · Computer Science 2017-05-29 Edwin Simonnet, Sahar Ghannay, Nathalie Camelin, Yannick Estève, Renato De Mori

Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language Understanding

Recently, deep end-to-end learning has been studied for intent classification in Spoken Language Understanding (SLU). However, end-to-end models require a large amount of speech data with intent labels, and highly optimized models are…

Computation and Language · Computer Science 2024-05-27 Suyoung Kim, Jiyeon Hwang, Ho-Young Jung

Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End

Comprehending the overall intent of an utterance helps a listener recognize the individual words spoken. Inspired by this fact, we perform a novel study of the impact of explicitly incorporating intent representations as additional…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-22 Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo

Do We Still Need Automatic Speech Recognition for Spoken Language Understanding?

Spoken language understanding (SLU) tasks are usually solved by first transcribing an utterance with automatic speech recognition (ASR) and then feeding the output to a text-based model. Recent advances in self-supervised representation…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-01 Lasse Borgholt, Jakob Drachmann Havtorn, Mostafa Abdou, Joakim Edin, Lars Maaløe, Anders Søgaard, Christian Igel

End-to-end architectures for ASR-free spoken language understanding

Spoken Language Understanding (SLU) is the problem of extracting the meaning from speech utterances. It is typically addressed as a two-step problem, where an Automatic Speech Recognition (ASR) model is employed to convert speech into text,…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-04 Elisavet Palogiannidi, Ioannis Gkinis, George Mastrapas, Petr Mizera, Themos Stafylakis

Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR

Accurate prediction of the user intent to interact with a voice assistant (VA) on a device (e.g. on the phone) is critical for achieving naturalistic, engaging, and privacy-centric interactions with the VA. To this end, we present a novel…

Computation and Language · Computer Science 2022-10-24 Pranay Dighe, Prateeth Nayak, Oggi Rudovic, Erik Marchi, Xiaochuan Niu, Ahmed Tewfik

On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era

Text encodings from automatic speech recognition (ASR) transcripts and audio representations have shown promise in speech emotion recognition (SER) ever since. Yet, it is challenging to explain the effect of each information stream on the…

Sound · Computer Science 2021-04-21 Shahin Amiriparian, Artem Sokolov, Ilhan Aslan, Lukas Christ, Maurice Gerczuk, Tobias Hübner, Dmitry Lamanov, Manuel Milling, Sandra Ottl, Ilya Poduremennykh, Evgeniy Shuranov, Björn W. Schuller

Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs

A major focus of recent research in spoken language understanding (SLU) has been on the end-to-end approach where a single model can predict intents directly from speech inputs without intermediate transcripts. However, this approach…

Computation and Language · Computer Science 2021-06-15 Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Kuo, Samuel Thomas, Edmilson Morais

Enhancing Automatic Speech Recognition Through Integrated Noise Detection Architecture

This research presents a novel approach to enhancing automatic speech recognition systems by integrating noise detection capabilities directly into the recognition architecture. Building upon the wav2vec2 framework, the proposed method…

Sound · Computer Science 2025-12-11 Karamvir Singh

On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition

New-age conversational agent systems perform both speech emotion recognition (SER) and automatic speech recognition (ASR) using two separate and often independent approaches for real-world application in noisy environments. In this paper,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-29 Lokesh Bansal, S. Pavankumar Dubagunta, Malolan Chetlur, Pushpak Jagtap, Aravind Ganapathiraju

RNN based Incremental Online Spoken Language Understanding

Spoken Language Understanding (SLU) typically comprises of an automatic speech recognition (ASR) followed by a natural language understanding (NLU) module. The two modules process signals in a blocking sequential fashion, i.e., the NLU…

Computation and Language · Computer Science 2020-12-01 Prashanth Gurunath Shivakumar, Naveen Kumar, Panayiotis Georgiou, Shrikanth Narayanan

Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition

The goal of self-supervised learning (SSL) for automatic speech recognition (ASR) is to learn good speech representations from a large amount of unlabeled speech for the downstream ASR task. However, most SSL frameworks do not consider…

Computation and Language · Computer Science 2022-01-27 Yiming Wang, Jinyu Li, Heming Wang, Yao Qian, Chengyi Wang, Yu Wu

Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling

Automatic speech recognition (ASR) has been an essential component of computer assisted language learning (CALL) and computer assisted language testing (CALT) for many years. As this technology continues to develop rapidly, it is important…

Computation and Language · Computer Science 2025-04-01 Michael McGuire

A Wav2vec2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition

Despite recent advancements in deep learning technologies, Child Speech Recognition remains a challenging task. Current Automatic Speech Recognition (ASR) models require substantial amounts of annotated data for training, which is scarce.…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-14 Rishabh Jain, Andrei Barcovschi, Mariam Yiwere, Dan Bigioi, Peter Corcoran, Horia Cucu

A Noise-Robust Self-supervised Pre-training Model Based Speech Representation Learning for Automatic Speech Recognition

Wav2vec2.0 is a popular self-supervised pre-training framework for learning speech representations in the context of automatic speech recognition (ASR). It was shown that wav2vec2.0 has a good robustness against the domain shift, while the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-10 Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai