Related papers: Enhancing Speech Recognition Decoding via Layer Ag…

200 papers

Automatic Speech Recognition (ASR) systems frequently use a search-based decoding strategy aiming to find the best attainable transcript by considering multiple candidates. One prominent speech recognition decoding heuristic is beam search,…

Computation and Language · Computer Science · 2022-12-29 · Tomer Wullach, Shlomo E. Chazan

Self-supervised learning (SSL) models like Wav2Vec2, HuBERT, and WavLM have been widely used in speech processing. These transformer-based models consist of multiple layers, each capturing different levels of representation. While prior…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-08-13 · Xinyu Liang, Fredrik Cumlin, Victor Ungureanu, Chandan K. A. Reddy, Christian Schuldt, Saikat Chatterjee

As one popular modeling approach for end-to-end speech recognition, attention-based encoder-decoder models are known to suffer from the length bias and the corresponding beam problem. Different approaches have been applied in simple beam search to…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-10-24 · Wei Zhou, Ralf Schlüter, Hermann Ney

Self-supervised learning has shown great success in Speech Recognition. However, it has been observed that finetuning all layers of the learned model leads to lower performance compared to resetting top layers. This phenomenon is attributed…

Computation and Language · Computer Science · 2024-05-15 · Valentin Vielzeuf

Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the…

Computation and Language · Computer Science · 2022-12-06 · Ankita Pasad, Ju-Chieh Chou, Karen Livescu

Incremental improvements in the accuracy of Convolutional Neural Networks are usually achieved through the use of deeper and more complex models trained on larger datasets. However, enlarging datasets and models increases the computation and storage…

Audio and Speech Processing · Electrical Eng. & Systems · 2018-07-24 · Mahdi Hajibabaei, Dengxin Dai

Current language models are usually trained using a self-supervised scheme, where the main focus is learning representations at the word or sentence level. However, there has been limited progress in generating useful discourse-level…

Computation and Language · Computer Science · 2021-09-13 · Vladimir Araujo, Andrés Villa, Marcelo Mendoza, Marie-Francine Moens, Alvaro Soto

Large language models (LLMs) have become proficient at solving a wide variety of tasks, including those involving multi-modal inputs. In particular, instantiating an LLM (such as LLaMA) with a speech encoder and training it on paired data…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-09-13 · Desh Raj, Gil Keren, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

Large language models (LLMs) have shown great promise for capturing contextual information in natural language processing tasks. We propose a novel approach to speaker diarization that incorporates the prowess of LLMs to exploit contextual…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-09-15 · Tae Jin Park, Kunal Dhawan, Nithin Koluguri, Jagadeesh Balam

End-to-end (E2E) models have made rapid progress in automatic speech recognition (ASR) and perform competitively relative to conventional models. To further improve the quality, a two-pass model has been proposed to rescore streamed…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-03-19 · Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prabhavalkar

Dense retrieval models usually adopt vectors from the last hidden layer of the document encoder to represent a document, which is in contrast to the fact that representations in different layers of a pre-trained language model usually…

Information Retrieval · Computer Science · 2025-09-30 · Zhongbin Xie, Thomas Lukasiewicz

Pre-trained acoustic representations such as wav2vec and DeCoAR have attained impressive word error rates (WER) for speech recognition benchmarks, particularly when labeled data is limited. But little is known about what phonetic properties…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-02-16 · Danni Ma, Neville Ryant, Mark Liberman

Attention-based encoder-decoder models with autoregressive (AR) decoding have proven to be the dominant approach for automatic speech recognition (ASR) due to their superior accuracy. However, they often suffer from slow inference. This is…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-02-13 · Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe

Recently, pioneering work found that speech pre-trained models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information.…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-12-17 · Chengyi Wang, Yu Wu, Sanyuan Chen, Shujie Liu, Jinyu Li, Yao Qian, Zhenglu Yang

Speaker verification is an established yet challenging task in speech processing and a very vibrant research area. Recent speaker verification (SV) systems rely on deep neural networks to extract high-level embeddings which are able to…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-03-23 · Fei Tao, Gokhan Tur

In this research, we advanced a spoken language recognition system, moving beyond traditional feature vector-based models. Our improvements focused on effectively capturing language characteristics over extended periods using a specialized…

Sound · Computer Science · 2025-01-22 · Or Haim Anidjar, Roi Yozevitch

Generating textual descriptions for images has been an attractive problem for computer vision and natural language processing researchers in recent years. Dozens of models based on deep learning have been proposed to solve this problem.…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-01 · Ahmad Asadi, Reza Safabakhsh

We describe a new framework for distilling information from word lattices to improve the accuracy of speech recognition and obtain a more perspicuous representation of a set of alternative hypotheses. In the standard MAP decoding approach…

Computation and Language · Computer Science · 2022-02-28 · L. Mangu, E. Brill, A. Stolcke

This paper proposes a novel approach that uses deep neural networks for classifying imagined speech, significantly increasing the classification accuracy. The proposed approach employs only the EEG channels over specific areas of the brain…

Neurons and Cognition · Quantitative Biology · 2020-03-24 · Jerrin Thomas Panachakel, A. G. Ramakrishnan

Multilingual end-to-end automatic speech recognition models are attractive due to their simplicity in training and deployment. Recent work on large-scale training of such models has shown promising results compared to monolingual models.…

Computation and Language · Computer Science · 2022-10-13 · Ke Hu, Bo Li, Tara N. Sainath