Related papers: Label-Looping: Highly Efficient Decoding for Trans…

200 papers

The vast majority of inference time for RNN Transducer (RNN-T) models today is spent on decoding. Current state-of-the-art RNN-T decoding implementations leave the GPU idle ~80% of the time. Leveraging a new CUDA 12.4 feature, CUDA graph…

Machine Learning · Computer Science 2024-06-07 Daniel Galvez, Vladimir Bataev, Hainan Xu, Tim Kaldewey

The transducer architecture is becoming increasingly popular in the field of speech recognition, because it is naturally streaming as well as high in accuracy. One drawback of the transducer is that it is difficult to decode in a fast…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-02 Wei Kang, Liyong Guo, Fangjun Kuang, Long Lin, Mingshuang Luo, Zengwei Yao, Xiaoyu Yang, Piotr Żelasko, Daniel Povey

This paper proposes an extremely lightweight phone-based transducer model with a tiny decoding graph on edge devices. First, a phone synchronous decoding (PSD) algorithm based on blank label skipping is used to speed up the transducer…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-09 Yuekai Zhang, Sining Sun, Long Ma

In this paper we present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system. Transformer computation blocks based on self-attention are used to encode both audio and…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-18 Qian Zhang, Han Lu, Hasim Sak, Anshuman Tripathi, Erik McDermott, Stephen Koo, Shankar Kumar

Transformer-based NLP models are powerful but have high computational costs that limit deployment. Finetuned encoder-decoder models are popular in specialized domains and can outperform larger more generalized decoder-only models, such as…

Computation and Language · Computer Science 2024-11-19 Bo-Ru Lu, Nikita Haduong, Chien-Yu Lin, Hao Cheng, Noah A. Smith, Mari Ostendorf

Labeled sequence transduction is a task of transforming one sequence into another sequence that satisfies desiderata specified by a set of labels. In this paper we propose multi-space variational encoder-decoders, a new model for labeled…

Computation and Language · Computer Science 2019-10-08 Chunting Zhou, Graham Neubig

To join the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling. Different alignment label topologies are compared and…

Computation and Language · Computer Science 2021-04-21 Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney

Acoustic and linguistic features are important cues for the spoken language identification (LID) task. Recent advanced LID systems mainly use acoustic features that lack explicit linguistic feature encoding. In this paper,…

Computation and Language · Computer Science 2022-08-01 Peng Shen, Xugang Lu, Hisashi Kawai

Transducer models have emerged as a promising choice for end-to-end ASR systems, offering a balanced trade-off between recognition accuracy, streaming capabilities, and inference speed in greedy decoding. However, beam search significantly…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-03 Lilit Grigoryan, Vladimir Bataev, Andrei Andrusenko, Hainan Xu, Vitaly Lavrukhin, Boris Ginsburg

End-to-end models, especially the Recurrent Neural Network Transducer (RNN-T), have achieved great success in speech recognition. However, the transducer requires a large memory footprint and long computing time when processing a long decoding sequence.…

Sound · Computer Science 2023-07-18 Xiaohui Zhang, Mangui Liang, Zhengkun Tian, Jiangyan Yi, Jianhua Tao

Training deep neural networks with noisy labels remains a significant challenge, often leading to degraded performance. Existing methods for handling label noise typically rely on either transition matrix, noise detection, or meta-learning…

Machine Learning · Computer Science 2026-03-17 Zhanhui Lin, Yanlin Liu, Sanping Zhou

End-to-end models that condition the output label sequence on all previously predicted labels have emerged as popular alternatives to conventional systems for automatic speech recognition (ASR). Since unique label histories correspond to…

Computation and Language · Computer Science 2020-12-15 Rohit Prabhavalkar, Yanzhang He, David Rybach, Sean Campbell, Arun Narayanan, Trevor Strohman, Tara N. Sainath

Many modern multiclass and multilabel problems are characterized by increasingly large output spaces. For these problems, label embeddings have been shown to be a useful primitive that can improve computational and statistical efficiency.…

Machine Learning · Computer Science 2015-07-07 Paul Mineiro, Nikos Karampatziakis

Label-efficient time series representation learning, which aims to learn effective representations with limited labeled data, is crucial for deploying deep learning models in real-world applications. To address the scarcity of labeled time…

Machine Learning · Computer Science 2024-07-25 Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, Chee-Keong Kwoh, Xiaoli Li

Generative Large Language Models (LLMs) based on the Transformer architecture have recently emerged as a dominant foundation model for a wide range of Natural Language Processing tasks. Nevertheless, their application in real-time scenarios…

Computation and Language · Computer Science 2024-01-04 Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Sophia Shao

This paper introduces a novel Token-and-Duration Transducer (TDT) architecture for sequence-to-sequence tasks. TDT extends conventional RNN-Transducer architectures by jointly predicting both a token and its duration, i.e. the number of…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-31 Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg

Standard Recurrent Neural Network Transducer (RNN-T) decoding algorithms for speech recognition iterate over the time axis, such that one time step is decoded before moving on to the next time step. Those algorithms result in a large…

Machine Learning · Computer Science 2023-10-09 Gil Keren

Zero-shot cross-lingual transfer utilizing multilingual LLMs has become a popular learning paradigm for low-resource languages with no labeled training data. However, for NLP tasks that involve fine-grained predictions on words and phrases,…

Computation and Language · Computer Science 2024-02-06 Duong Minh Le, Yang Chen, Alan Ritter, Wei Xu

Reliable communication over noisy channels requires the design of specialized error-correcting codes (ECCs) tailored to specific system requirements. Recently, neural network-based decoders have emerged as promising tools for enhancing ECC…

Information Theory · Computer Science 2025-12-01 Anastasiia Kurmukova, Selim F. Yilmaz, Emre Ozfatura, Deniz Gunduz

Conventional turbo codes (CTCs) usually employ a block-oriented interleaving so that each block is separately encoded and decoded. As interleaving and de-interleaving are performed within a block, the message-passing process associated with…

Information Theory · Computer Science 2007-07-13 Yan-Xiu Zheng, Yu T. Su