Related papers: Fast and parallel decoding for transducer

Streaming parallel transducer beam search with fast-slow cascaded encoders

Streaming ASR with strict latency constraints is required in many speech recognition applications. In order to achieve the required latency, streaming ASR models sacrifice accuracy compared to non-streaming ASR models due to lack of future…

Computation and Language · Computer Science 2022-03-30 Jay Mahadeokar, Yangyang Shi, Ke Li, Duc Le, Jiedan Zhu, Vikas Chandra, Ozlem Kalinli, Michael L Seltzer

Pushing the Limits of Beam Search Decoding for Transducer-based ASR models

Transducer models have emerged as a promising choice for end-to-end ASR systems, offering a balanced trade-off between recognition accuracy, streaming capabilities, and inference speed in greedy decoding. However, beam search significantly…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-03 Lilit Grigoryan, Vladimir Bataev, Andrei Andrusenko, Hainan Xu, Vitaly Lavrukhin, Boris Ginsburg

Label-Looping: Highly Efficient Decoding for Transducers

This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models. We redesign the standard nested-loop design for RNN-T decoding, swapping loops over frames and labels: the outer loop…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-20 Vladimir Bataev, Hainan Xu, Daniel Galvez, Vitaly Lavrukhin, Boris Ginsburg

Parallel Composition of Weighted Finite-State Transducers

Finite-state transducers (FSTs) are frequently used in speech recognition. Transducer composition is an essential operation for combining different sources of information at different granularities. However, composition is also one of the…

Computation and Language · Computer Science 2021-10-07 Shubho Sengupta, Vineel Pratap, Awni Hannun

A Stable and Effective Learning Strategy for Trainable Greedy Decoding

Beam search is a widely used approximate search strategy for neural network decoders, and it generally outperforms simple greedy decoding on tasks like machine translation. However, this improvement comes at substantial computational cost.…

Computation and Language · Computer Science 2018-08-29 Yun Chen, Victor O. K. Li, Kyunghyun Cho, Samuel R. Bowman

Delay-penalized transducer for low-latency streaming ASR

In streaming automatic speech recognition (ASR), it is desirable to reduce latency as much as possible while having minimum impact on recognition accuracy. Although a few existing methods are able to achieve this goal, they are difficult to…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-02 Wei Kang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long lin, Piotr Żelasko, Daniel Povey

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

Although frame-based models, such as CTC and transducers, have an affinity for streaming automatic speech recognition, their decoding uses no future knowledge, which could lead to incorrect pruning. Conversely, label-based attention…

Audio and Speech Processing · Electrical Eng. & Systems 2023-07-25 Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

Accelerating Transformer Inference for Translation via Parallel Decoding

Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT). The community proposed specific network architectures and learning-based methods to solve this issue, which are expensive and require changes to the…

Computation and Language · Computer Science 2025-02-06 Andrea Santilli, Silvio Severino, Emilian Postolache, Valentino Maiorca, Michele Mancusi, Riccardo Marin, Emanuele Rodolà

Navigating the Minefield of MT Beam Search in Cascaded Streaming Speech Translation

We adapt the well-known beam-search algorithm for machine translation to operate in a cascaded real-time speech translation system. This proved to be more complex than initially anticipated, due to four key challenges: (1) real-time…

Computation and Language · Computer Science 2024-07-17 Rastislav Rabatin, Frank Seide, Ernie Chang

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

Transformers have recently dominated the ASR field. Although able to yield good performance, they involve an autoregressive (AR) decoder to generate tokens one by one, which is computationally inefficient. To speed up inference,…

Sound · Computer Science 2023-03-31 Zhifu Gao, Shiliang Zhang, Ian McLoughlin, Zhijie Yan

Hybrid Decoding: Rapid Pass and Selective Detailed Correction for Sequence Models

Recently, Transformer-based encoder-decoder models have demonstrated strong performance in multilingual speech recognition. However, the decoder's autoregressive nature and large size introduce significant bottlenecks during inference.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-28 Yunkyu Lim, Jihwan Park, Hyung Yong Kim, Hanbin Lee, Byeong-Yeol Kim

A Token-Wise Beam Search Algorithm for RNN-T

Standard Recurrent Neural Network Transducers (RNN-T) decoding algorithms for speech recognition are iterating over the time axis, such that one time step is decoded before moving on to the next time step. Those algorithms result in a large…

Machine Learning · Computer Science 2023-10-09 Gil Keren

GPU-Accelerated WFST Beam Search Decoder for CTC-based Speech Recognition

While Connectionist Temporal Classification (CTC) models deliver state-of-the-art accuracy in automated speech recognition (ASR) pipelines, their performance has been limited by CPU-based beam search decoding. We introduce a GPU-accelerated…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-10 Daniel Galvez, Tim Kaldewey

Improving Fast-slow Encoder based Transducer with Streaming Deliberation

This paper introduces a fast-slow encoder based transducer with streaming deliberation for end-to-end automatic speech recognition. We aim to improve the recognition accuracy of the fast-slow encoder based transducer while keeping its…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-16 Ke Li, Jay Mahadeokar, Jinxi Guo, Yangyang Shi, Gil Keren, Ozlem Kalinli, Michael L. Seltzer, Duc Le

Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration

Large language models (LLMs) have recently shown remarkable performance across a wide range of tasks. However, the substantial number of parameters in LLMs contributes to significant latency during model inference. This is particularly…

Computation and Language · Computer Science 2024-04-19 Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao

Speed of Light Exact Greedy Decoding for RNN-T Speech Recognition Models on GPU

The vast majority of inference time for RNN Transducer (RNN-T) models today is spent on decoding. Current state-of-the-art RNN-T decoding implementations leave the GPU idle ~80% of the time. Leveraging a new CUDA 12.4 feature, CUDA graph…

Machine Learning · Computer Science 2024-06-07 Daniel Galvez, Vladimir Bataev, Hainan Xu, Tim Kaldewey

Vectorization of hypotheses and speech for faster beam search in encoder decoder-based speech recognition

Attention-based encoder decoder network uses a left-to-right beam search algorithm in the inference step. The current beam search expands hypotheses and traverses the expanded hypotheses at the next time step. This traversal is implemented…

Sound · Computer Science 2018-11-13 Hiroshi Seki, Takaaki Hori, Shinji Watanabe

Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

This paper proposes Transducers with Pronunciation-aware Embeddings (PET). Unlike conventional Transducers where the decoder embeddings for different tokens are trained independently, the PET model's decoder embedding incorporates shared…

Computation and Language · Computer Science 2024-04-09 Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg

Transformer Transducer: One Model Unifying Streaming and Non-streaming Speech Recognition

In this paper we present a Transformer-Transducer model architecture and a training technique to unify streaming and non-streaming speech recognition models into one model. The model is composed of a stack of transformer layers for audio…

Sound · Computer Science 2020-10-08 Anshuman Tripathi, Jaeyoung Kim, Qian Zhang, Han Lu, Hasim Sak

SPEED: Speculative Pipelined Execution for Efficient Decoding

Generative Large Language Models (LLMs) based on the Transformer architecture have recently emerged as a dominant foundation model for a wide range of Natural Language Processing tasks. Nevertheless, their application in real-time scenarios…

Computation and Language · Computer Science 2024-01-04 Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Sophia Shao