Label-Looping: Highly Efficient Decoding for Transducers

Vladimir Bataev; Hainan Xu; Daniel Galvez; Vitaly Lavrukhin; Boris Ginsburg

doi:10.1109/SLT61566.2024.10832333

Subjects: Audio and Speech Processing; Artificial Intelligence; Computation and Language; Machine Learning; Sound
Version: v2 (2025-01-20)

Abstract

This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models. We restructure the standard nested loops of RNN-T decoding by swapping the loops over frames and labels: the outer loop iterates over labels, while the inner loop iterates over frames, searching for the next non-blank symbol. Additionally, we represent partial hypotheses in a special data structure built on CUDA tensors, which supports parallelized hypothesis manipulation. Experiments show that the label-looping algorithm is up to 2.0x faster than conventional batched decoding at batch size 32. It can be further combined with other compiler-level or GPU-call-level optimization techniques to achieve even greater speedups. Our algorithm is general-purpose and works with both conventional Transducers and Token-and-Duration Transducers. We open-source our implementation to benefit the research community.
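The loop swap described above can be illustrated with a minimal pure-Python sketch of batched greedy label-looping. It is not the authors' implementation. Plain Python lists stand in for the batched CUDA tensors, and `joint` is a hypothetical callable that plays the role of the prediction and joint networks. Treating index 0 as blank and capping labels per frame with `max_symbols_per_frame` are also assumptions, the latter being a common greedy-decoding safeguard. Each outer iteration emits at most one label per utterance. The inner loop advances frames only for utterances whose best symbol is still blank.

```python
from typing import Callable, List, Sequence

BLANK = 0  # assumption: vocabulary index 0 is the blank symbol

# Hypothetical stand-in for prediction network + joint network:
# joint(utterance_index, frame_index, last_label) -> scores over the vocabulary.
Joint = Callable[[int, int, int], Sequence[float]]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (first one on ties)."""
    return max(range(len(scores)), key=lambda k: scores[k])


def label_looping_greedy(joint: Joint, lengths: List[int],
                         max_symbols_per_frame: int = 10) -> List[List[int]]:
    """Greedy Transducer decoding with labels in the outer loop (sketch)."""
    batch = len(lengths)
    hyps: List[List[int]] = [[] for _ in range(batch)]
    last = [BLANK] * batch          # last emitted label; blank doubles as start symbol
    t = [0] * batch                 # current frame index per utterance
    emitted_at_t = [0] * batch      # labels already emitted at the current frame
    active = [n > 0 for n in lengths]

    # Outer loop over labels: each pass finds at most one new label per utterance.
    while any(active):
        labels = [BLANK] * batch
        searching = list(active)
        # Inner loop over frames: move forward until a non-blank label is found
        # or the utterance runs out of frames.
        while any(searching):
            for b in range(batch):
                if not searching[b]:
                    continue
                k = argmax(joint(b, t[b], last[b]))
                if k != BLANK and emitted_at_t[b] < max_symbols_per_frame:
                    labels[b] = k            # found the next label; stay on this frame
                    searching[b] = False
                else:
                    t[b] += 1                # blank (or cap reached): next frame
                    emitted_at_t[b] = 0
                    if t[b] >= lengths[b]:
                        searching[b] = False
                        active[b] = False
        # Commit the labels found in this pass. A real decoder would update the
        # prediction-network state here in one batched call.
        for b in range(batch):
            if labels[b] != BLANK:
                hyps[b].append(labels[b])
                last[b] = labels[b]
                emitted_at_t[b] += 1
    return hyps
```

Because the prediction-network state changes only when a label is emitted, placing labels in the outer loop means that state is updated once per outer iteration for the whole batch, while the inner loop only revisits the joint at new frames. For Token-and-Duration Transducers, the fixed one-frame advance would, roughly, be replaced by the predicted duration. This sketch does not model that.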

Keywords

speech and audio processing; hardware acceleration; neural networks for signal processing

Cite

@article{arxiv.2406.06220,
  title  = {Label-Looping: Highly Efficient Decoding for Transducers},
  author = {Vladimir Bataev and Hainan Xu and Daniel Galvez and Vitaly Lavrukhin and Boris Ginsburg},
  journal= {arXiv preprint arXiv:2406.06220},
  year   = {2025}
}

Comments

Accepted at IEEE SLT 2024