Related papers: A Hardware-Oriented and Memory-Efficient Method fo…

Blank Collapse: Compressing CTC emission for the faster decoding

Connectionist Temporal Classification (CTC) model is a very efficient method for modeling sequences, especially for speech data. In order to use CTC model as an Automatic Speech Recognition (ASR) task, the beam search decoding with an…

Computation and Language · Computer Science 2023-06-28 Minkyu Jung, Ohhyeok Kwon, Seunghyun Seo, Soonshin Seo

FLToP CTC: Frame-Level Token Pruning via Relative Threshold for Efficient and Memory-Saving Decoding on Diverse Platforms

CTC-based ASR systems face computational and memory bottlenecks in resource-limited environments. Traditional CTC decoders, requiring up to 90% of processing time in systems (e.g., wav2vec2-large on L4 GPUs), face inefficiencies due to…

Machine Learning · Computer Science 2025-10-13 Atul Shree, Harshith Jupuru

Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition

Thanks to the rise of deep learning and the availability of large-scale audio-visual databases, recent advances have been achieved in Visual Speech Recognition (VSR). Similar to other speech processing tasks, these end-to-end VSR systems…

Computer Vision and Pattern Recognition · Computer Science 2024-02-21 David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

A Study of All-Convolutional Encoders for Connectionist Temporal Classification

Connectionist temporal classification (CTC) is a popular sequence prediction approach for automatic speech recognition that is typically used with models based on recurrent neural networks (RNNs). We explore whether deep convolutional…

Computation and Language · Computer Science 2018-02-16 Kalpesh Krishna, Liang Lu, Kevin Gimpel, Karen Livescu

FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities

While beam search improves speech recognition quality over greedy decoding, standard implementations are slow, often sequential, and CPU-bound. To fully leverage modern hardware capabilities, we present a novel open-source FlexCTC toolkit…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-14 Lilit Grigoryan, Vladimir Bataev, Nikolay Karpov, Andrei Andrusenko, Vitaly Lavrukhin, Boris Ginsburg

kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels

The success of retrieval-augmented language models in various natural language processing (NLP) tasks has been constrained in automatic speech recognition (ASR) applications due to challenges in constructing fine-grained audio-text…

Sound · Computer Science 2024-02-06 Jiaming Zhou, Shiwan Zhao, Yaqi Liu, Wenjia Zeng, Yong Chen, Yong Qin

CR-CTC: Consistency regularization on CTC for improved speech recognition

Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR), renowned for its simplicity and computational efficiency. However, it often falls short in recognition performance. In this work, we…

Audio and Speech Processing · Electrical Eng. & Systems 2025-02-17 Zengwei Yao, Wei Kang, Xiaoyu Yang, Fangjun Kuang, Liyong Guo, Han Zhu, Zengrui Jin, Zhaoqing Li, Long Lin, Daniel Povey

GPU-Accelerated WFST Beam Search Decoder for CTC-based Speech Recognition

While Connectionist Temporal Classification (CTC) models deliver state-of-the-art accuracy in automated speech recognition (ASR) pipelines, their performance has been limited by CPU-based beam search decoding. We introduce a GPU-accelerated…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-10 Daniel Galvez, Tim Kaldewey

Residual Convolutional CTC Networks for Automatic Speech Recognition

Deep learning approaches have been widely used in Automatic Speech Recognition (ASR) and they have achieved a significant accuracy improvement. Especially, Convolutional Neural Networks (CNNs) have been revisited in ASR recently. However,…

Computation and Language · Computer Science 2017-02-28 Yisen Wang, Xuejiao Deng, Songbai Pu, Zhiheng Huang

Comparison of Decoding Strategies for CTC Acoustic Models

Connectionist Temporal Classification has recently attracted a lot of interest as it offers an elegant approach to building acoustic models (AMs) for speech recognition. The CTC loss function maps an input sequence of observable feature…

Computation and Language · Computer Science 2017-08-16 Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel

GTC: Guided Training of CTC Towards Efficient and Accurate Scene Text Recognition

Connectionist Temporal Classification (CTC) and attention mechanism are two main approaches used in recent scene text recognition works. Compared with attention-based methods, CTC decoder has a much shorter inference time, yet a lower…

Computer Vision and Pattern Recognition · Computer Science 2020-02-05 Wenyang Hu, Xiaocong Cai, Jun Hou, Shuai Yi, Zhiping Lin

Distilling the Knowledge of BERT for CTC-based ASR

Connectionist temporal classification (CTC)-based models are attractive because of their fast inference in automatic speech recognition (ASR). Language model (LM) integration approaches such as shallow fusion and rescoring can improve the…

Computation and Language · Computer Science 2022-09-07 Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Beam Search Decoding using Manner of Articulation Detection Knowledge Derived from Connectionist Temporal Classification

Manner of articulation detection using deep neural networks require a priori knowledge of the attribute discriminative features or the decent phoneme alignments. However generating an appropriate phoneme alignment is complex and its…

Computation and Language · Computer Science 2018-11-20 Pradeep Rangan, Sreenivasa Rao K

CTC Alignments Improve Autoregressive Translation

Connectionist Temporal Classification (CTC) is a widely used approach for automatic speech recognition (ASR) that performs conditionally independent monotonic alignment. However for translation, CTC exhibits clear limitations due to the…

Computation and Language · Computer Science 2022-10-12 Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, Shinji Watanabe

Improved Mask-CTC for Non-Autoregressive End-to-End ASR

For real-world deployment of automatic speech recognition (ASR), the system is desired to be capable of fast inference while relieving the requirement of computational resources. The recently proposed end-to-end ASR system based on…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-17 Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

An improved hybrid CTC-Attention model for speech recognition

Recently, end-to-end speech recognition with a hybrid model consisting of the connectionist temporal classification (CTC) and the attention encoder-decoder achieved state-of-the-art results. In this paper, we propose a novel CTC decoder…

Sound · Computer Science 2018-11-02 Zhe Yuan, Zhuoran Lyu, Jiwei Li, Xi Zhou

Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is…

Computation and Language · Computer Science 2017-06-12 Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan

Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference

Attention-based encoder-decoder models with autoregressive (AR) decoding have proven to be the dominant approach for automatic speech recognition (ASR) due to their superior accuracy. However, they often suffer from slow inference. This is…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-13 Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe

Joint Beam Search Integrating CTC, Attention, and Transducer Decoders

End-to-end automatic speech recognition (E2E-ASR) can be classified by its decoder architectures, such as connectionist temporal classification (CTC), recurrent neural network transducer (RNN-T), attention-based encoder-decoder, and…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-15 Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, Shinji Watanabe

Reducing Spelling Inconsistencies in Code-Switching ASR using Contextualized CTC Loss

Code-Switching (CS) remains a challenge for Automatic Speech Recognition (ASR), especially character-based models. With the combined choice of characters from multiple languages, the outcome from character-based models suffers from phoneme…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-24 Burin Naowarat, Thananchai Kongthaworn, Korrawe Karunratanakul, Sheng Hui Wu, Ekapol Chuangsuwanich