Related papers: A Hardware-Oriented and Memory-Efficient Method fo…

200 papers

The Connectionist Temporal Classification (CTC) model is a very efficient method for modeling sequences, especially speech data. To use a CTC model for Automatic Speech Recognition (ASR), beam search decoding with an…

Computation and Language · Computer Science 2023-06-28 Minkyu Jung, Ohhyeok Kwon, Seunghyun Seo, Soonshin Seo

CTC-based ASR systems face computational and memory bottlenecks in resource-limited environments. Traditional CTC decoders, requiring up to 90% of processing time in systems (e.g., wav2vec2-large on L4 GPUs), face inefficiencies due to…

Machine Learning · Computer Science 2025-10-13 Atul Shree, Harshith Jupuru

Thanks to the rise of deep learning and the availability of large-scale audio-visual databases, recent advances have been achieved in Visual Speech Recognition (VSR). Similar to other speech processing tasks, these end-to-end VSR systems…

Computer Vision and Pattern Recognition · Computer Science 2024-02-21 David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

Connectionist temporal classification (CTC) is a popular sequence prediction approach for automatic speech recognition that is typically used with models based on recurrent neural networks (RNNs). We explore whether deep convolutional…

Computation and Language · Computer Science 2018-02-16 Kalpesh Krishna, Liang Lu, Kevin Gimpel, Karen Livescu

While beam search improves speech recognition quality over greedy decoding, standard implementations are slow, often sequential, and CPU-bound. To fully leverage modern hardware capabilities, we present a novel open-source FlexCTC toolkit…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-14 Lilit Grigoryan, Vladimir Bataev, Nikolay Karpov, Andrei Andrusenko, Vitaly Lavrukhin, Boris Ginsburg

The success of retrieval-augmented language models in various natural language processing (NLP) tasks has been constrained in automatic speech recognition (ASR) applications due to challenges in constructing fine-grained audio-text…

Sound · Computer Science 2024-02-06 Jiaming Zhou, Shiwan Zhao, Yaqi Liu, Wenjia Zeng, Yong Chen, Yong Qin

Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR), renowned for its simplicity and computational efficiency. However, it often falls short in recognition performance. In this work, we…

Audio and Speech Processing · Electrical Eng. & Systems 2025-02-17 Zengwei Yao, Wei Kang, Xiaoyu Yang, Fangjun Kuang, Liyong Guo, Han Zhu, Zengrui Jin, Zhaoqing Li, Long Lin, Daniel Povey

While Connectionist Temporal Classification (CTC) models deliver state-of-the-art accuracy in automated speech recognition (ASR) pipelines, their performance has been limited by CPU-based beam search decoding. We introduce a GPU-accelerated…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-10 Daniel Galvez, Tim Kaldewey

Deep learning approaches have been widely used in Automatic Speech Recognition (ASR) and have achieved significant accuracy improvements. In particular, Convolutional Neural Networks (CNNs) have recently been revisited in ASR. However,…

Computation and Language · Computer Science 2017-02-28 Yisen Wang, Xuejiao Deng, Songbai Pu, Zhiheng Huang

Connectionist Temporal Classification has recently attracted a lot of interest as it offers an elegant approach to building acoustic models (AMs) for speech recognition. The CTC loss function maps an input sequence of observable feature…

Computation and Language · Computer Science 2017-08-16 Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel

Connectionist Temporal Classification (CTC) and the attention mechanism are the two main approaches used in recent scene text recognition work. Compared with attention-based methods, the CTC decoder has a much shorter inference time, yet a lower…

Computer Vision and Pattern Recognition · Computer Science 2020-02-05 Wenyang Hu, Xiaocong Cai, Jun Hou, Shuai Yi, Zhiping Lin

Connectionist temporal classification (CTC)-based models are attractive because of their fast inference in automatic speech recognition (ASR). Language model (LM) integration approaches such as shallow fusion and rescoring can improve the…

Computation and Language · Computer Science 2022-09-07 Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Manner of articulation detection using deep neural networks requires a priori knowledge of the attribute-discriminative features or decent phoneme alignments. However, generating an appropriate phoneme alignment is complex and its…

Computation and Language · Computer Science 2018-11-20 Pradeep Rangan, Sreenivasa Rao K

Connectionist Temporal Classification (CTC) is a widely used approach for automatic speech recognition (ASR) that performs conditionally independent monotonic alignment. However, for translation, CTC exhibits clear limitations due to the…

Computation and Language · Computer Science 2022-10-12 Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, Shinji Watanabe

For real-world deployment of automatic speech recognition (ASR), the system should be capable of fast inference while reducing the computational resources required. The recently proposed end-to-end ASR system based on…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-17 Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

Recently, end-to-end speech recognition with a hybrid model consisting of connectionist temporal classification (CTC) and the attention encoder-decoder achieved state-of-the-art results. In this paper, we propose a novel CTC decoder…

Sound · Computer Science 2018-11-02 Zhe Yuan, Zhuoran Lyu, Jiwei Li, Xi Zhou

We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is…

Computation and Language · Computer Science 2017-06-12 Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan

Attention-based encoder-decoder models with autoregressive (AR) decoding have proven to be the dominant approach for automatic speech recognition (ASR) due to their superior accuracy. However, they often suffer from slow inference. This is…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-13 Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe

End-to-end automatic speech recognition (E2E-ASR) can be classified by its decoder architectures, such as connectionist temporal classification (CTC), recurrent neural network transducer (RNN-T), attention-based encoder-decoder, and…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-15 Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, Shinji Watanabe

Code-Switching (CS) remains a challenge for Automatic Speech Recognition (ASR), especially for character-based models. With the combined choice of characters from multiple languages, the outcome from character-based models suffers from phoneme…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-24 Burin Naowarat, Thananchai Kongthaworn, Korrawe Karunratanakul, Sheng Hui Wu, Ekapol Chuangsuwanich