Related papers: Intermediate Loss Regularization for CTC-based Spe…

Boosting CTC-Based ASR Using LLM-Based Intermediate Loss Regularization

End-to-end (E2E) automatic speech recognition (ASR) systems have revolutionized the field by integrating all components into a single neural network, with attention-based encoder-decoder models achieving state-of-the-art performance.…

Computation and Language · Computer Science 2025-07-01 Duygu Altinok

Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions

This paper proposes a method to relax the conditional independence assumption of connectionist temporal classification (CTC)-based automatic speech recognition (ASR) models. We train a CTC-based ASR model with auxiliary CTC losses in…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-11 Jumon Nozaki, Tatsuya Komatsu

Multiple-hypothesis CTC-based semi-supervised adaptation of end-to-end speech recognition

This paper proposes an adaptation method for end-to-end speech recognition. In this method, multiple automatic speech recognition (ASR) 1-best hypotheses are integrated in the computation of the connectionist temporal classification (CTC)…

Computation and Language · Computer Science 2021-04-01 Cong-Thanh Do, Rama Doddipatla, Thomas Hain

Reducing Spelling Inconsistencies in Code-Switching ASR using Contextualized CTC Loss

Code-Switching (CS) remains a challenge for Automatic Speech Recognition (ASR), especially character-based models. With the combined choice of characters from multiple languages, the outcome from character-based models suffers from phoneme…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-24 Burin Naowarat, Thananchai Kongthaworn, Korrawe Karunratanakul, Sheng Hui Wu, Ekapol Chuangsuwanich

CR-CTC: Consistency regularization on CTC for improved speech recognition

Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR), renowned for its simplicity and computational efficiency. However, it often falls short in recognition performance. In this work, we…

Audio and Speech Processing · Electrical Eng. & Systems 2025-02-17 Zengwei Yao, Wei Kang, Xiaoyu Yang, Fangjun Kuang, Liyong Guo, Han Zhu, Zengrui Jin, Zhaoqing Li, Long Lin, Daniel Povey

Improved Mask-CTC for Non-Autoregressive End-to-End ASR

For real-world deployment of automatic speech recognition (ASR), the system should be capable of fast inference while reducing its computational resource requirements. The recently proposed end-to-end ASR system based on…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-17 Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition

The recent emergence of joint CTC-attention models has brought significant improvements in automatic speech recognition (ASR). The improvement stems largely from the decoder's modeling of linguistic information. The decoder joint-optimized with an…

Computation and Language · Computer Science 2022-10-27 Xulong Zhang, Jianzong Wang, Ning Cheng, Mengyuan Zhao, Zhiyong Zhang, Jing Xiao

Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units

In end-to-end automatic speech recognition (ASR), a model is expected to implicitly learn representations suitable for recognizing a word-level sequence. However, the huge abstraction gap between input acoustic signals and output linguistic…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-09 Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi

Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition

In this paper, we propose a novel auxiliary loss function for target-speaker automatic speech recognition (ASR). Our method automatically extracts and transcribes the target speaker's utterances from a monaural mixture of multiple speakers…

Computation and Language · Computer Science 2019-06-27 Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe

Improving CTC-based speech recognition via knowledge transferring from pre-trained language models

Recently, end-to-end automatic speech recognition models based on connectionist temporal classification (CTC) have achieved impressive results, especially when fine-tuned from wav2vec2.0 models. Due to the conditional independence…

Computation and Language · Computer Science 2022-03-08 Keqi Deng, Songjun Cao, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, Pengyuan Zhang

Timestamped Embedding-Matching Acoustic-to-Word CTC ASR

In this work, we describe a novel method of training an embedding-matching word-level connectionist temporal classification (CTC) automatic speech recognizer (ASR) such that it directly produces word start times and durations, required by…

Computation and Language · Computer Science 2023-06-21 Woojay Jeon

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

This paper presents a novel algorithm for building an automatic speech recognition (ASR) model with imperfect training data. Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the…

Computation and Language · Computer Science 2023-06-05 Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur

Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition

In recent years, end-to-end speech recognition has emerged as a technology that integrates the acoustic, pronunciation dictionary, and language model components of the traditional Automatic Speech Recognition model. It is possible to…

Computation and Language · Computer Science 2023-12-18 Tzu-Ting Yang, Hsin-Wei Wang, Berlin Chen

Cross-modal Alignment with Optimal Transport for CTC-based ASR

Connectionist temporal classification (CTC)-based automatic speech recognition (ASR) is one of the most successful end-to-end (E2E) ASR frameworks. However, due to the token independence assumption in decoding, an external language…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-26 Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

CTC Alignments Improve Autoregressive Translation

Connectionist Temporal Classification (CTC) is a widely used approach for automatic speech recognition (ASR) that performs conditionally independent monotonic alignment. However, for translation, CTC exhibits clear limitations due to the…

Computation and Language · Computer Science 2022-10-12 Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, Shinji Watanabe

Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach

Text recognition methods are developing rapidly. Some advanced techniques, e.g., powerful modules, language models, and unsupervised and semi-supervised learning schemes, continue to push the performance on public benchmarks forward.…

Computer Vision and Pattern Recognition · Computer Science 2024-01-01 Ziyin Zhang, Ning Lu, Minghui Liao, Yongshuai Huang, Cheng Li, Min Wang, Wei Peng

A CTC Triggered Siamese Network with Spatial-Temporal Dropout for Speech Recognition

Siamese networks have shown effective results in unsupervised visual representation learning. These models are designed to learn an invariant representation of two augmentations for one input by maximizing their similarity. In this paper,…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-23 Yingying Gao, Junlan Feng, Tianrui Wang, Chao Deng, Shilei Zhang

End-to-end Speech Recognition with similar length speech and text

The mismatch between speech length and text length poses a challenge in automatic speech recognition (ASR). In previous research, various approaches have been employed to align text with speech, including the utilization of Connectionist…

Computation and Language · Computer Science 2025-10-14 Peng Fan, Wenping Wang, Fei Deng

Residual Convolutional CTC Networks for Automatic Speech Recognition

Deep learning approaches have been widely used in Automatic Speech Recognition (ASR) and they have achieved a significant accuracy improvement. Especially, Convolutional Neural Networks (CNNs) have been revisited in ASR recently. However,…

Computation and Language · Computer Science 2017-02-28 Yisen Wang, Xuejiao Deng, Songbai Pu, Zhiheng Huang

Joint Masked CPC and CTC Training for ASR

Self-supervised learning (SSL) has shown promise in learning representations of audio that are useful for automatic speech recognition (ASR). However, training SSL models like wav2vec 2.0 requires a two-stage pipeline. In this paper we…

Computation and Language · Computer Science 2021-02-16 Chaitanya Talnikar, Tatiana Likhomanenko, Ronan Collobert, Gabriel Synnaeve