Related papers: Intermediate Loss Regularization for CTC-based Spe…

200 papers

End-to-end (E2E) automatic speech recognition (ASR) systems have revolutionized the field by integrating all components into a single neural network, with attention-based encoder-decoder models achieving state-of-the-art performance.…

Computation and Language · Computer Science 2025-07-01 Duygu Altinok

This paper proposes a method to relax the conditional independence assumption of connectionist temporal classification (CTC)-based automatic speech recognition (ASR) models. We train a CTC-based ASR model with auxiliary CTC losses in…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-11 Jumon Nozaki, Tatsuya Komatsu

This paper proposes an adaptation method for end-to-end speech recognition. In this method, multiple automatic speech recognition (ASR) 1-best hypotheses are integrated in the computation of the connectionist temporal classification (CTC)…

Computation and Language · Computer Science 2021-04-01 Cong-Thanh Do, Rama Doddipatla, Thomas Hain

Code-Switching (CS) remains a challenge for Automatic Speech Recognition (ASR), especially character-based models. With the combined choice of characters from multiple languages, the outcome from character-based models suffers from phoneme…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-24 Burin Naowarat, Thananchai Kongthaworn, Korrawe Karunratanakul, Sheng Hui Wu, Ekapol Chuangsuwanich

Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR), renowned for its simplicity and computational efficiency. However, it often falls short in recognition performance. In this work, we…

Audio and Speech Processing · Electrical Eng. & Systems 2025-02-17 Zengwei Yao, Wei Kang, Xiaoyu Yang, Fangjun Kuang, Liyong Guo, Han Zhu, Zengrui Jin, Zhaoqing Li, Long Lin, Daniel Povey

For real-world deployment of automatic speech recognition (ASR), the system is desired to be capable of fast inference while relieving the requirement of computational resources. The recently proposed end-to-end ASR system based on…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-17 Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

The recent emergence of joint CTC-Attention model shows significant improvement in automatic speech recognition (ASR). The improvement largely lies in the modeling of linguistic information by decoder. The decoder joint-optimized with an…

Computation and Language · Computer Science 2022-10-27 Xulong Zhang, Jianzong Wang, Ning Cheng, Mengyuan Zhao, Zhiyong Zhang, Jing Xiao

In end-to-end automatic speech recognition (ASR), a model is expected to implicitly learn representations suitable for recognizing a word-level sequence. However, the huge abstraction gap between input acoustic signals and output linguistic…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-09 Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi

In this paper, we propose a novel auxiliary loss function for target-speaker automatic speech recognition (ASR). Our method automatically extracts and transcribes target speaker's utterances from a monaural mixture of multiple speakers…

Computation and Language · Computer Science 2019-06-27 Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe

Recently, end-to-end automatic speech recognition models based on connectionist temporal classification (CTC) have achieved impressive results, especially when fine-tuned from wav2vec2.0 models. Due to the conditional independence…

Computation and Language · Computer Science 2022-03-08 Keqi Deng, Songjun Cao, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, Pengyuan Zhang

In this work, we describe a novel method of training an embedding-matching word-level connectionist temporal classification (CTC) automatic speech recognizer (ASR) such that it directly produces word start times and durations, required by…

Computation and Language · Computer Science 2023-06-21 Woojay Jeon

This paper presents a novel algorithm for building an automatic speech recognition (ASR) model with imperfect training data. Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the…

Computation and Language · Computer Science 2023-06-05 Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur

In recent years, end-to-end speech recognition has emerged as a technology that integrates the acoustic, pronunciation dictionary, and language model components of the traditional Automatic Speech Recognition model. It is possible to…

Computation and Language · Computer Science 2023-12-18 Tzu-Ting Yang, Hsin-Wei Wang, Berlin Chen

Temporal connectionist temporal classification (CTC)-based automatic speech recognition (ASR) is one of the most successful end to end (E2E) ASR frameworks. However, due to the token independence assumption in decoding, an external language…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-26 Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

Connectionist Temporal Classification (CTC) is a widely used approach for automatic speech recognition (ASR) that performs conditionally independent monotonic alignment. However, for translation, CTC exhibits clear limitations due to the…

Computation and Language · Computer Science 2022-10-12 Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, Shinji Watanabe

Text recognition methods are gaining rapid development. Some advanced techniques, e.g., powerful modules, language models, and un- and semi-supervised learning schemes, consecutively push the performance on public benchmarks forward.…

Computer Vision and Pattern Recognition · Computer Science 2024-01-01 Ziyin Zhang, Ning Lu, Minghui Liao, Yongshuai Huang, Cheng Li, Min Wang, Wei Peng

Siamese networks have shown effective results in unsupervised visual representation learning. These models are designed to learn an invariant representation of two augmentations for one input by maximizing their similarity. In this paper,…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-23 Yingying Gao, Junlan Feng, Tianrui Wang, Chao Deng, Shilei Zhang

The mismatch of speech length and text length poses a challenge in automatic speech recognition (ASR). In previous research, various approaches have been employed to align text with speech, including the utilization of Connectionist…

Computation and Language · Computer Science 2025-10-14 Peng Fan, Wenping Wang, Fei Deng

Deep learning approaches have been widely used in Automatic Speech Recognition (ASR) and they have achieved a significant accuracy improvement. Especially, Convolutional Neural Networks (CNNs) have been revisited in ASR recently. However,…

Computation and Language · Computer Science 2017-02-28 Yisen Wang, Xuejiao Deng, Songbai Pu, Zhiheng Huang

Self-supervised learning (SSL) has shown promise in learning representations of audio that are useful for automatic speech recognition (ASR). But, training SSL models like wav2vec 2.0 requires a two-stage pipeline. In this paper we…

Computation and Language · Computer Science 2021-02-16 Chaitanya Talnikar, Tatiana Likhomanenko, Ronan Collobert, Gabriel Synnaeve