Related papers: An Asynchronous WFST-Based Decoder For Automatic S…

200 papers

Rapid growth in speech data demands adaptive models, as traditional static methods fail to keep pace with dynamic and diverse speech information. We introduce continuous speech learning, a new setup aimed at bridging the adaptation gap…

Computation and Language · Computer Science 2025-06-04 Guitao Wang, Jinming Zhao, Hao Yang, Guilin Qi, Tongtong Wu, Gholamreza Haffari

We propose a two-layer cache mechanism to speed up dynamic WFST decoding with personalized language models. The first layer is a public cache that stores most of the static part of the graph. This is shared globally among all users. A…

Computation and Language · Computer Science 2019-10-24 Jun Liu, Jiedan Zhu, Vishal Kathuria, Fuchun Peng

In this work we propose an inference technique, asynchronous revision, to unify streaming and non-streaming speech recognition models. Specifically, we achieve dynamic latency with only one model by using arbitrary right context during…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-04 Mingkun Huang, Meng Cai, Jun Zhang, Yang Zhang, Yongbin You, Yi He, Zejun Ma

Recently, Transformer-based encoder-decoder models have demonstrated strong performance in multilingual speech recognition. However, the decoder's autoregressive nature and large size introduce significant bottlenecks during inference.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-28 Yunkyu Lim, Jihwan Park, Hyung Yong Kim, Hanbin Lee, Byeong-Yeol Kim

Attention-based recurrent neural encoder-decoder models present an elegant solution to the automatic speech recognition problem. This approach folds the acoustic model, pronunciation model, and language model into a single network and…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-08 Shubham Toshniwal, Anjuli Kannan, Chung-Cheng Chiu, Yonghui Wu, Tara N Sainath, Karen Livescu

The attention-based encoder-decoder framework has recently achieved impressive results for scene text recognition, and many variants have emerged with improvements in recognition quality. However, it performs poorly on contextless texts…

Computer Vision and Pattern Recognition · Computer Science 2020-07-20 Xiaoyu Yue, Zhanghui Kuang, Chenhao Lin, Hongbin Sun, Wayne Zhang

For most of the attention-based sequence-to-sequence models, the decoder predicts the output sequence conditioned on the entire input sequence processed by the encoder. The asynchronous problem between the encoding and decoding makes these…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-25 Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao, Shuai Zhang, Zhengqi Wen

We introduce dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST). Our models are based on the original Transformer architecture (Vaswani et…

Computation and Language · Computer Science 2020-11-21 Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier

In speech separation, time-domain approaches have successfully replaced the time-frequency domain with latent sequence features from a learnable encoder. Conventionally, the features are separated into speaker-specific ones at the final stage…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-01 Ui-Hyeop Shin, Sangyoun Lee, Taehan Kim, Hyung-Min Park

End-to-end automatic speech recognition has become the dominant paradigm in both academia and industry. To enhance recognition performance, the Weighted Finite-State Transducer (WFST) is widely adopted to integrate acoustic and language…

Sound · Computer Science 2026-01-05 Zhuoran Zhuang, Ye Chen, Chao Luo, Tian-Hao Zhang, Xuewei Zhang, Jian Ma, Jiatong Shi, Wei Zhang

We present a method to perform first-pass large vocabulary continuous speech recognition using only a neural network and language model. Deep neural network acoustic models are now commonplace in HMM-based speech recognition systems, but…

Computation and Language · Computer Science 2014-12-09 Awni Y. Hannun, Andrew L. Maas, Daniel Jurafsky, Andrew Y. Ng

Finite-state transducers (FSTs) are frequently used in speech recognition. Transducer composition is an essential operation for combining different sources of information at different granularities. However, composition is also one of the…

Computation and Language · Computer Science 2021-10-07 Shubho Sengupta, Vineel Pratap, Awni Hannun

In this paper, we review various end-to-end automatic speech recognition algorithms and their optimization techniques for on-device applications. Conventional speech recognition systems comprise a large number of discrete components such as…

Machine Learning · Computer Science 2021-08-30 Chanwoo Kim, Dhananjaya Gowda, Dongsoo Lee, Jiyeon Kim, Ankur Kumar, Sungsoo Kim, Abhinav Garg, Changwoo Han

Current Audio-Visual Source Separation methods primarily adopt two design strategies. The first strategy involves fusing audio and visual features at the bottleneck layer of the encoder, followed by processing the fused features through the…

Sound · Computer Science 2025-05-01 Yinfeng Yu, Shiyu Sun

In end-to-end speech translation, acoustic representations learned by the encoder are usually fixed and static, from the perspective of the decoder, which is not desirable for dealing with the cross-modal and cross-lingual challenge in…

Computation and Language · Computer Science 2025-03-19 Wuwei Huang, Dexin Wang, Deyi Xiong

Conformer-based models have become the dominant end-to-end architecture for speech processing tasks. With the objective of enhancing the conformer architecture for efficient training and inference, we carefully redesigned Conformer with a…

Direct speech-to-speech translation (S2ST) translates speech from one language into another using a single model. However, due to the presence of linguistic and acoustic diversity, the target speech follows a complex multimodal…

Computation and Language · Computer Science 2023-10-12 Qingkai Fang, Yan Zhou, Yang Feng

Standard decoders for neural machine translation autoregressively generate a single target token per time step, which slows inference especially for long outputs. While architectural advances such as the Transformer fully parallelize the…

Computation and Language · Computer Science 2020-10-06 Nader Akoury, Kalpesh Krishna, Mohit Iyyer

LSTM language models (LSTM-LMs) have proven powerful and have yielded significant performance improvements over count-based n-gram LMs in modern speech recognition systems. Due to their infinite history states and computational load,…

Computation and Language · Computer Science 2020-10-23 Xie Chen, Sarangarajan Parthasarathy, William Gale, Shuangyu Chang, Michael Zeng

Speech representation and modelling in high-dimensional spaces of acoustic waveforms, or a linear transformation thereof, is investigated with the aim of improving the robustness of automatic speech recognition to additive noise. The…

Computation and Language · Computer Science 2015-03-31 Matthew Ager, Zoran Cvetkovic, Peter Sollich