Related papers: Conformer: Convolution-augmented Transformer for S…

200 papers

Convolutional neural networks (CNN) have shown promising results for end-to-end speech recognition, albeit still behind other state-of-the-art methods in performance. In this paper, we study how to bridge this gap and go beyond with a novel…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-19 Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu

Convolutions have become essential in state-of-the-art end-to-end Automatic Speech Recognition (ASR) systems due to their efficient modelling of local context. Notably, their use in Conformers has led to superior performance compared to…

Computation and Language · Computer Science 2024-07-25 Darshan Prabhu, Yifan Peng, Preethi Jyothi, Shinji Watanabe

Convolutional neural networks (CNN) have improved speech recognition performance greatly by exploiting localized time-frequency patterns. But these patterns are assumed to appear in symmetric and rigid kernels by the conventional CNN…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-19 Jiamin Xie, John H. L. Hansen

Phoneme recognition is a very important part of speech recognition that requires the ability to extract phonetic features from multiple frames. In this paper, we compare and analyze CNN, RNN, Transformer, and Conformer models using phoneme…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-04 Kyuhong Shim, Wonyong Sung

This article surveys convolution-based models, including convolutional neural networks (CNNs), Conformers, ResNets, and CRNNs, as speech signal processing models and provides their statistical backgrounds and speech recognition, speaker…

Sound · Computer Science 2024-12-02 Nirmal Joshua Kapu, Raghav Karan

Conformer, combining convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR).…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-18 Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, Shinji Watanabe

This study addresses robust automatic speech recognition (ASR) by introducing a Conformer-based acoustic model. The proposed model builds on the wide residual bi-directional long short-term memory network (WRBN) with utterance-wise dropout…

Sound · Computer Science 2022-10-21 Yufeng Yang, Peidong Wang, DeLiang Wang

Convolutional neural networks (CNN) and Transformer have wildly succeeded in multimedia applications. However, more effort needs to be made to harmonize these two architectures effectively to satisfy speech enhancement. This paper aims to…

Audio and Speech Processing · Electrical Eng. & Systems 2023-07-31 Xinmeng Xu, Weiping Tu, Yuhong Yang

Transformer has achieved extraordinary performance in Natural Language Processing and Computer Vision tasks thanks to its powerful self-attention mechanism, and its variant Conformer has become a state-of-the-art architecture in the field…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-18 Dexin Liao, Tao Jiang, Feng Wang, Lin Li, Qingyang Hong

Conformer-based models have become the dominant end-to-end architecture for speech processing tasks. With the objective of enhancing the conformer architecture for efficient training and inference, we carefully redesigned Conformer with a…

This work builds together two popular blocks of neural architecture, namely convolutional layers and Transformers, for large language models (LLMs). Non-causal conformers are used ubiquitously in automatic speech recognition. This work aims…

Computation and Language · Computer Science 2023-07-04 Prateek Verma

Conformer has achieved impressive results in Automatic Speech Recognition (ASR) by leveraging transformer's capturing of content-based global interactions and convolutional neural network's exploiting of local features. In Conformer, two…

Computation and Language · Computer Science 2022-09-02 Xianchao Wu

Deep learning-based speech enhancement methods have significantly improved speech quality and intelligibility. Convolutional neural networks (CNNs) have been proven to be essential components of many high-performance models. In this paper,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-11 Dahan Wang, Xiaobin Rong, Shiruo Sun, Yuxiang Hu, Changbao Zhu, Jing Lu

The recently proposed Conformer model has become the de facto backbone model for various downstream speech tasks based on its hybrid attention-convolution architecture that captures both local and global features. However, through a series…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-18 Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer

Transformer has achieved competitive performance against state-of-the-art end-to-end models in automatic speech recognition (ASR), and requires significantly less training time than RNN-based models. The original Transformer, with…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-14 Wenyong Huang, Wenchao Hu, Yu Ting Yeung, Xiao Chen

Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation…

Computation and Language · Computer Science 2023-05-19 Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe

With increasingly more powerful compute capabilities and resources in today's devices, traditionally compute-intensive automatic speech recognition (ASR) has been moving from the cloud to devices to better protect user privacy. However, it…

Machine Learning · Computer Science 2024-05-15 Mingbin Xu, Alex Jin, Sicheng Wang, Mu Su, Tim Ng, Henry Mason, Shiyi Han, Zhihong Lei, Yaqiao Deng, Zhen Huang, Mahesh Krishnamoorthy

Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data. Transformer-based models such as HuBERT, which consist of a feature extractor and transformer layers, are leading the field in the speech…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-23 Zih-Ching Chen, Yu-Shun Sung, Hung-yi Lee

State-of-the-art ASR systems have achieved promising results by modeling local and global interactions separately. While the former can be computed efficiently, global interactions are usually modeled via attention mechanisms, which are…

Computation and Language · Computer Science 2023-05-30 Florian Mai, Juan Zuluaga-Gomez, Titouan Parcollet, Petr Motlicek

Conventional Deep Learning frameworks for continuous sign language recognition (CSLR) are comprised of a single or multi-modal feature extractor, a sequence-learning module, and a decoder for outputting the glosses. The sequence learning…

Computer Vision and Pattern Recognition · Computer Science 2024-05-21 Neena Aloysius, Geetha M, Prema Nedungadi