Related papers: Conformer: Convolution-augmented Transformer for S…

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

Convolutional neural networks (CNN) have shown promising results for end-to-end speech recognition, albeit still behind other state-of-the-art methods in performance. In this paper, we study how to bridge this gap and go beyond with a novel…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-19 Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu

Multi-Convformer: Extending Conformer with Multiple Convolution Kernels

Convolutions have become essential in state-of-the-art end-to-end Automatic Speech Recognition (ASR) systems due to their efficient modelling of local context. Notably, its use in Conformers has led to superior performance compared to…

Computation and Language · Computer Science 2024-07-25 Darshan Prabhu, Yifan Peng, Preethi Jyothi, Shinji Watanabe

DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition

Convolutional neural networks (CNN) have improved speech recognition performance greatly by exploiting localized time-frequency patterns. But these patterns are assumed to appear in symmetric and rigid kernels by the conventional CNN…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-19 Jiamin Xie, John H. L. Hansen

A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition

Phoneme recognition is a very important part of speech recognition that requires the ability to extract phonetic features from multiple frames. In this paper, we compare and analyze CNN, RNN, Transformer, and Conformer models using phoneme…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-04 Kyuhong Shim, Wonyong Sung

Towards Advanced Speech Signal Processing: A Statistical Perspective on Convolution-Based Architectures and its Applications

This article surveys convolution-based models, including convolutional neural networks (CNNs), Conformers, ResNets, and CRNNs, as speech signal processing models and provides their statistical backgrounds and speech recognition, speaker…

Sound · Computer Science 2024-12-02 Nirmal Joshua Kapu, Raghav Karan

E-Branchformer: Branchformer with Enhanced merging for speech recognition

Conformer, combining convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR).…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-18 Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, Shinji Watanabe

A Conformer Based Acoustic Model for Robust Automatic Speech Recognition

This study addresses robust automatic speech recognition (ASR) by introducing a Conformer-based acoustic model. The proposed model builds on the wide residual bi-directional long short-term memory network (WRBN) with utterance-wise dropout…

Sound · Computer Science 2022-10-21 Yufeng Yang, Peidong Wang, DeLiang Wang

PCNN: A Lightweight Parallel Conformer Neural Network for Efficient Monaural Speech Enhancement

Convolutional neural networks (CNN) and Transformer have wildly succeeded in multimedia applications. However, more effort needs to be made to harmonize these two architectures effectively to satisfy speech enhancement. This paper aims to…

Audio and Speech Processing · Electrical Eng. & Systems 2023-07-31 Xinmeng Xu, Weiping Tu, Yuhong Yang

Towards A Unified Conformer Structure: from ASR to ASV Task

Transformer has achieved extraordinary performance in Natural Language Processing and Computer Vision tasks thanks to its powerful self-attention mechanism, and its variant Conformer has become a state-of-the-art architecture in the field…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-18 Dexin Liao, Tao Jiang, Feng Wang, Lin Li, Qingyang Hong

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

Conformer-based models have become the dominant end-to-end architecture for speech processing tasks. With the objective of enhancing the conformer architecture for efficient training and inference, we carefully redesigned Conformer with a…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-03 Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

Conformer LLMs -- Convolution Augmented Large Language Models

This work builds together two popular blocks of neural architecture, namely convolutional layers and Transformers, for large language models (LLMs). Non-causal conformers are used ubiquitously in automatic speech recognition. This work aims…

Computation and Language · Computer Science 2023-07-04 Prateek Verma

Deep Sparse Conformer for Speech Recognition

Conformer has achieved impressive results in Automatic Speech Recognition (ASR) by leveraging transformer's capturing of content-based global interactions and convolutional neural network's exploiting of local features. In Conformer, two…

Computation and Language · Computer Science 2022-09-02 Xianchao Wu

Adaptive Convolution for CNN-based Speech Enhancement Models

Deep learning-based speech enhancement methods have significantly improved speech quality and intelligibility. Convolutional neural networks (CNNs) have been proven to be essential components of many high-performance models. In this paper,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-11 Dahan Wang, Xiaobin Rong, Shiruo Sun, Yuxiang Hu, Changbao Zhu, Jing Lu

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

The recently proposed Conformer model has become the de facto backbone model for various downstream speech tasks based on its hybrid attention-convolution architecture that captures both local and global features. However, through a series…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-18 Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer

Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition

Transformer has achieved competitive performance against state-of-the-art end-to-end models in automatic speech recognition (ASR), and requires significantly less training time than RNN-based models. The original Transformer, with…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-14 Wenyong Huang, Wenchao Hu, Yu Ting Yeung, Xiao Chen

A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation…

Computation and Language · Computer Science 2023-05-19 Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe

Conformer-Based Speech Recognition On Extreme Edge-Computing Devices

With increasingly more powerful compute capabilities and resources in today's devices, traditionally compute-intensive automatic speech recognition (ASR) has been moving from the cloud to devices to better protect user privacy. However, it…

Machine Learning · Computer Science 2024-05-15 Mingbin Xu, Alex Jin, Sicheng Wang, Mu Su, Tim Ng, Henry Mason, Shiyi Han, Zhihong Lei, Yaqiao Deng, Zhen Huang, Mahesh Krishnamoorthy

CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models

Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data. Transformer based models such as HuBERT, which consist of a feature extractor and transformer layers, are leading the field in the speech…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-23 Zih-Ching Chen, Yu-Shun Sung, Hung-yi Lee

HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition

State-of-the-art ASR systems have achieved promising results by modeling local and global interactions separately. While the former can be computed efficiently, global interactions are usually modeled via attention mechanisms, which are…

Computation and Language · Computer Science 2023-05-30 Florian Mai, Juan Zuluaga-Gomez, Titouan Parcollet, Petr Motlicek

Continuous Sign Language Recognition with Adapted Conformer via Unsupervised Pretraining

Conventional Deep Learning frameworks for continuous sign language recognition (CSLR) are comprised of a single or multi-modal feature extractor, a sequence-learning module, and a decoder for outputting the glosses. The sequence learning…

Computer Vision and Pattern Recognition · Computer Science 2024-05-21 Neena Aloysius, Geetha M, Prema Nedungadi