Related papers: Multi-Convformer: Extending Conformer with Multipl…

200 papers

Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs). Transformer models are good at capturing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-19 Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang

Conformer, combining convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR).…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-18 Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, Shinji Watanabe

This work builds together two popular blocks of neural architecture, namely convolutional layers and Transformers, for large language models (LLMs). Non-causal conformers are used ubiquitously in automatic speech recognition. This work aims…

Computation and Language · Computer Science 2023-07-04 Prateek Verma

In this paper, we present Multi-scale Feature Aggregation Conformer (MFA-Conformer), an easy-to-implement, simple but effective backbone for automatic speaker verification based on the Convolution-augmented Transformer (Conformer). The…

Sound · Computer Science 2022-11-14 Yang Zhang, Zhiqiang Lv, Haibin Wu, Shanshan Zhang, Pengfei Hu, Zhiyong Wu, Hung-yi Lee, Helen Meng

Conformer has proven to be effective in many speech processing tasks. It combines the benefits of extracting local dependencies using convolutions and global dependencies using self-attention. Inspired by this, we propose a more flexible,…

Computation and Language · Computer Science 2022-07-08 Yifan Peng, Siddharth Dalmia, Ian Lane, Shinji Watanabe

Transformer has achieved extraordinary performance in Natural Language Processing and Computer Vision tasks thanks to its powerful self-attention mechanism, and its variant Conformer has become a state-of-the-art architecture in the field…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-18 Dexin Liao, Tao Jiang, Feng Wang, Lin Li, Qingyang Hong

Convolutional neural networks (CNN) have improved speech recognition performance greatly by exploiting localized time-frequency patterns. But these patterns are assumed to appear in symmetric and rigid kernels by the conventional CNN…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-19 Jiamin Xie, John H. L. Hansen

State-of-the-art ASR systems have achieved promising results by modeling local and global interactions separately. While the former can be computed efficiently, global interactions are usually modeled via attention mechanisms, which are…

Computation and Language · Computer Science 2023-05-30 Florian Mai, Juan Zuluaga-Gomez, Titouan Parcollet, Petr Motlicek

In this paper, we show that a simple self-supervised pre-trained audio model can achieve comparable inference efficiency to more complicated pre-trained models with speech transformer encoders. These speech transformers rely on mixing…

Sound · Computer Science 2024-02-09 Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel

We present an end-to-end multichannel speaker-attributed automatic speech recognition (MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame cross-channel attention and a speaker-attributed Transformer-based decoder. To…

Computation and Language · Computer Science 2023-10-17 Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent

This paper addresses end-to-end automatic speech recognition (ASR) for long audio recordings such as lecture and conversational speeches. Most end-to-end ASR models are designed to recognize independent utterances, but contextual…

Computation and Language · Computer Science 2021-04-20 Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux

Conformer has achieved impressive results in Automatic Speech Recognition (ASR) by leveraging transformer's capturing of content-based global interactions and convolutional neural network's exploiting of local features. In Conformer, two…

Computation and Language · Computer Science 2022-09-02 Xianchao Wu

This paper presents an audio visual automatic speech recognition (AV-ASR) system using a Transformer-based architecture. We particularly focus on the scene context provided by the visual information, to ground the ASR. We extract…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-01 Georgios Paraskevopoulos, Srinivas Parthasarathy, Aparna Khare, Shiva Sundaram

Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation…

Computation and Language · Computer Science 2023-05-19 Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe

The recently proposed Conformer model has become the de facto backbone model for various downstream speech tasks based on its hybrid attention-convolution architecture that captures both local and global features. However, through a series…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-18 Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer

Deep learning-based speech enhancement methods have significantly improved speech quality and intelligibility. Convolutional neural networks (CNNs) have been proven to be essential components of many high-performance models. In this paper,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-11 Dahan Wang, Xiaobin Rong, Shiruo Sun, Yuxiang Hu, Changbao Zhu, Jing Lu

In this study, we present recent developments on ESPnet: End-to-End Speech Processing toolkit, which mainly involves a recently proposed architecture called Conformer, Convolution-augmented Transformer. This paper shows the results for a…

Conformer models have achieved state-of-the-art (SOTA) results in end-to-end speech recognition. However, Conformer mainly focuses on temporal modeling while paying less attention to the time-frequency properties of speech features. In this paper we…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-01 Yongjun Jiang, Jian Yu, Wenwen Yang, Bihong Zhang, Yanfeng Wang

Recently Convolution-augmented Transformer (Conformer) has shown promising results in Automatic Speech Recognition (ASR), outperforming the previous best published Transformer Transducer. In this work, we believe that the output information…

Computation and Language · Computer Science 2022-12-02 Xiaoming Ren, Huifeng Zhu, Liuwei Wei, Minghui Wu, Jie Hao

Recently, convolution-augmented transformer (Conformer) has achieved promising performance in automatic speech recognition (ASR) and time-domain speech enhancement (SE), as it can capture both local and global dependencies in the speech…

Sound · Computer Science 2024-05-07 Ruizhe Cao, Sherif Abdulatif, Bin Yang