Related papers: Layer Reduction: Accelerating Conformer-Based Self…

200 papers

Transformer-based architectures have been the subject of research aimed at understanding their overparameterization and the non-uniform importance of their layers. Applying these approaches to Automatic Speech Recognition, we demonstrate…

Machine Learning · Computer Science 2022-02-07 Lillian Zhou, Dhruv Guliani, Andreas Kabel, Giovanni Motta, Françoise Beaufays

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer…

Computation and Language · Computer Science 2020-02-11 Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Recently, pioneer work finds that speech pre-trained models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-17 Chengyi Wang, Yu Wu, Sanyuan Chen, Shujie Liu, Jinyu Li, Yao Qian, Zhenglu Yang

Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings. In this context, it has been demonstrated that larger self-supervised feature extractors are crucial…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-14 Salah Zaiem, Robin Algayres, Titouan Parcollet, Slim Essid, Mirco Ravanelli

Recently, Transformer-based language models have demonstrated remarkable performance across many NLP domains. However, the unsupervised pre-training step of these models suffers from unbearable overall computational expenses. Current…

Machine Learning · Computer Science 2020-10-27 Minjia Zhang, Yuxiong He

Embedding layers in transformer-based NLP models typically account for the largest share of model parameters, scaling with vocabulary size but not yielding performance gains proportional to scale. We propose an alternative approach in which…

Computation and Language · Computer Science 2025-05-06 Henry Ndubuaku, Mouad Talhi

In this study, we aim to explore efficient tuning methods for speech self-supervised learning. Recent studies show that self-supervised learning (SSL) can learn powerful representations for different speech tasks. However, fine-tuning…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-31 Zih-Ching Chen, Chin-Lun Fu, Chih-Ying Liu, Shang-Wen Li, Hung-yi Lee

Self-supervised learning (SSL)-based speech models are extensively used for full-stack speech processing. However, it has been observed that improving SSL-based speech representations using unlabeled speech for content-related tasks is…

Computation and Language · Computer Science 2024-06-14 Amit Meghanani, Thomas Hain

Self-supervised speech representation models, particularly those leveraging transformer architectures, have demonstrated remarkable performance across various tasks such as speech recognition, speaker identification, and emotion detection.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-20 Teresa Dorszewski, Albert Kjøller Jacobsen, Lenka Tětková, Lars Kai Hansen

While Transformer has become the de-facto standard for speech, modeling upon the fine-grained frame-level features remains an open challenge of capturing long-distance dependencies and distributing the attention weights. We propose…

Computation and Language · Computer Science 2023-05-30 Chen Xu, Yuhao Zhang, Chengbo Jiao, Xiaoqian Liu, Chi Hu, Xin Zeng, Tong Xiao, Anxiang Ma, Huizhen Wang, JingBo Zhu

Large, self-supervised transformer-based language representation models have recently received significant amounts of attention, and have produced state-of-the-art results across a variety of tasks simply by scaling up pre-training on…

Computation and Language · Computer Science 2019-10-25 Alexandre Matton, Luke de Oliveira

In this paper, we show that a simple self-supervised pre-trained audio model can achieve comparable inference efficiency to more complicated pre-trained models with speech transformer encoders. These speech transformers rely on mixing…

Sound · Computer Science 2024-02-09 Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel

Large language models (LLMs) achieve remarkable performance through ever-increasing parameter counts, but scaling incurs steep computational costs. To better understand LLM scaling, we study representational differences between LLMs and…

Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data. Transformer-based models such as HuBERT, which consist of a feature extractor and transformer layers, are leading the field in the speech…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-23 Zih-Ching Chen, Yu-Shun Sung, Hung-yi Lee

End-to-end automatic speech recognition (ASR), unlike conventional ASR, does not have modules to learn the semantic representation from the speech encoder. Moreover, the higher frame rate of the speech representation prevents the model from learning the…

Artificial Intelligence · Computer Science 2021-03-19 Md Akmal Haidar, Chao Xing, Mehdi Rezagholizadeh

Streaming end-to-end speech recognition models have been widely applied to mobile devices and show significant improvement in efficiency. These models are typically trained on the server using transcribed speech data. However, the server…

Self-supervised learning (SSL) models have achieved considerable improvements in automatic speech recognition (ASR). In addition, ASR performance could be further improved if the model is dedicated to audio content information learning…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-08 Genshun Wan, Tan Liu, Hang Chen, Jia Pan, Cong Liu, Zhongfu Ye

Self-supervised pre-training of a speech foundation model, followed by supervised fine-tuning, has shown impressive quality improvements on automatic speech recognition (ASR) tasks. Fine-tuning separate foundation models for many downstream…

Machine Learning · Computer Science 2022-11-08 Zhouyuan Huo, Khe Chai Sim, Bo Li, Dongseong Hwang, Tara N. Sainath, Trevor Strohman

Transformer has been successfully applied to speech separation recently with its strong long-dependency modeling capacity using a self-attention mechanism. However, Transformer tends to have heavy run-time costs due to the deep encoder…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-28 Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs). Transformer models are good at capturing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-19 Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang