Related papers: Resource-Efficient Separation Transformer

Exploring Self-Attention Mechanisms for Speech Separation

Transformers have enabled impressive improvements in deep learning. They often outperform recurrent and convolutional models in many tasks while taking advantage of parallel processing. Recently, we proposed the SepFormer, which obtains…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-30 Cem Subakan, Mirco Ravanelli, Samuele Cornell, Francois Grondin, Mirko Bronzi

Attention is All You Need in Speech Separation

Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-to-sequence learning. RNNs, however, are inherently sequential models that do not allow parallelization of their computations. Transformers are emerging…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-10 Cem Subakan, Mirco Ravanelli, Samuele Cornell, Mirko Bronzi, Jianyuan Zhong

Tiny-Sepformer: A Tiny Time-Domain Transformer Network for Speech Separation

Time-domain Transformer neural networks have proven their superiority in speech separation tasks. However, these models usually have a large number of network parameters, thus often encountering the problem of GPU memory explosion. In this…

Sound · Computer Science 2022-07-01 Jian Luo, Jianzong Wang, Ning Cheng, Edward Xiao, Xulong Zhang, Jing Xiao

FLASepformer: Efficient Speech Separation with Gated Focused Linear Attention Transformer

Speech separation always faces the challenge of handling prolonged time sequences. Past methods try to reduce sequence lengths and use the Transformer to capture global information. However, due to the quadratic time complexity of the…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-28 Haoxu Wang, Yiheng Jiang, Gang Qiao, Pengteng Shi, Biao Tian

Ultra Fast Speech Separation Model with Teacher Student Learning

Transformer has been successfully applied to speech separation recently with its strong long-dependency modeling capacity using a self-attention mechanism. However, Transformer tends to have heavy run-time costs due to the deep encoder…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-28 Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

TransMask: A Compact and Fast Speech Separation Model Based on Transformer

Speech separation is an important problem in speech processing, which targets to separate and generate clean speech from a mixed audio containing speech from different speakers. Empowered by the deep learning technologies over…

Sound · Computer Science 2021-02-22 Zining Zhang, Bingsheng He, Zhenjie Zhang

Papez: Resource-Efficient Speech Separation with Auditory Working Memory

Transformer-based models recently reached state-of-the-art single-channel speech separation accuracy; however, their extreme computational load makes it difficult to deploy them in resource-constrained mobile or IoT devices. We thus present…

Sound · Computer Science 2024-07-02 Hyunseok Oh, Juheon Yi, Youngki Lee

Efficient Transformer-based Speech Enhancement Using Long Frames and STFT Magnitudes

The SepFormer architecture shows very good results in speech separation. Like other learned-encoder models, it uses short frames, as they have been shown to obtain better performance in these cases. This results in a large number of frames…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-06 Danilo de Oliveira, Tal Peer, Timo Gerkmann

Do we really need Self-Attention for Streaming Automatic Speech Recognition?

Transformer-based architectures are the most used architectures in many deep learning fields like Natural Language Processing, Computer Vision or Speech processing. It may encourage the direct use of Transformers in the constrained tasks,…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-29 Youness Dkhissi, Valentin Vielzeuf, Elys Allesiardo, Anthony Larcher

RingFormer: Rethinking Recurrent Transformer with Adaptive Level Signals

Transformers have achieved great success in effectively processing sequential data such as text. Their architecture consisting of several attention and feedforward blocks can model relations between elements of a sequence in parallel…

Machine Learning · Computer Science 2025-02-20 Jaemu Heo, Eldor Fozilov, Hyunmin Song, Taehwan Kim

Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation

End-to-end automatic speech recognition (ASR), unlike conventional ASR, does not have modules to learn the semantic representation from speech encoder. Moreover, the higher frame-rate of speech representation prevents the model to learn the…

Artificial Intelligence · Computer Science 2021-03-19 Md Akmal Haidar, Chao Xing, Mehdi Rezagholizadeh

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

With its strong modeling capacity that comes from a multi-head and multi-layer structure, Transformer is a very powerful model for learning a sequential representation and has been successfully applied to speech separation recently.…

Sound · Computer Science 2020-10-26 Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li

Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model

Transformers have shown dominant performance across a range of domains including language and vision. However, their computational cost grows quadratically with the sequence length, making their usage prohibitive for resource-constrained…

Computation and Language · Computer Science 2023-10-24 Yinghan Long, Sayeed Shafayet Chowdhury, Kaushik Roy

ResFormer: All-Time Reservoir Memory for Long Sequence Classification

Sequence classification is essential in NLP for understanding and categorizing language patterns in tasks like sentiment analysis, intent detection, and topic classification. Transformer-based models, despite achieving state-of-the-art…

Computation and Language · Computer Science 2025-09-30 Hongbo Liu, Jia Xu

Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized by Discriminative Learning

Transformer has shown advanced performance in speech separation, benefiting from its ability to capture global features. However, capturing local features and channel information of audio sequences in speech separation is equally important.…

Sound · Computer Science 2023-03-08 Zhaoxi Mu, Xinyu Yang, Wenjing Zhu

SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech

Transformer has obtained promising results on cognitive speech signal processing field, which is of interest in various applications ranging from emotion to neurocognitive disorder analysis. However, most works treat speech signal as a…

Sound · Computer Science 2022-03-11 Weidong Chen, Xiaofen Xing, Xiangmin Xu, Jianxin Pang, Lan Du

Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and even shown superior performance over the conventional hybrid framework. The main idea of Transformers is to capture the…

Sound · Computer Science 2022-07-05 Kun Wei, Pengcheng Guo, Ning Jiang

Speechformer: Reducing Information Loss in Direct Speech Translation

Transformer-based models have gained increasing popularity achieving state-of-the-art performance in many research fields including speech translation. However, Transformer's quadratic complexity with respect to the input sequence length…

Computation and Language · Computer Science 2023-10-19 Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi

ReduceFormer: Attention with Tensor Reduction by Summation

Transformers have excelled in many tasks including vision. However, efficient deployment of transformer models in low-latency or high-throughput applications is hindered by the computation in the attention mechanism which involves expensive…

Computer Vision and Pattern Recognition · Computer Science 2024-06-12 John Yang, Le An, Su Inn Park

Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures

Transformer-based language models have recently been at the forefront of active research in text generation. However, these models' advances come at the price of prohibitive training costs, with parameter counts in the billions and compute…

Computation and Language · Computer Science 2025-02-04 Gabriel Lindenmaier, Sean Papay, Sebastian Padó