Related papers: Sumformer: Universal Approximation for Efficient T…

200 papers

Despite the widespread adoption of Transformer models for NLP tasks, the expressive power of these models is not well-understood. In this paper, we establish that Transformer models are universal approximators of continuous permutation…

Machine Learning · Computer Science 2020-02-26 Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

Recurrent Neural Networks were, until recently, one of the best ways to capture the temporal dependencies in sequences. However, with the introduction of the Transformer, it has been proven that an architecture with only attention mechanisms…

Machine Learning · Computer Science 2021-08-19 Radostin Cholakov, Todor Kolev

Language models have emerged as a critical area of focus in artificial intelligence, particularly with the introduction of groundbreaking innovations like ChatGPT. Large-scale Transformer networks have quickly become the leading approach…

Artificial Intelligence · Computer Science 2024-12-12 Wei Wang, Qing Li

Recurrent neural networks (RNNs) sequentially process data by updating their state with each new data point, and have long been the de facto choice for sequence modeling tasks. However, their inherently sequential computation makes them…

Computation and Language · Computer Science 2019-03-06 Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser

Recurrent neural networks are effective models to process sequences. However, they are unable to learn long-term dependencies because of their inherent sequential nature. As a solution, Vaswani et al. introduced the Transformer, a model…

Machine Learning · Computer Science 2023-03-28 Quentin Fournier, Gaétan Marceau Caron, Daniel Aloise

Deep learning employs multi-layer neural networks trained via the backpropagation algorithm. This approach has achieved success across many domains and relies on adaptive gradient methods such as the Adam optimizer. Sequence modeling…

Machine Learning · Computer Science 2025-07-16 Esmail Gumaan

Since the proposal of transformers, these models have been limited to bounded input lengths, because of their need to attend to every token in the input. In this work, we propose Unlimiformer: a general approach that wraps any existing…

Computation and Language · Computer Science 2023-11-01 Amanda Bertsch, Uri Alon, Graham Neubig, Matthew R. Gormley

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example,…

Machine Learning · Computer Science 2022-03-15 Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

The Transformer is a powerful model for text understanding. However, it is inefficient due to its quadratic complexity with respect to input sequence length. Although there are many methods for Transformer acceleration, they are still either inefficient on…

Computation and Language · Computer Science 2021-09-07 Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang, Xing Xie

The widespread 'deeper is better' philosophy has driven the creation of architectures like ResNet and Transformer, which achieve high performance by stacking numerous layers. However, increasing model depth comes with challenges such as…

Machine Learning · Computer Science 2026-02-25 Wei Wang, Xiao-Yong Wei, Qing Li

Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism…

Computation and Language · Computer Science 2020-12-03 Iz Beltagy, Matthew E. Peters, Arman Cohan

The Transformer architecture has become a cornerstone of modern artificial intelligence, but its core self-attention mechanism suffers from a complexity bottleneck that scales quadratically with sequence length, severely limiting its…

Machine Learning · Computer Science 2025-08-29 Zhongpan Tang

Transformers have become pivotal in Natural Language Processing, demonstrating remarkable success in applications like Machine Translation and Summarization. Given their widespread adoption, several works have attempted to analyze the…

Machine Learning · Computer Science 2024-09-02 Swaroop Nath, Harshad Khadilkar, Pushpak Bhattacharyya

We explore options to use Transformer networks in a neural transducer for end-to-end speech recognition. Transformer networks use self-attention for sequence modeling and come with advantages in parallel computation and capturing contexts.…

Audio and Speech Processing · Electrical Eng. & Systems 2019-10-30 Ching-Feng Yeh, Jay Mahadeokar, Kaustubh Kalgaonkar, Yongqiang Wang, Duc Le, Mahaveer Jain, Kjell Schubert, Christian Fuegen, Michael L. Seltzer

The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks, first in natural language processing and later in computer vision. This model is based on the attention mechanism…

Machine Learning · Computer Science 2023-05-09 Riccardo Ughi, Eugenio Lomurno, Matteo Matteucci

Transformer-based language models have revolutionized the field of natural language processing (NLP). However, using these models often involves navigating multiple frameworks and tools, as well as writing repetitive boilerplate code. This…

Computation and Language · Computer Science 2025-04-15 Rabindra Lamsal, Maria Rodriguez Read, Shanika Karunasekera

Scaling sequence length has become a critical demand in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, rendering the maximum sequence length restricted. To…

Computation and Language · Computer Science 2023-07-20 Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei

Since its introduction, the transformer has shifted the development trajectory away from traditional models (e.g., RNN, MLP) in time series forecasting, which is attributed to its ability to capture global dependencies within temporal…

Machine Learning · Computer Science 2025-01-07 Xiwen Chen, Peijie Qiu, Wenhui Zhu, Huayu Li, Hao Wang, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi

Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, training and deploying these models can be prohibitively costly for long sequences,…

Machine Learning · Computer Science 2020-06-16 Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma

A promising approach to preserving model performance in linearized transformers is to employ position-based re-weighting functions. However, state-of-the-art re-weighting functions rely heavily on target sequence lengths, making it…

Computation and Language · Computer Science 2024-05-24 Victor Agostinelli, Sanghyun Hong, Lizhong Chen