Related papers: Toeplitz Neural Network for Sequence Modeling

200 papers

Toeplitz Neural Networks (TNNs) have exhibited outstanding performance in various sequence modeling tasks. They outperform commonly used Transformer-based models while benefiting from log-linear space-time complexities. On the other hand,…

Computation and Language · Computer Science 2023-11-16 Zhen Qin, Yiran Zhong

In Natural Language Processing (NLP), it is important to detect the relationship between two sequences or to generate a sequence of tokens given another observed sequence. We refer to problems of modelling sequence pairs as sequence…

Computation and Language · Computer Science 2018-10-26 Lei Yu

The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to…

Computation and Language · Computer Science 2017-07-26 Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

Transformer-based sequence-to-sequence architectures, while achieving state-of-the-art results on a large number of NLP tasks, can still suffer from overfitting during training. In practice, this is usually countered either by applying…

Computation and Language · Computer Science 2022-01-04 Dušan Variš, Ondřej Bojar

As the demand for processing extended textual data grows, the ability to handle long-range dependencies and maintain computational efficiency is more critical than ever. One of the key issues for long-sequence modeling using attention-based…

Computation and Language · Computer Science 2025-05-26 Aosong Feng, Rex Ying, Leandros Tassiulas

Transformers achieve remarkable performance in various domains, including NLP, CV, audio processing, and graph analysis. However, they do not scale well on long sequence tasks due to their quadratic complexity w.r.t. the input length.…

Machine Learning · Computer Science 2022-02-24 Maksim Zubkov, Daniil Gavrilov

The attention module, which is a crucial component in Transformer, cannot scale efficiently to long sequences due to its quadratic complexity. Many works focus on approximating the dot-then-exponentiate softmax function in the original…

Machine Learning · Computer Science 2021-11-04 Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu

A promising approach to preserving model performance in linearized transformers is to employ position-based re-weighting functions. However, state-of-the-art re-weighting functions rely heavily on target sequence lengths, making it…

Computation and Language · Computer Science 2024-05-24 Victor Agostinelli, Sanghyun Hong, Lizhong Chen

Transformers have reached remarkable success in sequence modeling. However, these models have efficiency issues as they need to store all the history token-level representations as memory. We present Memformer, an efficient neural network…

Computation and Language · Computer Science 2022-04-14 Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, Alborz Geramifard, Zhou Yu

Recurrent Neural Networks and their variants have shown promising performance in sequence modeling tasks such as Natural Language Processing. These models, however, turn out to be impractical and difficult to train when exposed to very…

Computer Vision and Pattern Recognition · Computer Science 2017-07-07 Yinchong Yang, Denis Krompass, Volker Tresp

Transformer-based models have shown strong performance in time-series forecasting by leveraging self-attention to model long-range temporal dependencies. However, their effectiveness depends critically on the quality and structure of input…

Machine Learning · Computer Science 2026-02-11 Saurish Nagrath, Saroj Kumar Panigrahy

A recent variation of Transformer, Performer, scales Transformer to longer sequences with a linear attention mechanism. However, it is not compatible with relative position encoding, which has advantages over absolute position encoding. In…

Computation and Language · Computer Science 2021-09-09 Peng Chen

Increasing the input length has been a driver of progress in language modeling with transformers. We identify conditions where shorter inputs are not harmful, and achieve perplexity and efficiency improvements through two new methods that…

Computation and Language · Computer Science 2021-06-04 Ofir Press, Noah A. Smith, Mike Lewis

Since its introduction, the transformer has shifted the development trajectory away from traditional models (e.g., RNN, MLP) in time series forecasting, which is attributed to its ability to capture global dependencies within temporal…

Machine Learning · Computer Science 2025-01-07 Xiwen Chen, Peijie Qiu, Wenhui Zhu, Huayu Li, Hao Wang, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi

Scaling sequence length has become a critical demand in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, rendering the maximum sequence length restricted. To…

Computation and Language · Computer Science 2023-07-20 Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei

Recurrent neural networks are effective models to process sequences. However, they are unable to learn long-term dependencies because of their inherent sequential nature. As a solution, Vaswani et al. introduced the Transformer, a model…

Machine Learning · Computer Science 2023-03-28 Quentin Fournier, Gaétan Marceau Caron, Daniel Aloise

Efficient parallelization of Large Language Models (LLMs) with long sequences is essential but challenging due to their significant computational and memory demands, particularly stemming from communication bottlenecks in attention…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-31 Zongwu Wang, Fangxin Liu, Mingshuai Li, Li Jiang

Text normalization is an important enabling technology for several NLP tasks. Recently, neural-network-based approaches have outperformed well-established models in this task. However, in languages other than English, there has been little…

Computation and Language · Computer Science 2018-09-06 Daniel Watson, Nasser Zalmout, Nizar Habash

Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech to…

Neural and Evolutionary Computing · Computer Science 2012-11-16 Alex Graves

Sequence-to-sequence models are a powerful workhorse of NLP. Most variants employ a softmax transformation in both their attention mechanism and output layer, leading to dense alignments and strictly positive output probabilities. This…

Computation and Language · Computer Science 2019-06-14 Ben Peters, Vlad Niculae, André F. T. Martins