English
Related papers

Related papers: Extended Mind Transformers

200 papers

Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus…

Machine Learning · Computer Science 2022-03-18 Yuhuai Wu , Markus N. Rabe , DeLesley Hutchins , Christian Szegedy

Memory is fundamental to intelligence, enabling learning, reasoning, and adaptability across biological and artificial systems. While Transformer architectures excel at sequence modeling, they face critical limitations in long-range context…

Machine Learning · Computer Science 2025-08-19 Parsa Omidi , Xingshuai Huang , Axel Laborieux , Bahareh Nikpour , Tianyu Shi , Armaghan Eshaghi

In many real-world scenarios, data to train machine learning models becomes available over time. Unfortunately, these models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is…

Computation and Language · Computer Science 2023-01-16 Beyza Ermis , Giovanni Zappella , Martin Wistuba , Aditya Rawal , Cedric Archambeau

Large Transformer models have achieved impressive performance in many natural language tasks. In particular, Transformer based language models have been shown to have great capabilities in encoding factual knowledge in their vast amount of…

Computation and Language · Computer Science 2020-12-02 Chen Zhu , Ankit Singh Rawat , Manzil Zaheer , Srinadh Bhojanapalli , Daliang Li , Felix Yu , Sanjiv Kumar

The computational complexity of the self-attention mechanism in Transformer models significantly limits their ability to generalize over long temporal durations. Memory-augmentation, or the explicit storing of past information in external…

Computation and Language · Computer Science 2022-11-29 Omri Raccah , Phoebe Chen , Ted L. Willke , David Poeppel , Vy A. Vo

Transformer-based models have achieved state-of-the-art results in many natural language processing tasks. The self-attention architecture allows transformer to combine information from all elements of a sequence into context-aware…

Computation and Language · Computer Science 2021-02-17 Mikhail S. Burtsev , Yuri Kuratov , Anton Peganov , Grigory V. Sapunov

Transformer encoder-decoder models have achieved great performance in dialogue generation tasks, however, their inability to process long dialogue history often leads to truncation of the context To address this problem, we propose a novel…

Computation and Language · Computer Science 2023-05-24 Qingyang Wu , Zhou Yu

The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge into parameters is unnecessary, as only a…

Computation and Language · Computer Science 2026-03-24 Hadi Pouransari , David Grangier , C Thomas , Michael Kirchhof , Oncel Tuzel

Although Transformers with fully connected self-attentions are powerful to model long-term dependencies, they are struggling to scale to long texts with thousands of words in language modeling. One of the solutions is to equip the model…

Computation and Language · Computer Science 2022-04-27 Haozhe Ji , Rongsheng Zhang , Zhenyu Yang , Zhipeng Hu , Minlie Huang

Transformers have revolutionized deep learning in numerous fields, including natural language processing, computer vision, and audio processing. Their strength lies in their attention mechanism, which allows for the discovering of complex…

Machine Learning · Computer Science 2024-04-02 Uladzislau Yorsh , Martin Holeňa , Ondřej Bojar , David Herel

Access to external knowledge is essential for many natural language processing tasks, such as question answering and dialogue. Existing methods often rely on a parametric model that stores knowledge in its parameters, or use a…

Computation and Language · Computer Science 2022-11-01 Yuxiang Wu , Yu Zhao , Baotian Hu , Pasquale Minervini , Pontus Stenetorp , Sebastian Riedel

Transformers have reached remarkable success in sequence modeling. However, these models have efficiency issues as they need to store all the history token-level representations as memory. We present Memformer, an efficient neural network…

Computation and Language · Computer Science 2022-04-14 Qingyang Wu , Zhenzhong Lan , Kun Qian , Jing Gu , Alborz Geramifard , Zhou Yu

Retrieval augmented language models have recently become the standard for knowledge intensive tasks. Rather than relying purely on latent semantics within the parameters of large neural models, these methods enlist a semi-parametric memory…

Computation and Language · Computer Science 2023-01-24 Wenhu Chen , Pat Verga , Michiel de Jong , John Wieting , William Cohen

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example,…

Machine Learning · Computer Science 2022-03-15 Yi Tay , Mostafa Dehghani , Dara Bahri , Donald Metzler

Human memory is fleeting. As words are processed, the exact wordforms that make up incoming sentences are rapidly lost. Cognitive scientists have long believed that this limitation of memory may, paradoxically, help in learning language -…

Computation and Language · Computer Science 2026-05-11 Abishek Thamma , Micha Heilbron

Recently, self-attention models such as Transformers have given competitive results compared to recurrent neural network systems in speech recognition. The key factor for the outstanding performance of self-attention models is their ability…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-29 Shucong Zhang , Erfan Loweimi , Peter Bell , Steve Renals

Recurrent neural networks are effective models to process sequences. However, they are unable to learn long-term dependencies because of their inherent sequential nature. As a solution, Vaswani et al. introduced the Transformer, a model…

Machine Learning · Computer Science 2023-03-28 Quentin Fournier , Gaétan Marceau Caron , Daniel Aloise

Transformer is a ubiquitous model for natural language processing and has attracted wide attentions in computer vision. The attention maps are indispensable for a transformer model to encode the dependencies among input tokens. However,…

Machine Learning · Computer Science 2021-02-26 Yujing Wang , Yaming Yang , Jiangang Bai , Mingliang Zhang , Jing Bai , Jing Yu , Ce Zhang , Gao Huang , Yunhai Tong

Recent research suggests that the feed-forward module within Transformers can be viewed as a collection of key-value memories, where the keys learn to capture specific patterns from the input based on the training examples. The values then…

Computation and Language · Computer Science 2023-10-25 Sunit Bhattacharya , Ondrej Bojar

Despite the empirical success of prompt tuning in adapting pretrained language models to new tasks, theoretical analyses of its capabilities remain limited. Existing theoretical work primarily addresses universal approximation properties,…

Machine Learning · Computer Science 2025-09-03 Maxime Meyer , Mario Michelessa , Caroline Chaux , Vincent Y. F. Tan
‹ Prev 1 2 3 10 Next ›