Related papers: Transformers for Supervised Online Continual Learn…

Continuous Latent Contexts Enable Efficient Online Learning in Transformers

Large language models (LLMs) exhibit a strong capacity for in-context learning: Given labeled examples, they can generate good predictions without parameter updates. However, many interactive settings go beyond static prediction to online…

Machine Learning · Computer Science 2026-05-12 Emile Anand, Abdullah Ateyeh, Xinyuan Cao, Max Dabagia

Transformers predicting the future. Applying attention in next-frame and time series forecasting

Recurrent Neural Networks were, until recently, one of the best ways to capture the timely dependencies in sequences. However, with the introduction of the Transformer, it has been proven that an architecture with only attention-mechanisms…

Machine Learning · Computer Science 2021-08-19 Radostin Cholakov, Todor Kolev

A New Perspective on Transformers in Online Reinforcement Learning for Continuous Control

Despite their effectiveness and popularity in offline or model-based reinforcement learning (RL), transformers remain underexplored in online model-free RL due to their sensitivity to training setups and model design decisions such as how…

Machine Learning · Computer Science 2025-10-16 Nikita Kachaev, Daniil Zelezetsky, Egor Cherepanov, Alexey K. Kovalev, Aleksandr I. Panov

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a…

Machine Learning · Computer Science 2019-06-04 Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov

Unsupervised Meta-Learning via In-Context Learning

Unsupervised meta-learning aims to learn feature representations from unsupervised datasets that can transfer to downstream tasks with limited labeled data. In this paper, we propose a novel approach to unsupervised meta-learning that…

Machine Learning · Computer Science 2025-02-11 Anna Vettoruzzo, Lorenzo Braccaioli, Joaquin Vanschoren, Marlena Nowaczyk

A Meta-Learning Perspective on Transformers for Causal Language Modeling

The Transformer architecture has become prominent in developing large causal language models. However, mechanisms to explain its capabilities are not well understood. Focused on the training process, here we establish a meta-learning view…

Machine Learning · Computer Science 2024-03-26 Xinbo Wu, Lav R. Varshney

Transformers in Vision: A Survey

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable modeling long dependencies…

Computer Vision and Pattern Recognition · Computer Science 2022-01-20 Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah

A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks

Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data. Unlike conventional neural networks or updated versions of Recurrent Neural Networks (RNNs) such…

Machine Learning · Computer Science 2023-06-14 Saidul Islam, Hanae Elmekki, Ahmed Elsebai, Jamal Bentahar, Najat Drawel, Gaith Rjoub, Witold Pedrycz

Minimal Time Series Transformer

Transformer is the state-of-the-art model for many natural language processing, computer vision, and audio analysis problems. Transformer effectively combines information from the past input and output samples in auto-regressive manner so…

Machine Learning · Computer Science 2025-03-14 Joni-Kristian Kämäräinen

A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships

Transformer-based models have transformed the landscape of natural language processing (NLP) and are increasingly applied to computer vision tasks with remarkable success. These models, renowned for their ability to capture long-range…

Computer Vision and Pattern Recognition · Computer Science 2024-08-28 Gracile Astlin Pereira, Muhammad Hussain

Sequence Complementor: Complementing Transformers For Time Series Forecasting with Learnable Sequences

Since its introduction, the transformer has shifted the development trajectory away from traditional models (e.g., RNN, MLP) in time series forecasting, which is attributed to its ability to capture global dependencies within temporal…

Machine Learning · Computer Science 2025-01-07 Xiwen Chen, Peijie Qiu, Wenhui Zhu, Huayu Li, Hao Wang, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi

Meta-Learning Transformers to Improve In-Context Generalization

In-context learning enables transformer models to generalize to new tasks based solely on input prompts, without any need for weight updates. However, existing training paradigms typically rely on large, unstructured datasets that are…

Machine Learning · Computer Science 2025-07-08 Lorenzo Braccaioli, Anna Vettoruzzo, Prabhant Singh, Joaquin Vanschoren, Mohamed-Rafik Bouguelia, Nicola Conci

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining

Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction…

Machine Learning · Computer Science 2024-05-28 Licong Lin, Yu Bai, Song Mei

Memory Efficient Continual Learning with Transformers

In many real-world scenarios, data to train machine learning models becomes available over time. Unfortunately, these models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is…

Computation and Language · Computer Science 2023-01-16 Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cedric Archambeau

A Closer Look at In-Context Learning under Distribution Shifts

In-context learning, a capability that enables a model to learn from input examples on the fly without necessitating weight updates, is a defining characteristic of large language models. In this work, we follow the setting proposed in…

Machine Learning · Computer Science 2023-05-29 Kartik Ahuja, David Lopez-Paz

Two Steps Forward and One Behind: Rethinking Time Series Forecasting with Deep Learning

The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks, first in natural language processing and later in computer vision. This model is based on the attention mechanism…

Machine Learning · Computer Science 2023-05-09 Riccardo Ughi, Eugenio Lomurno, Matteo Matteucci

Exploring Transformers for Large-Scale Speech Recognition

While recurrent neural networks still largely define state-of-the-art speech recognition systems, the Transformer network has been proven to be a competitive alternative, especially in the offline condition. Most studies with Transformers…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-13 Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong

(How) Can Transformers Predict Pseudo-Random Numbers?

Transformers excel at discovering patterns in sequential data, yet their fundamental limitations and learning mechanisms remain crucial topics of investigation. In this paper, we study the ability of Transformers to learn pseudo-random…

Machine Learning · Computer Science 2025-07-10 Tao Tao, Darshil Doshi, Dayal Singh Kalra, Tianyu He, Maissam Barkeshli

General-Purpose In-Context Learning by Meta-Learning Transformers

Modern machine learning requires system designers to specify aspects of the learning pipeline, such as losses, architectures, and optimizers. Meta-learning, or learning-to-learn, instead aims to learn those aspects, and promises to unlock…

Machine Learning · Computer Science 2024-01-10 Louis Kirsch, James Harrison, Jascha Sohl-Dickstein, Luke Metz

Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction

While convolutional neural networks have shown a tremendous impact on various computer vision tasks, they generally demonstrate limitations in explicitly modeling long-range dependencies due to the intrinsic locality of the convolution…

Computer Vision and Pattern Recognition · Computer Science 2021-08-06 Guanglei Yang, Hao Tang, Mingli Ding, Nicu Sebe, Elisa Ricci