Related papers: Learning Spectral Methods by Transformers

Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions

In order to understand the in-context learning phenomenon, recent works have adopted a stylized experimental framework and demonstrated that Transformers can learn gradient-based learning algorithms for various classes of real-valued…

Machine Learning · Computer Science 2023-10-05 Satwik Bhattamishra, Arkil Patel, Phil Blunsom, Varun Kanade

Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures

The transformer architecture has demonstrated remarkable capabilities in modern artificial intelligence, among which the capability of implicitly learning an internal model during inference time is widely believed to play a key role in the…

Machine Learning · Computer Science 2026-02-10 Zhiheng Chen, Ruofan Wu, Guanhua Fang

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?

The remarkable capability of Transformers to do reasoning and few-shot learning, without any fine-tuning, is widely conjectured to stem from their ability to implicitly simulate multi-step algorithms -- such as gradient descent -- with…

Machine Learning · Computer Science 2024-10-14 Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar

Memory Efficient Continual Learning with Transformers

In many real-world scenarios, data to train machine learning models becomes available over time. Unfortunately, these models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is…

Computation and Language · Computer Science 2023-01-16 Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cedric Archambeau

A Transformer-based Framework for Multivariate Time Series Representation Learning

In this work we propose for the first time a transformer-based framework for unsupervised representation learning of multivariate time series. Pre-trained models can be potentially used for downstream tasks such as regression and…

Machine Learning · Computer Science 2020-12-10 George Zerveas, Srideepika Jayaraman, Dhaval Patel, Anuradha Bhamidipaty, Carsten Eickhoff

Unsupervised Meta-Learning via In-Context Learning

Unsupervised meta-learning aims to learn feature representations from unsupervised datasets that can transfer to downstream tasks with limited labeled data. In this paper, we propose a novel approach to unsupervised meta-learning that…

Machine Learning · Computer Science 2025-02-11 Anna Vettoruzzo, Lorenzo Braccaioli, Joaquin Vanschoren, Marlena Nowaczyk

Transformers as Multi-task Learners: Decoupling Features in Hidden Markov Models

Transformer-based models have shown remarkable capabilities in sequence learning across a wide range of tasks, often performing well on specific tasks by leveraging input-output examples. Despite their empirical success, a comprehensive…

Machine Learning · Computer Science 2025-06-03 Yifan Hao, Chenlu Ye, Chi Han, Tong Zhang

Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context

Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training ("in-context") examples and an unlabeled test example into an input sequence of vectors of the same dimension, the…

Machine Learning · Computer Science 2024-12-16 Spencer Frei, Gal Vardi

Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models

Transformer models, notably large language models (LLMs), have the remarkable ability to perform in-context learning (ICL) -- to perform new tasks when prompted with unseen input-output examples without any explicit model training. In this…

Machine Learning · Computer Science 2023-11-03 Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni

Deep Trans-layer Unsupervised Networks for Representation Learning

Learning features from massive unlabelled data is a vast and prevalent topic for high-level tasks in many machine learning applications. The recent great improvements on benchmark data sets achieved by increasingly complex unsupervised learning…

Neural and Evolutionary Computing · Computer Science 2015-09-29 Wentao Zhu, Jun Miao, Laiyun Qing, Xilin Chen

Revisiting Supervision for Continual Representation Learning

In the field of continual learning, models are designed to learn tasks one after the other. While most research has centered on supervised continual learning, there is a growing interest in unsupervised continual learning, which makes use…

Machine Learning · Computer Science 2024-07-18 Daniel Marczak, Sebastian Cygert, Tomasz Trzciński, Bartłomiej Twardowski

Transformers learn to implement preconditioned gradient descent for in-context learning

Several recent works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of transformers are expressive enough to simulate iterations of…

Machine Learning · Computer Science 2023-11-13 Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, Suvrit Sra

Transformer Learns Optimal Variable Selection in Group-Sparse Classification

Transformers have demonstrated remarkable success across various applications. However, the success of transformers has not been understood in theory. In this work, we give a case study of how transformers can be trained to learn a classic…

Machine Learning · Statistics 2025-04-14 Chenyang Zhang, Xuran Meng, Yuan Cao

Progressive Stage-wise Learning for Unsupervised Feature Representation Enhancement

Unsupervised learning methods have recently shown their competitiveness against supervised training. Typically, these methods use a single objective to train the entire network. But one distinct advantage of unsupervised over supervised…

Computer Vision and Pattern Recognition · Computer Science 2021-06-14 Zefan Li, Chenxi Liu, Alan Yuille, Bingbing Ni, Wenjun Zhang, Wen Gao

Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models

Transformers have achieved great success across a wide range of applications, yet the theoretical foundations underlying their success remain largely unexplored. To demystify the strong capacities of transformers applied to versatile…

Machine Learning · Computer Science 2026-03-25 Chenyang Zhang, Qingyue Zhao, Quanquan Gu, Yuan Cao

Transformer-Squared: Self-adaptive LLMs

Self-adaptive large language models (LLMs) aim to solve the challenges posed by traditional fine-tuning methods, which are often computationally intensive and static in their ability to handle diverse tasks. We introduce…

Machine Learning · Computer Science 2025-01-27 Qi Sun, Edoardo Cetin, Yujin Tang

MLPs Learn In-Context on Regression and Classification Tasks

In-context learning (ICL), the remarkable ability to solve a task from only input exemplars, is often assumed to be a unique hallmark of Transformer models. By examining commonly employed synthetic ICL tasks, we demonstrate that multi-layer…

Machine Learning · Computer Science 2025-02-26 William L. Tong, Cengiz Pehlevan

How a student becomes a teacher: learning and forgetting through Spectral methods

In theoretical ML, the teacher-student paradigm is often employed as an effective metaphor for real-life tuition. The above scheme proves particularly relevant when the student network is overparameterized as compared to the teacher…

Machine Learning · Computer Science 2023-11-06 Lorenzo Giambagli, Lorenzo Buffoni, Lorenzo Chicchi, Duccio Fanelli

(How) Can Transformers Predict Pseudo-Random Numbers?

Transformers excel at discovering patterns in sequential data, yet their fundamental limitations and learning mechanisms remain crucial topics of investigation. In this paper, we study the ability of Transformers to learn pseudo-random…

Machine Learning · Computer Science 2025-07-10 Tao Tao, Darshil Doshi, Dayal Singh Kalra, Tianyu He, Maissam Barkeshli

How Well Can Transformers Emulate In-context Newton's Method?

Transformer-based models have demonstrated remarkable in-context learning capabilities, prompting extensive research into their underlying mechanisms. Recent studies have suggested that Transformers can implement first-order optimization…

Machine Learning · Computer Science 2024-03-06 Angeliki Giannou, Liu Yang, Tianhao Wang, Dimitris Papailiopoulos, Jason D. Lee