Related papers: Learning Spectral Methods by Transformers

200 papers

In order to understand the in-context learning phenomenon, recent works have adopted a stylized experimental framework and demonstrated that Transformers can learn gradient-based learning algorithms for various classes of real-valued…

Machine Learning · Computer Science 2023-10-05 Satwik Bhattamishra, Arkil Patel, Phil Blunsom, Varun Kanade

The transformer architecture has demonstrated remarkable capabilities in modern artificial intelligence, among which the capability of implicitly learning an internal model during inference time is widely believed to play a key role in the…

Machine Learning · Computer Science 2026-02-10 Zhiheng Chen, Ruofan Wu, Guanhua Fang

The remarkable capability of Transformers to do reasoning and few-shot learning, without any fine-tuning, is widely conjectured to stem from their ability to implicitly simulate multi-step algorithms -- such as gradient descent -- with…

Machine Learning · Computer Science 2024-10-14 Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar

In many real-world scenarios, data to train machine learning models becomes available over time. Unfortunately, these models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is…

Computation and Language · Computer Science 2023-01-16 Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cedric Archambeau

In this work we propose for the first time a transformer-based framework for unsupervised representation learning of multivariate time series. Pre-trained models can be potentially used for downstream tasks such as regression and…

Machine Learning · Computer Science 2020-12-10 George Zerveas, Srideepika Jayaraman, Dhaval Patel, Anuradha Bhamidipaty, Carsten Eickhoff

Unsupervised meta-learning aims to learn feature representations from unsupervised datasets that can transfer to downstream tasks with limited labeled data. In this paper, we propose a novel approach to unsupervised meta-learning that…

Machine Learning · Computer Science 2025-02-11 Anna Vettoruzzo, Lorenzo Braccaioli, Joaquin Vanschoren, Marlena Nowaczyk

Transformer-based models have shown remarkable capabilities in sequence learning across a wide range of tasks, often performing well on specific tasks by leveraging input-output examples. Despite their empirical success, a comprehensive…

Machine Learning · Computer Science 2025-06-03 Yifan Hao, Chenlu Ye, Chi Han, Tong Zhang

Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training ("in-context") examples and an unlabeled test example into an input sequence of vectors of the same dimension, the…

Machine Learning · Computer Science 2024-12-16 Spencer Frei, Gal Vardi

Transformer models, notably large language models (LLMs), have the remarkable ability to perform in-context learning (ICL) -- to perform new tasks when prompted with unseen input-output examples without any explicit model training. In this…

Machine Learning · Computer Science 2023-11-03 Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni

Learning features from massive unlabelled data is a highly prevalent topic for high-level tasks in many machine learning applications. The recent great improvements on benchmark data sets achieved by increasingly complex unsupervised learning…

Neural and Evolutionary Computing · Computer Science 2015-09-29 Wentao Zhu, Jun Miao, Laiyun Qing, Xilin Chen

In the field of continual learning, models are designed to learn tasks one after the other. While most research has centered on supervised continual learning, there is a growing interest in unsupervised continual learning, which makes use…

Machine Learning · Computer Science 2024-07-18 Daniel Marczak, Sebastian Cygert, Tomasz Trzciński, Bartłomiej Twardowski

Several recent works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of transformers are expressive enough to simulate iterations of…

Machine Learning · Computer Science 2023-11-13 Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, Suvrit Sra

Transformers have demonstrated remarkable success across various applications. However, the success of transformers has not been understood in theory. In this work, we give a case study of how transformers can be trained to learn a classic…

Machine Learning · Statistics 2025-04-14 Chenyang Zhang, Xuran Meng, Yuan Cao

Unsupervised learning methods have recently shown their competitiveness against supervised training. Typically, these methods use a single objective to train the entire network. But one distinct advantage of unsupervised over supervised…

Computer Vision and Pattern Recognition · Computer Science 2021-06-14 Zefan Li, Chenxi Liu, Alan Yuille, Bingbing Ni, Wenjun Zhang, Wen Gao

Transformers have achieved great success across a wide range of applications, yet the theoretical foundations underlying their success remain largely unexplored. To demystify the strong capacities of transformers applied to versatile…

Machine Learning · Computer Science 2026-03-25 Chenyang Zhang, Qingyue Zhao, Quanquan Gu, Yuan Cao

Self-adaptive large language models (LLMs) aim to solve the challenges posed by traditional fine-tuning methods, which are often computationally intensive and static in their ability to handle diverse tasks. We introduce…

Machine Learning · Computer Science 2025-01-27 Qi Sun, Edoardo Cetin, Yujin Tang

In-context learning (ICL), the remarkable ability to solve a task from only input exemplars, is often assumed to be a unique hallmark of Transformer models. By examining commonly employed synthetic ICL tasks, we demonstrate that multi-layer…

Machine Learning · Computer Science 2025-02-26 William L. Tong, Cengiz Pehlevan

In theoretical ML, the teacher-student paradigm is often employed as an effective metaphor for real-life tuition. The above scheme proves particularly relevant when the student network is overparameterized as compared to the teacher…

Machine Learning · Computer Science 2023-11-06 Lorenzo Giambagli, Lorenzo Buffoni, Lorenzo Chicchi, Duccio Fanelli

Transformers excel at discovering patterns in sequential data, yet their fundamental limitations and learning mechanisms remain crucial topics of investigation. In this paper, we study the ability of Transformers to learn pseudo-random…

Machine Learning · Computer Science 2025-07-10 Tao Tao, Darshil Doshi, Dayal Singh Kalra, Tianyu He, Maissam Barkeshli

Transformer-based models have demonstrated remarkable in-context learning capabilities, prompting extensive research into their underlying mechanisms. Recent studies have suggested that Transformers can implement first-order optimization…

Machine Learning · Computer Science 2024-03-06 Angeliki Giannou, Liu Yang, Tianhao Wang, Dimitris Papailiopoulos, Jason D. Lee