Related papers: Layer Specialization Underlying Compositional Reas…

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional…

Machine Learning · Computer Science 2022-12-13 Yuxuan Li, James L. McClelland

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations

While large language models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) capabilities, understandings of such capabilities are still in an early stage, where existing theory and mechanistic…

Machine Learning · Computer Science 2023-10-17 Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai

Analyzing the Inner Workings of Transformers in Compositional Generalization

The compositional generalization abilities of neural models have been sought after for human-like linguistic competence. The popular method to evaluate such abilities is to assess the models' input-output behavior. However, that does not…

Computation and Language · Computer Science 2025-02-24 Ryoma Kumon, Hitomi Yanaka

Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer

Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the…

Machine Learning · Computer Science 2026-05-07 Alexander Hsu, Zhaiming Shen, Wenjing Liao, Rongjie Lai

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax

In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks: given labeled examples in the input context, the LLM learns to perform the task without weight updates. Do models guided via ICL infer the…

Computation and Language · Computer Science 2024-04-11 Aaron Mueller, Albert Webson, Jackson Petty, Tal Linzen

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

Transformers have demonstrated remarkable in-context learning (ICL) capabilities. The strong ICL performance of transformers is commonly believed to arise from their ability to implicitly execute certain algorithms on the context, thereby…

Machine Learning · Computer Science 2026-05-08 Chenyang Zhang, Yuan Cao

How do Transformers Learn Implicit Reasoning?

Recent work suggests that large language models (LLMs) can perform multi-hop reasoning implicitly -- producing correct answers without explicitly verbalizing intermediate steps -- but the underlying mechanisms remain poorly understood. In…

Machine Learning · Computer Science 2025-11-07 Jiaran Ye, Zijun Yao, Zhidian Huang, Liangming Pan, Jinxin Liu, Yushi Bai, Amy Xin, Weichuan Liu, Xiaoyin Che, Lei Hou, Juanzi Li

Transformer learns the cross-task prior and regularization for in-context learning

Transformers have shown a remarkable ability for in-context learning (ICL), making predictions based on contextual examples. However, while theoretical analyses have explored this prediction capability, the nature of the inferred context…

Machine Learning · Computer Science 2025-05-20 Fei Lu, Yue Yu

Learning Linear Regression with Low-Rank Tasks in-Context

In-context learning (ICL) is a key building block of modern large language models, yet its theoretical mechanisms remain poorly understood. It is particularly mysterious how ICL operates in real-world applications where tasks have a common…

Disordered Systems and Neural Networks · Physics 2026-04-24 Kaito Takanami, Takashi Takahashi, Yoshiyuki Kabashima

Investigating Compositional Reasoning in Time Series Foundation Models

Large pre-trained time series foundation models (TSFMs) have demonstrated promising zero-shot performance across a wide range of domains. However, a question remains: Do TSFMs succeed by memorizing patterns in training data, or do they…

Machine Learning · Computer Science 2025-09-11 Willa Potosnak, Cristian Challu, Mononito Goswami, Kin G. Olivares, Michał Wiliński, Nina Żukowska, Artur Dubrawski

Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures

How do neural language models acquire a language's structure when trained for next-token prediction? We address this question by deriving theoretical scaling laws for neural network performance on synthetic datasets generated by the Random…

Machine Learning · Computer Science 2025-05-13 Francesco Cagnetta, Alessandro Favero, Antonio Sclocchi, Matthieu Wyart

In-Context Learning with Representations: Contextual Generalization of Trained Transformers

In-context learning (ICL) refers to a remarkable capability of pretrained large language models, which can learn a new task given a few examples during inference. However, theoretical understanding of ICL is largely under-explored,…

Machine Learning · Computer Science 2024-09-27 Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi

Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers

Transformers have demonstrated impressive capabilities across various tasks, yet their performance on compositional problems remains a subject of debate. In this study, we investigate the internal mechanisms underlying Transformers'…

Computation and Language · Computer Science 2025-01-16 Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Zhi-Qin John Xu

Shattered Compositionality: Counterintuitive Learning Dynamics of Transformers for Arithmetic

Large language models (LLMs) often exhibit unexpected errors or unintended behavior, even at scale. While recent work reveals the discrepancy between LLMs and humans in skill compositions, the learning dynamics of skill compositions and the…

Machine Learning · Computer Science 2026-02-02 Xingyu Zhao, Darsh Sharma, Rheeya Uppaal, Yiqiao Zhong

A Theory of Emergent In-Context Learning as Implicit Structure Induction

Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations. Despite progress, theoretical understanding of this phenomenon remains limited. We argue that in-context learning relies on…

Computation and Language · Computer Science 2023-03-15 Michael Hahn, Navin Goyal

Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers

Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures without explicitly encoding any structural bias. In this work, we investigate…

Computation and Language · Computer Science 2025-03-18 Kabir Ahuja, Vidhisha Balachandran, Madhur Panwar, Tianxing He, Noah A. Smith, Navin Goyal, Yulia Tsvetkov

On the Robustness of Transformers against Context Hijacking for Linear Classification

Transformer-based Large Language Models (LLMs) have demonstrated powerful in-context learning capabilities. However, their predictions can be disrupted by factually correct context, a phenomenon known as context hijacking, revealing a…

Computation and Language · Computer Science 2025-02-24 Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou

Attention as a Hypernetwork

Transformers can under some circumstances generalize to novel problem instances whose constituent parts might have been encountered during training, but whose compositions have not. What mechanisms underlie this ability for compositional…

Machine Learning · Computer Science 2025-02-18 Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu

Theoretical Understanding of In-Context Learning in Shallow Transformers with Unstructured Data

Large language models (LLMs) are powerful models that can learn concepts at the inference stage via in-context learning (ICL). While theoretical studies, e.g., \cite{zhang2023trained}, attempt to explain the mechanism of ICL, they assume…

Machine Learning · Computer Science 2024-06-19 Yue Xing, Xiaofeng Lin, Chenheng Xu, Namjoon Suh, Qifan Song, Guang Cheng

Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens

Pre-trained large language models based on Transformers have demonstrated remarkable in-context learning (ICL) abilities. With just a few demonstration examples, the models can implement new tasks without any parameter updates. However, it…

Machine Learning · Computer Science 2024-11-04 Ruifeng Ren, Yong Liu