Related papers: Layer Specialization Underlying Compositional Reas…

200 papers

Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional…

Machine Learning · Computer Science · 2022-12-13 · Yuxuan Li, James L. McClelland

While large language models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) capabilities, understandings of such capabilities are still in an early stage, where existing theory and mechanistic…

Machine Learning · Computer Science · 2023-10-17 · Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai

The compositional generalization abilities of neural models have been sought after for human-like linguistic competence. The popular method to evaluate such abilities is to assess the models' input-output behavior. However, that does not…

Computation and Language · Computer Science · 2025-02-24 · Ryoma Kumon, Hitomi Yanaka

Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the…

Machine Learning · Computer Science · 2026-05-07 · Alexander Hsu, Zhaiming Shen, Wenjing Liao, Rongjie Lai

In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks: given labeled examples in the input context, the LLM learns to perform the task without weight updates. Do models guided via ICL infer the…

Computation and Language · Computer Science · 2024-04-11 · Aaron Mueller, Albert Webson, Jackson Petty, Tal Linzen

Transformers have demonstrated remarkable in-context learning (ICL) capabilities. The strong ICL performance of transformers is commonly believed to arise from their ability to implicitly execute certain algorithms on the context, thereby…

Machine Learning · Computer Science · 2026-05-08 · Chenyang Zhang, Yuan Cao

Recent work suggests that large language models (LLMs) can perform multi-hop reasoning implicitly -- producing correct answers without explicitly verbalizing intermediate steps -- but the underlying mechanisms remain poorly understood. In…

Machine Learning · Computer Science · 2025-11-07 · Jiaran Ye, Zijun Yao, Zhidian Huang, Liangming Pan, Jinxin Liu, Yushi Bai, Amy Xin, Weichuan Liu, Xiaoyin Che, Lei Hou, Juanzi Li

Transformers have shown a remarkable ability for in-context learning (ICL), making predictions based on contextual examples. However, while theoretical analyses have explored this prediction capability, the nature of the inferred context…

Machine Learning · Computer Science · 2025-05-20 · Fei Lu, Yue Yu

In-context learning (ICL) is a key building block of modern large language models, yet its theoretical mechanisms remain poorly understood. It is particularly mysterious how ICL operates in real-world applications where tasks have a common…

Disordered Systems and Neural Networks · Physics · 2026-04-24 · Kaito Takanami, Takashi Takahashi, Yoshiyuki Kabashima

Large pre-trained time series foundation models (TSFMs) have demonstrated promising zero-shot performance across a wide range of domains. However, a question remains: Do TSFMs succeed by memorizing patterns in training data, or do they…

How do neural language models acquire a language's structure when trained for next-token prediction? We address this question by deriving theoretical scaling laws for neural network performance on synthetic datasets generated by the Random…

Machine Learning · Computer Science · 2025-05-13 · Francesco Cagnetta, Alessandro Favero, Antonio Sclocchi, Matthieu Wyart

In-context learning (ICL) refers to a remarkable capability of pretrained large language models, which can learn a new task given a few examples during inference. However, theoretical understanding of ICL is largely under-explored,…

Machine Learning · Computer Science · 2024-09-27 · Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi

Transformers have demonstrated impressive capabilities across various tasks, yet their performance on compositional problems remains a subject of debate. In this study, we investigate the internal mechanisms underlying Transformers'…

Computation and Language · Computer Science · 2025-01-16 · Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Zhi-Qin John Xu

Large language models (LLMs) often exhibit unexpected errors or unintended behavior, even at scale. While recent work reveals the discrepancy between LLMs and humans in skill compositions, the learning dynamics of skill compositions and the…

Machine Learning · Computer Science · 2026-02-02 · Xingyu Zhao, Darsh Sharma, Rheeya Uppaal, Yiqiao Zhong

Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations. Despite progress, theoretical understanding of this phenomenon remains limited. We argue that in-context learning relies on…

Computation and Language · Computer Science · 2023-03-15 · Michael Hahn, Navin Goyal

Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures without explicitly encoding any structural bias. In this work, we investigate…

Computation and Language · Computer Science · 2025-03-18 · Kabir Ahuja, Vidhisha Balachandran, Madhur Panwar, Tianxing He, Noah A. Smith, Navin Goyal, Yulia Tsvetkov

Transformer-based Large Language Models (LLMs) have demonstrated powerful in-context learning capabilities. However, their predictions can be disrupted by factually correct context, a phenomenon known as context hijacking, revealing a…

Computation and Language · Computer Science · 2025-02-24 · Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou

Transformers can under some circumstances generalize to novel problem instances whose constituent parts might have been encountered during training, but whose compositions have not. What mechanisms underlie this ability for compositional…

Machine Learning · Computer Science · 2025-02-18 · Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu

Large language models (LLMs) are powerful models that can learn concepts at the inference stage via in-context learning (ICL). While theoretical studies, e.g., Zhang et al. (2023), attempt to explain the mechanism of ICL, they assume…

Machine Learning · Computer Science · 2024-06-19 · Yue Xing, Xiaofeng Lin, Chenheng Xu, Namjoon Suh, Qifan Song, Guang Cheng

Pre-trained large language models based on Transformers have demonstrated remarkable in-context learning (ICL) abilities. With just a few demonstration examples, the models can implement new tasks without any parameter updates. However, it…

Machine Learning · Computer Science · 2024-11-04 · Ruifeng Ren, Yong Liu