Related papers: How does Multi-Task Training Affect Transformer In…

Re-examining learning linear functions in context

In-context learning (ICL) has emerged as a powerful paradigm for easily adapting Large Language Models (LLMs) to various tasks. However, our understanding of how ICL works remains limited. We explore a simple model of ICL in a controlled…

Machine Learning · Computer Science 2025-09-03 Omar Naim, Guilhem Fouilhé, Nicholas Asher

Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study

Large language models (LLMs) like GPT-4 and LLaMA-3 utilize the powerful in-context learning (ICL) capability of Transformer architecture to learn on the fly from limited examples. While ICL underpins many LLM applications, its full…

Machine Learning · Computer Science 2025-03-21 Xingxuan Zhang, Haoran Wang, Jiansheng Li, Yuan Xue, Shikai Guan, Renzhe Xu, Hao Zou, Han Yu, Peng Cui

In-Context Learning Creates Task Vectors

In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine…

Computation and Language · Computer Science 2023-10-25 Roee Hendel, Mor Geva, Amir Globerson

What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning

Large language models (LLMs) exploit in-context learning (ICL) to solve tasks with only a few demonstrations, but its mechanisms are not yet well-understood. Some works suggest that LLMs only recall already learned concepts from…

Computation and Language · Computer Science 2023-05-18 Jane Pan, Tianyu Gao, Howard Chen, Danqi Chen

Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning

In-context Learning (ICL) has emerged as a powerful capability alongside the development of scaled-up large language models (LLMs). By instructing LLMs using few-shot demonstrative examples, ICL enables them to perform a wide range of tasks…

Computation and Language · Computer Science 2024-07-24 Quanyu Long, Yin Wu, Wenya Wang, Sinno Jialin Pan

What Do Language Models Learn in Context? The Structured Task Hypothesis

Large language models (LLMs) exhibit an intriguing ability to learn a novel task from in-context examples presented in a demonstration, termed in-context learning (ICL). Understandably, a swath of research has been dedicated to uncovering…

Computation and Language · Computer Science 2024-08-06 Jiaoda Li, Yifan Hou, Mrinmaya Sachan, Ryan Cotterell

Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models

Transformer models, notably large language models (LLMs), have the remarkable ability to perform in-context learning (ICL) -- to perform new tasks when prompted with unseen input-output examples without any explicit model training. In this…

Machine Learning · Computer Science 2023-11-03 Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni

What do vision-language models see in the context? Investigating multimodal in-context learning

In-context learning (ICL) enables Large Language Models (LLMs) to learn tasks from demonstration examples without parameter updates. Although it has been extensively studied in LLMs, its effectiveness in Vision-Language Models (VLMs)…

Machine Learning · Computer Science 2025-10-29 Gabriel O. dos Santos, Esther Colombini, Sandra Avila

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax

In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks: given labeled examples in the input context, the LLM learns to perform the task without weight updates. Do models guided via ICL infer the…

Computation and Language · Computer Science 2024-04-11 Aaron Mueller, Albert Webson, Jackson Petty, Tal Linzen

From Unstructured Data to In-Context Learning: Exploring What Tasks Can Be Learned and When

Large language models (LLMs) like transformers demonstrate impressive in-context learning (ICL) capabilities, allowing them to make predictions for new tasks based on prompt exemplars without parameter updates. While existing ICL theories…

Machine Learning · Computer Science 2024-11-12 Kevin Christian Wibisono, Yixin Wang

Why Multimodal In-Context Learning Lags Behind? Unveiling the Inner Mechanisms and Bottlenecks

In-context learning (ICL) enables models to adapt to new tasks via inference-time demonstrations. Despite its success in large language models, the extension of ICL to multimodal settings remains poorly understood in terms of its internal…

Computer Vision and Pattern Recognition · Computer Science 2026-04-16 Yu Wang, Sharon Li

How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?

Transformer-based large language models have displayed impressive in-context learning capabilities, where a pre-trained model can handle new tasks without fine-tuning by simply augmenting the query with some input-output examples from that…

Machine Learning · Computer Science 2024-06-18 Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen

Task Diversity Shortens the ICL Plateau

In-context learning (ICL) describes a language model's ability to generate outputs based on a set of input demonstrations and a subsequent query. To understand this remarkable capability, researchers have studied simplified, stylized…

Machine Learning · Computer Science 2025-08-13 Jaeyeon Kim, Sehyun Kwon, Joo Young Choi, Jongho Park, Jaewoong Cho, Jason D. Lee, Ernest K. Ryu

In-Context Learning with Representations: Contextual Generalization of Trained Transformers

In-context learning (ICL) refers to a remarkable capability of pretrained large language models, which can learn a new task given a few examples during inference. However, theoretical understanding of ICL is largely under-explored,…

Machine Learning · Computer Science 2024-09-27 Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi

Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition

Large Language Models (LLMs) have demonstrated remarkable in-context learning (ICL) capabilities. In this study, we explore a surprising phenomenon related to ICL: LLMs can perform multiple, computationally distinct ICL tasks…

Machine Learning · Computer Science 2024-10-10 Zheyang Xiong, Ziyang Cai, John Cooper, Albert Ge, Vasilis Papageorgiou, Zack Sifakis, Angeliki Giannou, Ziqian Lin, Liu Yang, Saurabh Agarwal, Grigorios G Chrysos, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos

What Makes Multimodal In-Context Learning Work?

Large Language Models have demonstrated remarkable performance across various tasks, exhibiting the capacity to swiftly acquire new skills, such as through In-Context Learning (ICL) with minimal demonstration examples. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2024-04-26 Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski

Unlocking In-Context Learning for Natural Datasets Beyond Language Modelling

Large Language Models (LLMs) exhibit In-Context Learning (ICL), which enables the model to perform new tasks conditioning only on the examples provided in the context without updating the model's weights. While ICL offers fast adaptation…

Computation and Language · Computer Science 2025-10-07 Jelena Bratulić, Sudhanshu Mittal, David T. Hoffmann, Samuel Böhm, Robin Tibor Schirrmeister, Tonio Ball, Christian Rupprecht, Thomas Brox

How Does the Pretraining Distribution Shape In-Context Learning? Task Selection, Generalization, and Robustness

The emergence of in-context learning (ICL) in large language models (LLMs) remains poorly understood despite its consistent effectiveness, enabling models to adapt to new tasks from only a handful of examples. To clarify and improve these…

Machine Learning · Computer Science 2025-10-02 Waïss Azizian, Ali Hasan

CrossICL: Cross-Task In-Context Learning via Unsupervised Demonstration Transfer

In-Context Learning (ICL) enhances the performance of large language models (LLMs) with demonstrations. However, obtaining these demonstrations primarily relies on manual effort. In most real-world scenarios, users are often unwilling or…

Computation and Language · Computer Science 2025-06-02 Jinglong Gao, Xiao Ding, Lingxiao Zou, Bing Qin, Ting Liu

On Many-Shot In-Context Learning for Long-Context Evaluation

Many-shot in-context learning (ICL) has emerged as a unique setup to both utilize and test the ability of large language models to handle long context. This paper delves into long-context language model (LCLM) evaluation through many-shot…

Computation and Language · Computer Science 2025-06-13 Kaijian Zou, Muhammad Khalifa, Lu Wang