Related papers: Attention as a Hypernetwork

Analyzing the Inner Workings of Transformers in Compositional Generalization

The compositional generalization abilities of neural models have been sought after for human-like linguistic competence. The popular method to evaluate such abilities is to assess the models' input-output behavior. However, that does not…

Computation and Language · Computer Science · 2025-02-24 · Ryoma Kumon, Hitomi Yanaka

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional…

Machine Learning · Computer Science · 2022-12-13 · Yuxuan Li, James L. McClelland

In-Context Compositional Learning via Sparse Coding Transformer

Transformer architectures have achieved remarkable success across language, vision, and multimodal tasks, and there is growing demand for them to address in-context compositional learning tasks. In these tasks, models solve the target…

Machine Learning · Computer Science · 2025-11-26 · Wei Chen, Jingxi Yu, Zichen Miao, Qiang Qiu

Improving Compositional Generalization Using Iterated Learning and Simplicial Embeddings

Compositional generalization, the ability of an agent to generalize to unseen combinations of latent factors, is easy for humans but hard for deep neural networks. A line of research in cognitive science has hypothesized a process,…

Machine Learning · Computer Science · 2023-10-31 · Yi Ren, Samuel Lavoie, Mikhail Galkin, Danica J. Sutherland, Aaron Courville

Memorize or generalize? Searching for a compositional RNN in a haystack

Neural networks are very powerful learning systems, but they do not readily generalize from one task to the other. This is partly due to the fact that they do not learn in a compositional way, that is, by discovering skills that are shared…

Artificial Intelligence · Computer Science · 2018-07-27 · Adam Liška, Germán Kruszewski, Marco Baroni

Learning to Generalize Compositionally by Transferring Across Semantic Parsing Tasks

Neural network models often generalize poorly to mismatched domains or distributions. In NLP, this issue arises in particular when models are expected to generalize compositionally, that is, to novel combinations of familiar words and…

Computation and Language · Computer Science · 2021-11-10 · Wang Zhu, Peter Shaw, Tal Linzen, Fei Sha

Transcoding compositionally: using attention to find more generalizable solutions

While sequence-to-sequence models have shown remarkable generalization power across several natural language tasks, their construct of solutions are argued to be less compositional than human-like generalization. In this paper, we present…

Computation and Language · Computer Science · 2019-06-07 · Kris Korrel, Dieuwke Hupkes, Verna Dankers, Elia Bruni

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing basic arithmetic. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities,…

Machine Learning · Computer Science · 2024-02-07 · Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka

Automatically Composing Representation Transformations as a Means for Generalization

A generally intelligent learner should generalize to more complex tasks than it has previously encountered, but the two common paradigms in machine learning -- either training a separate learner per task or training a single learner for all…

Machine Learning · Computer Science · 2019-05-09 · Michael B. Chang, Abhishek Gupta, Sergey Levine, Thomas L. Griffiths

Compositional Attention: Disentangling Search and Retrieval

Multi-head, key-value attention is the backbone of the widely successful Transformer model and its variants. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental…

Machine Learning · Computer Science · 2022-02-15 · Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie

On the Optimization and Generalization of Multi-head Attention

The training and generalization dynamics of the Transformer's core mechanism, namely the Attention mechanism, remain under-explored. Besides, existing analyses primarily focus on single-head attention. Inspired by the demonstrated benefits…

Machine Learning · Computer Science · 2024-10-15 · Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis

Compositional Generalization and Decomposition in Neural Program Synthesis

When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to measure whether neural program synthesis methods have similar capabilities, what…

Machine Learning · Computer Science · 2023-10-31 · Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton

Scaling can lead to compositional generalization

Can neural networks systematically capture discrete, compositional task structure despite their continuous, distributed nature? The impressive capabilities of large-scale neural networks suggest that the answer to this question is yes.…

Machine Learning · Computer Science · 2025-10-27 · Florian Redhardt, Yassir Akram, Simon Schug

Learning compositionally through attentive guidance

While neural network models have been successfully applied to domains that require substantial generalisation skills, recent studies have implied that they struggle when solving the task they are trained on requires inferring its underlying…

Computation and Language · Computer Science · 2019-07-08 · Dieuwke Hupkes, Anand Singh, Kris Korrel, German Kruszewski, Elia Bruni

Compositional-ARC: Assessing Systematic Generalization in Abstract Spatial Reasoning

Systematic generalization refers to the capacity to understand and generate novel combinations from known components. Despite recent progress by large language models (LLMs) across various domains, these models often fail to extend their…

Artificial Intelligence · Computer Science · 2026-02-27 · Philipp Mondorf, Shijia Zhou, Monica Riedler, Barbara Plank

When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks

Humans can reason compositionally whilst grounding language utterances to the real world. Recent benchmarks like ReaSCAN use navigation tasks grounded in a grid world to assess whether neural models exhibit similar capabilities. In this…

Computation and Language · Computer Science · 2022-11-01 · Ankur Sikarwar, Arkil Patel, Navin Goyal

Compositional Generalization by Learning Analytical Expressions

Compositional generalization is a basic and essential intellective capability of human beings, which allows us to recombine known parts readily. However, existing neural network based models have been proven to be extremely deficient in…

Artificial Intelligence · Computer Science · 2020-10-27 · Qian Liu, Shengnan An, Jian-Guang Lou, Bei Chen, Zeqi Lin, Yan Gao, Bin Zhou, Nanning Zheng, Dongmei Zhang

Characterizing Intrinsic Compositionality in Transformers with Tree Projections

When trained on language data, do transformers learn some arbitrary computation that utilizes the full capacity of the architecture or do they learn a simpler, tree-like computation, hypothesized to underlie compositional meaning systems…

Computation and Language · Computer Science · 2022-11-07 · Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning

Development of Compositionality and Generalization through Interactive Learning of Language and Action of Robots

Humans excel at applying learned behavior to unlearned situations. A crucial component of this generalization behavior is our ability to compose/decompose a whole into reusable parts, an attribute known as compositionality. One of the…

Artificial Intelligence · Computer Science · 2024-07-24 · Prasanna Vijayaraghavan, Jeffrey Frederic Queisser, Sergio Verduzco Flores, Jun Tani

Modeling Latent Attention Within Neural Networks

Deep neural networks are able to solve tasks across a variety of domains and modalities of data. Despite many empirical successes, we lack the ability to clearly understand and interpret the learned internal mechanisms that contribute to…

Artificial Intelligence · Computer Science · 2018-01-03 · Christopher Grimm, Dilip Arumugam, Siddharth Karamcheti, David Abel, Lawson L. S. Wong, Michael L. Littman