Related papers: Adaptivity and Modularity for Efficient Generaliza…

Principled Understanding of Generalization for Generative Transformer Models in Arithmetic Reasoning Tasks

Transformer-based models excel in various tasks, but their generalization capabilities, especially in arithmetic reasoning, remain incompletely understood. Arithmetic tasks provide a controlled framework to explore these capabilities, yet…

Machine Learning · Computer Science · 2025-08-07 · Xingcheng Xu, Zibo Zhao, Haipeng Zhang, Yanqing Yang

Hyper-Decision Transformer for Efficient Online Policy Adaptation

Decision Transformers (DT) have demonstrated strong performances in offline reinforcement learning settings, but quickly adapting to unseen novel tasks remains challenging. To address this challenge, we propose a new framework, called…

Machine Learning · Computer Science · 2023-04-18 · Mengdi Xu, Yuchen Lu, Yikang Shen, Shun Zhang, Ding Zhao, Chuang Gan

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional…

Machine Learning · Computer Science · 2022-12-13 · Yuxuan Li, James L. McClelland

Learning in Compact Spaces with Approximately Normalized Transformer

The successful training of deep neural networks requires addressing challenges such as overfitting, numerical instabilities leading to divergence, and increasing variance in the residual stream. A common solution is to apply regularization…

Machine Learning · Computer Science · 2025-11-20 · Jörg K. H. Franke, Urs Spiegelhalter, Marianna Nezhurina, Jenia Jitsev, Frank Hutter, Michael Hefenbrock

What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks

We study the capabilities of the transformer architecture with varying depth. Specifically, we designed a novel set of sequence learning tasks to systematically evaluate and comprehend how the depth of the transformer affects its ability to…

Machine Learning · Computer Science · 2024-04-03 · Xingwu Chen, Difan Zou

Do Transformers Use their Depth Adaptively? Evidence from a Relational Reasoning Task

We investigate whether transformers use their depth adaptively across tasks of increasing difficulty. Using a controlled multi-hop relational reasoning task based on family stories, where difficulty is determined by the number of…

Machine Learning · Computer Science · 2026-04-15 · Alicia Curth, Rachel Lawrence, Sushrut Karmalkar, Niranjani Prasad

Faster Depth-Adaptive Transformers

Depth-adaptive neural networks can dynamically adjust depths according to the hardness of input words, and thus improve efficiency. The main challenge is how to measure such hardness and decide the required depths (i.e., layers) to conduct.…

Computation and Language · Computer Science · 2020-12-17 · Yijin Liu, Fandong Meng, Jie Zhou, Yufeng Chen, Jinan Xu

Breaking Neural Network Scaling Laws with Modularity

Modular neural networks outperform nonmodular neural networks on tasks ranging from visual question answering to robotics. These performance improvements are thought to be due to modular networks' superior ability to model the compositional…

Machine Learning · Computer Science · 2025-03-12 · Akhilan Boopathy, Sunshine Jiang, William Yue, Jaedong Hwang, Abhiram Iyer, Ila Fiete

Modular Deep Learning

Transfer learning has recently become the dominant paradigm of machine learning. Pre-trained models fine-tuned for downstream tasks achieve better performance with fewer labelled examples. Nonetheless, it remains unclear how to develop…

Machine Learning · Computer Science · 2024-01-30 · Jonas Pfeiffer, Sebastian Ruder, Ivan Vulić, Edoardo Maria Ponti

Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation

Mathematical reasoning is one of the most impressive achievements of human intellect but remains a formidable challenge for artificial intelligence systems. In this work we explore whether modern deep learning architectures can learn to…

Machine Learning · Computer Science · 2022-07-07 · Samuel Cognolato, Alberto Testolin

General Transform: A Unified Framework for Adaptive Transform to Enhance Representations

Discrete transforms, such as the discrete Fourier transform, are widely used in machine learning to improve model performance by extracting meaningful features. However, with numerous transforms available, selecting an appropriate one often…

Machine Learning · Computer Science · 2025-05-09 · Gekko Budiutama, Shunsuke Daimon, Hirofumi Nishi, Yu-ichiro Matsushita

Vision Transformer Adapters for Generalizable Multitask Learning

We introduce the first multitasking vision transformer adapters that learn generalizable task affinities which can be applied to novel tasks and domains. Integrated into an off-the-shelf vision transformer backbone, our adapters can…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-25 · Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann

Improving Length-Generalization in Transformers via Task Hinting

It has been observed in recent years that transformers have problems with length generalization for certain types of reasoning and arithmetic tasks. In particular, the performance of a transformer model trained on tasks (say addition) up to…

Machine Learning · Computer Science · 2023-10-03 · Pranjal Awasthi, Anupam Gupta

UniT: Multimodal Multitask Learning with a Unified Transformer

We propose UniT, a Unified Transformer model to simultaneously learn the most prominent tasks across different domains, ranging from object detection to natural language understanding and multimodal reasoning. Based on the transformer…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-19 · Ronghang Hu, Amanpreet Singh

TOAST: Transformer Optimization using Adaptive and Simple Transformations

Foundation models achieve state-of-the-art performance across different tasks, but their size and computational demands raise concerns about accessibility and sustainability. Existing efficiency methods often require additional retraining…

Machine Learning · Computer Science · 2026-05-19 · Irene Cannistraci, Simone Antonelli, Emanuele Palumbo, Thomas M. Sutter, Emanuele Rodolà, Bastian Rieck, Julia E. Vogt

Memory Efficient Transformer Adapter for Dense Predictions

While current Vision Transformer (ViT) adapter methods have shown promising accuracy, their inference speed is implicitly hindered by inefficient memory access operations, e.g., standard normalization and frequent reshaping. In this work,…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-05 · Dong Zhang, Rui Yan, Pingcheng Dong, Kwang-Ting Cheng

Dissecting Multiplication in Transformers: Insights into LLMs

Transformer-based large language models have achieved remarkable performance across various natural language processing tasks. However, they often struggle with seemingly easy tasks like arithmetic despite their vast capabilities. This…

Computation and Language · Computer Science · 2024-07-23 · Luyu Qiu, Jianing Li, Chi Su, Chen Jason Zhang, Lei Chen

Meta-Learning Transformers to Improve In-Context Generalization

In-context learning enables transformer models to generalize to new tasks based solely on input prompts, without any need for weight updates. However, existing training paradigms typically rely on large, unstructured datasets that are…

Machine Learning · Computer Science · 2025-07-08 · Lorenzo Braccaioli, Anna Vettoruzzo, Prabhant Singh, Joaquin Vanschoren, Mohamed-Rafik Bouguelia, Nicola Conci

On permutation-invariant neural networks

Conventional machine learning algorithms have traditionally been designed under the assumption that input data follows a vector-based format, with an emphasis on vector-centric paradigms. However, as the demand for tasks involving set-based…

Machine Learning · Computer Science · 2024-04-01 · Masanari Kimura, Ryotaro Shimizu, Yuki Hirakawa, Ryosuke Goto, Yuki Saito

Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation

Recently, transformers have shown strong ability as visual feature extractors, surpassing traditional convolution-based models in various scenarios. However, the success of vision transformers is largely due to their capacity to accommodate…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-02 · Tianxiang Hao, Hui Chen, Yuchen Guo, Guiguang Ding