Related papers: Adaptivity and Modularity for Efficient Generaliza…

200 papers

Transformer-based models excel in various tasks but their generalization capabilities, especially in arithmetic reasoning, remain incompletely understood. Arithmetic tasks provide a controlled framework to explore these capabilities, yet…

Machine Learning · Computer Science 2025-08-07 Xingcheng Xu, Zibo Zhao, Haipeng Zhang, Yanqing Yang

Decision Transformers (DT) have demonstrated strong performances in offline reinforcement learning settings, but quickly adapting to unseen novel tasks remains challenging. To address this challenge, we propose a new framework, called…

Machine Learning · Computer Science 2023-04-18 Mengdi Xu, Yuchen Lu, Yikang Shen, Shun Zhang, Ding Zhao, Chuang Gan

Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional…

Machine Learning · Computer Science 2022-12-13 Yuxuan Li, James L. McClelland

The successful training of deep neural networks requires addressing challenges such as overfitting, numerical instabilities leading to divergence, and increasing variance in the residual stream. A common solution is to apply regularization…

Machine Learning · Computer Science 2025-11-20 Jörg K. H. Franke, Urs Spiegelhalter, Marianna Nezhurina, Jenia Jitsev, Frank Hutter, Michael Hefenbrock

We study the capabilities of the transformer architecture with varying depth. Specifically, we designed a novel set of sequence learning tasks to systematically evaluate and comprehend how the depth of transformer affects its ability to…

Machine Learning · Computer Science 2024-04-03 Xingwu Chen, Difan Zou

We investigate whether transformers use their depth adaptively across tasks of increasing difficulty. Using a controlled multi-hop relational reasoning task based on family stories, where difficulty is determined by the number of…

Machine Learning · Computer Science 2026-04-15 Alicia Curth, Rachel Lawrence, Sushrut Karmalkar, Niranjani Prasad

Depth-adaptive neural networks can dynamically adjust depths according to the hardness of input words, and thus improve efficiency. The main challenge is how to measure such hardness and decide the required depths (i.e., layers) to conduct.…

Computation and Language · Computer Science 2020-12-17 Yijin Liu, Fandong Meng, Jie Zhou, Yufeng Chen, Jinan Xu

Modular neural networks outperform nonmodular neural networks on tasks ranging from visual question answering to robotics. These performance improvements are thought to be due to modular networks' superior ability to model the compositional…

Machine Learning · Computer Science 2025-03-12 Akhilan Boopathy, Sunshine Jiang, William Yue, Jaedong Hwang, Abhiram Iyer, Ila Fiete

Transfer learning has recently become the dominant paradigm of machine learning. Pre-trained models fine-tuned for downstream tasks achieve better performance with fewer labelled examples. Nonetheless, it remains unclear how to develop…

Machine Learning · Computer Science 2024-01-30 Jonas Pfeiffer, Sebastian Ruder, Ivan Vulić, Edoardo Maria Ponti

Mathematical reasoning is one of the most impressive achievements of human intellect but remains a formidable challenge for artificial intelligence systems. In this work we explore whether modern deep learning architectures can learn to…

Machine Learning · Computer Science 2022-07-07 Samuel Cognolato, Alberto Testolin

Discrete transforms, such as the discrete Fourier transform, are widely used in machine learning to improve model performance by extracting meaningful features. However, with numerous transforms available, selecting an appropriate one often…

Machine Learning · Computer Science 2025-05-09 Gekko Budiutama, Shunsuke Daimon, Hirofumi Nishi, Yu-ichiro Matsushita

We introduce the first multitasking vision transformer adapters that learn generalizable task affinities which can be applied to novel tasks and domains. Integrated into an off-the-shelf vision transformer backbone, our adapters can…

Computer Vision and Pattern Recognition · Computer Science 2023-08-25 Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann

It has been observed in recent years that transformers have problems with length generalization for certain types of reasoning and arithmetic tasks. In particular, the performance of a transformer model trained on tasks (say addition) up to…

Machine Learning · Computer Science 2023-10-03 Pranjal Awasthi, Anupam Gupta

We propose UniT, a Unified Transformer model to simultaneously learn the most prominent tasks across different domains, ranging from object detection to natural language understanding and multimodal reasoning. Based on the transformer…

Computer Vision and Pattern Recognition · Computer Science 2021-08-19 Ronghang Hu, Amanpreet Singh

Foundation models achieve state-of-the-art performance across different tasks, but their size and computational demands raise concerns about accessibility and sustainability. Existing efficiency methods often require additional retraining…

While current Vision Transformer (ViT) adapter methods have shown promising accuracy, their inference speed is implicitly hindered by inefficient memory access operations, e.g., standard normalization and frequent reshaping. In this work,…

Computer Vision and Pattern Recognition · Computer Science 2025-02-05 Dong Zhang, Rui Yan, Pingcheng Dong, Kwang-Ting Cheng

Transformer-based large language models have achieved remarkable performance across various natural language processing tasks. However, they often struggle with seemingly easy tasks like arithmetic despite their vast capabilities. This…

Computation and Language · Computer Science 2024-07-23 Luyu Qiu, Jianing Li, Chi Su, Chen Jason Zhang, Lei Chen

In-context learning enables transformer models to generalize to new tasks based solely on input prompts, without any need for weight updates. However, existing training paradigms typically rely on large, unstructured datasets that are…

Conventional machine learning algorithms have traditionally been designed under the assumption that input data follows a vector-based format, with an emphasis on vector-centric paradigms. However, as the demand for tasks involving set-based…

Machine Learning · Computer Science 2024-04-01 Masanari Kimura, Ryotaro Shimizu, Yuki Hirakawa, Ryosuke Goto, Yuki Saito

Recently, transformers have shown strong ability as visual feature extractors, surpassing traditional convolution-based models in various scenarios. However, the success of vision transformers largely owes to their capacity to accommodate…

Computer Vision and Pattern Recognition · Computer Science 2023-05-02 Tianxiang Hao, Hui Chen, Yuchen Guo, Guiguang Ding