Related papers: Faster Depth-Adaptive Transformers

Do Transformers Use their Depth Adaptively? Evidence from a Relational Reasoning Task

We investigate whether transformers use their depth adaptively across tasks of increasing difficulty. Using a controlled multi-hop relational reasoning task based on family stories, where difficulty is determined by the number of…

Machine Learning · Computer Science 2026-04-15 Alicia Curth, Rachel Lawrence, Sushrut Karmalkar, Niranjani Prasad

Improved Techniques for Training Adaptive Deep Networks

Adaptive inference is a promising technique to improve the computational efficiency of deep models at test time. In contrast to static models which use the same computation graph for all instances, adaptive networks can dynamically adjust…

Computer Vision and Pattern Recognition · Computer Science 2019-08-20 Hao Li, Hong Zhang, Xiaojuan Qi, Ruigang Yang, Gao Huang

Adaptivity and Modularity for Efficient Generalization Over Task Complexity

Can transformers generalize efficiently on problems that require dealing with examples with different levels of difficulty? We introduce a new task tailored to assess generalization over different complexities and present results that…

Machine Learning · Computer Science 2023-10-16 Samira Abnar, Omid Saremi, Laurent Dinh, Shantel Wilson, Miguel Angel Bautista, Chen Huang, Vimal Thilak, Etai Littwin, Jiatao Gu, Josh Susskind, Samy Bengio

Meta-Learning Fast Weight Language Models

Dynamic evaluation of language models (LMs) adapts model parameters at test time using gradient information from previous tokens and substantially improves LM performance. However, it requires over 3x more compute than standard inference.…

Computation and Language · Computer Science 2022-12-06 Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey Hinton, Mohammad Norouzi

Latency Adjustable Transformer Encoder for Language Understanding

Adjusting the latency, power, and accuracy of natural language understanding models is a desirable objective of an efficient architecture. This paper proposes an efficient Transformer architecture that adjusts the inference computational…

Computation and Language · Computer Science 2024-09-20 Sajjad Kachuee, Mohammad Sharifkhani

Adaptive Input Representations for Neural Language Modeling

We introduce adaptive input representations for neural language modeling which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity. There are several choices on how to factorize the input and…

Computation and Language · Computer Science 2019-02-26 Alexei Baevski, Michael Auli

Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs

Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs. For example, an AI writing assistant is required to update its suggestions in real time as a document is edited.…

Machine Learning · Computer Science 2023-07-28 Or Sharir, Anima Anandkumar

Deep Learning Through the Lens of Example Difficulty

Existing work on understanding deep learning often employs measures that compress all data-dependent information into a few numbers. In this work, we adopt a perspective based on the role of individual examples. We introduce a measure of…

Machine Learning · Computer Science 2021-06-21 Robert J. N. Baldock, Hartmut Maennel, Behnam Neyshabur

Consistent Accelerated Inference via Confident Adaptive Transformers

We develop a novel approach for confidently accelerating inference in the large and expensive multilayer Transformers that are now ubiquitous in natural language processing (NLP). Amortized or approximate computational methods increase…

Computation and Language · Computer Science 2021-09-10 Tal Schuster, Adam Fisch, Tommi Jaakkola, Regina Barzilay

Deep Transform and Metric Learning Networks

Based on its great successes in inference and denoising tasks, Dictionary Learning (DL) and its related sparse optimization formulations have garnered a lot of research interest. While most solutions have focused on single layer…

Machine Learning · Computer Science 2021-04-22 Wen Tang, Emilie Chouzenoux, Jean-Christophe Pesquet, Hamid Krim

Adaptive Fine-Tuning of Transformer-Based Language Models for Named Entity Recognition

The current standard approach for fine-tuning transformer-based language models includes a fixed number of training epochs and a linear learning rate schedule. In order to obtain a near-optimal model for the given downstream task, a search…

Computation and Language · Computer Science 2022-02-08 Felix Stollenwerk

Layer-Specific Adaptive Learning Rates for Deep Networks

The increasing complexity of deep learning architectures is resulting in training times of weeks or even months. This slow training is due in part to vanishing gradients, in which the gradients used by back-propagation are extremely…

Computer Vision and Pattern Recognition · Computer Science 2015-10-16 Bharat Singh, Soham De, Yangmuzi Zhang, Thomas Goldstein, Gavin Taylor

Adaptive Selection of Deep Learning Models on Embedded Systems

The recent ground-breaking advances in deep learning networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-limited embedded devices. Offloading the…

Performance · Computer Science 2018-05-14 Ben Taylor, Vicent Sanz Marco, Willy Wolff, Yehia Elkhatib, Zheng Wang

Continual Learning with Transformers for Image Classification

In many real-world scenarios, data to train machine learning models become available over time. However, neural network models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon…

Machine Learning · Computer Science 2022-06-29 Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cedric Archambeau

Depth-Adaptive Graph Recurrent Network for Text Classification

The Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network, which views words as nodes and performs layer-wise recurrent steps between them simultaneously. Despite its successes on text representations, the…

Computation and Language · Computer Science 2020-03-03 Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

Quantizing deep networks with adaptive bit-widths is a promising technique for efficient inference across many devices and resource constraints. In contrast to static methods that repeat the quantization process and train different models…

Computer Vision and Pattern Recognition · Computer Science 2021-09-20 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

Deep Transform and Metric Learning Network: Wedding Deep Dictionary Learning and Neural Networks

On account of its many successes in inference tasks and denoising applications, Dictionary Learning (DL) and its related sparse optimization problems have garnered a lot of research interest. While most solutions have focused on single…

Machine Learning · Computer Science 2020-10-22 Wen Tang, Emilie Chouzenoux, Jean-Christophe Pesquet, Hamid Krim

Deep Attention-guided Adaptive Subsampling

Although deep neural networks have provided impressive gains in performance, these improvements often come at the cost of increased computational complexity and expense. In many cases, such as 3D volume or video classification tasks, not…

Computer Vision and Pattern Recognition · Computer Science 2025-10-15 Sharath M Shankaranarayana, Soumava Kumar Roy, Prasad Sudhakar, Chandan Aladahalli

What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks

We study the capabilities of the transformer architecture with varying depth. Specifically, we designed a novel set of sequence learning tasks to systematically evaluate and comprehend how the depth of transformer affects its ability to…

Machine Learning · Computer Science 2024-04-03 Xingwu Chen, Difan Zou

Designing Interpretable Approximations to Deep Reinforcement Learning

In an ever-expanding set of research and application areas, deep neural networks (DNNs) set the bar for algorithm performance. However, depending upon additional constraints such as processing power and execution time limits, or…

Machine Learning · Computer Science 2021-06-22 Nathan Dahlin, Krishna Chaitanya Kalagarla, Nikhil Naik, Rahul Jain, Pierluigi Nuzzo