Related papers: Faster Depth-Adaptive Transformers

200 papers

We investigate whether transformers use their depth adaptively across tasks of increasing difficulty. Using a controlled multi-hop relational reasoning task based on family stories, where difficulty is determined by the number of…

Machine Learning · Computer Science 2026-04-15 Alicia Curth , Rachel Lawrence , Sushrut Karmalkar , Niranjani Prasad

Adaptive inference is a promising technique to improve the computational efficiency of deep models at test time. In contrast to static models which use the same computation graph for all instances, adaptive networks can dynamically adjust…

Computer Vision and Pattern Recognition · Computer Science 2019-08-20 Hao Li , Hong Zhang , Xiaojuan Qi , Ruigang Yang , Gao Huang

Can transformers generalize efficiently on problems that require dealing with examples with different levels of difficulty? We introduce a new task tailored to assess generalization over different complexities and present results that…

Dynamic evaluation of language models (LMs) adapts model parameters at test time using gradient information from previous tokens and substantially improves LM performance. However, it requires over 3x more compute than standard inference.…

Computation and Language · Computer Science 2022-12-06 Kevin Clark , Kelvin Guu , Ming-Wei Chang , Panupong Pasupat , Geoffrey Hinton , Mohammad Norouzi

Adjusting the latency, power, and accuracy of natural language understanding models is a desirable objective of an efficient architecture. This paper proposes an efficient Transformer architecture that adjusts the inference computational…

Computation and Language · Computer Science 2024-09-20 Sajjad Kachuee , Mohammad Sharifkhani

We introduce adaptive input representations for neural language modeling which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity. There are several choices on how to factorize the input and…

Computation and Language · Computer Science 2019-02-26 Alexei Baevski , Michael Auli

Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs. For example, an AI writing assistant is required to update its suggestions in real time as a document is edited.…

Machine Learning · Computer Science 2023-07-28 Or Sharir , Anima Anandkumar

Existing work on understanding deep learning often employs measures that compress all data-dependent information into a few numbers. In this work, we adopt a perspective based on the role of individual examples. We introduce a measure of…

Machine Learning · Computer Science 2021-06-21 Robert J. N. Baldock , Hartmut Maennel , Behnam Neyshabur

We develop a novel approach for confidently accelerating inference in the large and expensive multilayer Transformers that are now ubiquitous in natural language processing (NLP). Amortized or approximate computational methods increase…

Computation and Language · Computer Science 2021-09-10 Tal Schuster , Adam Fisch , Tommi Jaakkola , Regina Barzilay

Based on its great successes in inference and denoising tasks, Dictionary Learning (DL) and its related sparse optimization formulations have garnered a lot of research interest. While most solutions have focused on single layer…

Machine Learning · Computer Science 2021-04-22 Wen Tang , Emilie Chouzenoux , Jean-Christophe Pesquet , Hamid Krim

The current standard approach for fine-tuning transformer-based language models includes a fixed number of training epochs and a linear learning rate schedule. In order to obtain a near-optimal model for the given downstream task, a search…

Computation and Language · Computer Science 2022-02-08 Felix Stollenwerk

The increasing complexity of deep learning architectures is resulting in training time requiring weeks or even months. This slow training is due in part to vanishing gradients, in which the gradients used by back-propagation are extremely…

Computer Vision and Pattern Recognition · Computer Science 2015-10-16 Bharat Singh , Soham De , Yangmuzi Zhang , Thomas Goldstein , Gavin Taylor

The recent ground-breaking advances in deep learning networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-limited embedded devices. Offloading the…

Performance · Computer Science 2018-05-14 Ben Taylor , Vicent Sanz Marco , Willy Wolff , Yehia Elkhatib , Zheng Wang

In many real-world scenarios, data to train machine learning models become available over time. However, neural network models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon…

Machine Learning · Computer Science 2022-06-29 Beyza Ermis , Giovanni Zappella , Martin Wistuba , Aditya Rawal , Cedric Archambeau

The Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network, which views words as nodes and performs layer-wise recurrent steps between them simultaneously. Despite its successes on text representations, the…

Computation and Language · Computer Science 2020-03-03 Yijin Liu , Fandong Meng , Yufeng Chen , Jinan Xu , Jie Zhou

Quantizing deep networks with adaptive bit-widths is a promising technique for efficient inference across many devices and resource constraints. In contrast to static methods that repeat the quantization process and train different models…

Computer Vision and Pattern Recognition · Computer Science 2021-09-20 Ximeng Sun , Rameswar Panda , Chun-Fu Chen , Naigang Wang , Bowen Pan , Kailash Gopalakrishnan , Aude Oliva , Rogerio Feris , Kate Saenko

On account of its many successes in inference tasks and denoising applications, Dictionary Learning (DL) and its related sparse optimization problems have garnered a lot of research interest. While most solutions have focused on single…

Machine Learning · Computer Science 2020-10-22 Wen Tang , Emilie Chouzenoux , Jean-Christophe Pesquet , Hamid Krim

Although deep neural networks have provided impressive gains in performance, these improvements often come at the cost of increased computational complexity and expense. In many cases, such as 3D volume or video classification tasks, not…

Computer Vision and Pattern Recognition · Computer Science 2025-10-15 Sharath M Shankaranarayana , Soumava Kumar Roy , Prasad Sudhakar , Chandan Aladahalli

We study the capabilities of the transformer architecture with varying depth. Specifically, we designed a novel set of sequence learning tasks to systematically evaluate and comprehend how the depth of transformer affects its ability to…

Machine Learning · Computer Science 2024-04-03 Xingwu Chen , Difan Zou

In an ever expanding set of research and application areas, deep neural networks (DNNs) set the bar for algorithm performance. However, depending upon additional constraints such as processing power and execution time limits, or…

Machine Learning · Computer Science 2021-06-22 Nathan Dahlin , Krishna Chaitanya Kalagarla , Nikhil Naik , Rahul Jain , Pierluigi Nuzzo