Related papers: Differentiable Adaptive Computation Time for Visua…

Adaptive Computation Time for Recurrent Neural Networks

This paper introduces Adaptive Computation Time (ACT), an algorithm that allows recurrent neural networks to learn how many computational steps to take between receiving an input and emitting an output. ACT requires minimal changes to the…

Neural and Evolutionary Computing · Computer Science 2017-02-22 Alex Graves

DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference

Large-scale pre-trained language models have shown remarkable results in diverse NLP applications. Unfortunately, these performance gains have been accompanied by a significant increase in computation time and model size, stressing the need…

Computation and Language · Computer Science 2021-09-27 Cristóbal Eyzaguirre, Felipe del Río, Vladimir Araujo, Álvaro Soto

Learning to Reason With Adaptive Computation

Multi-hop inference is necessary for machine learning systems to successfully solve tasks such as Recognising Textual Entailment and Machine Reading. In this work, we demonstrate the effectiveness of adaptive computation for learning the…

Computation and Language · Computer Science 2016-11-17 Mark Neumann, Pontus Stenetorp, Sebastian Riedel

Beyond Templates: Dynamic Adaptation of Reasoning Demonstrations via Feasibility-Aware Exploration

Large language models (LLMs) have shown remarkable reasoning capabilities, yet aligning such abilities to small language models (SLMs) remains a challenge due to distributional mismatches and limited model capacity. Existing reasoning…

Computation and Language · Computer Science 2025-05-28 Yong Wu, Weihang Pan, Ke Li, Chen Binhui, Ping Li, Binbin Lin

Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning

Without relevant human priors, neural networks may learn uninterpretable features. We propose Dynamics of Attention for Focus Transition (DAFT) as a human prior for machine reasoning. DAFT is a novel method that regularizes attention-based…

Machine Learning · Statistics 2019-12-24 Wonjae Kim, Yoonho Lee

Compositional Attention Networks for Machine Reasoning

We present the MAC network, a novel fully differentiable neural network architecture, designed to facilitate explicit and expressive reasoning. MAC moves away from monolithic black-box neural architectures towards a design that encourages…

Artificial Intelligence · Computer Science 2018-04-25 Drew A. Hudson, Christopher D. Manning

Dynamic Computational Time for Visual Attention

We propose a dynamic computational time model to accelerate the average processing time for recurrent visual attention (RAM). Rather than attention with a fixed number of steps for each input image, the model learns to decide when to stop…

Computer Vision and Pattern Recognition · Computer Science 2017-09-08 Zhichao Li, Yi Yang, Xiao Liu, Feng Zhou, Shilei Wen, Wei Xu

Comparing Fixed and Adaptive Computation Time for Recurrent Neural Networks

Adaptive Computation Time for Recurrent Neural Networks (ACT) is one of the most promising architectures for variable computation. ACT adapts to the input sequence by being able to look at each sample more than once, and learn how many…

Neural and Evolutionary Computing · Computer Science 2018-03-23 Daniel Fojo, Víctor Campos, Xavier Giro-i-Nieto

DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models

Adaptive reasoning is essential for aligning the computational effort of large language models (LLMs) with the intrinsic difficulty of problems. Current chain-of-thought methods boost reasoning ability but indiscriminately generate long…

Artificial Intelligence · Computer Science 2025-12-17 Ruofan Zhang, Bin Xia, Zhen Cheng, Cairen Jian, Minglun Yang, Ngai Wong, Yuan Cheng

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

Built on top of self-attention mechanisms, vision transformers have demonstrated remarkable performance on a variety of vision tasks recently. While achieving excellent performance, they still require relatively intensive computational cost…

Computer Vision and Pattern Recognition · Computer Science 2021-12-01 Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, Ser-Nam Lim

ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More

Event cameras have recently been shown beneficial for practical vision tasks, such as action recognition, thanks to their high temporal resolution, power efficiency, and reduced privacy concerns. However, current research is hindered by 1)…

Computer Vision and Pattern Recognition · Computer Science 2024-03-20 Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu, Lin Wang

Adaptive Computation with Elastic Input Sequence

Humans have the ability to adapt the type of information they use, the procedure they employ, and the amount of time they spend when solving problems. However, most standard neural networks have a fixed function type and computation budget…

Machine Learning · Computer Science 2023-06-06 Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You

DART: Input-Difficulty-AwaRe Adaptive Threshold for Early-Exit DNNs

Early-exit deep neural networks enable adaptive inference by terminating computation when sufficient confidence is achieved, reducing cost for edge AI accelerators in resource-constrained settings. Existing methods, however, rely on…

Hardware Architecture · Computer Science 2026-03-16 Parth Patne, Mahdi Taheri, Christian Herglotz, Maksim Jenihhin, Milos Krstic, Michael Hübner

Density Adaptive Attention is All You Need: Robust Parameter-Efficient Fine-Tuning Across Multiple Modalities

We propose the Multi-Head Density Adaptive Attention Mechanism (DAAM), a novel probabilistic attention framework that can be used for Parameter-Efficient Fine-tuning (PEFT), and the Density Adaptive Transformer (DAT), designed to enhance…

Machine Learning · Computer Science 2024-10-01 Georgios Ioannides, Aman Chadha, Aaron Elkins

DADAM: A Consensus-based Distributed Adaptive Gradient Method for Online Optimization

Adaptive gradient-based optimization methods such as Adagrad, Rmsprop, and Adam are widely used in solving large-scale machine learning problems including deep learning. A number of schemes have been proposed in…

Machine Learning · Computer Science 2019-05-30 Parvin Nazari, Davoud Ataee Tarzanagh, George Michailidis

End-to-end Speech Recognition with Adaptive Computation Steps

In this paper, we present the Adaptive Computation Steps (ACS) algorithm, which enables end-to-end speech recognition models to dynamically decide how many frames should be processed to predict a linguistic output. The model that applies ACS…

Audio and Speech Processing · Electrical Eng. & Systems 2018-09-27 Mohan Li, Min Liu, Masanori Hattori

Optimizing Differentially-Maintained Recursive Queries on Dynamic Graphs

Differential computation (DC) is a highly general incremental computation/view maintenance technique that can maintain the output of an arbitrary and possibly recursive dataflow computation upon changes to its base inputs. As such, it is a…

Databases · Computer Science 2022-08-02 Khaled Ammar, Siddhartha Sahu, Semih Salihoglu, M. Tamer Ozsu

Counting with Adaptive Auxiliary Learning

This paper proposes an adaptive auxiliary task learning based approach for object counting problems. Unlike existing auxiliary task learning based methods, we develop an attention-enhanced adaptively shared backbone network to enable both…

Computer Vision and Pattern Recognition · Computer Science 2022-03-09 Yanda Meng, Joshua Bridge, Meng Wei, Yitian Zhao, Yihong Qiao, Xiaoyun Yang, Xiaowei Huang, Yalin Zheng

A New Parallel Adaptive Clustering and its Application to Streaming Data

This paper presents a parallel adaptive clustering (PAC) algorithm to automatically classify data while simultaneously choosing a suitable number of classes. Clustering is an important tool for data analysis and understanding in a broad set…

Machine Learning · Computer Science 2021-04-07 Benjamin McLaughlin, Sung Ha Kang

An Adaptive Method Stabilizing Activations for Enhanced Generalization

We introduce AdaAct, a novel optimization algorithm that adjusts learning rates according to activation variance. Our method enhances the stability of neuron outputs by incorporating neuron-wise adaptivity during the training process, which…

Machine Learning · Computer Science 2025-06-11 Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko