Related papers: Learning Adaptive LLM Decoding

Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation

Decoding strategies largely determine the quality of Large Language Model (LLM) outputs, yet widely used heuristics such as greedy or fixed temperature/top-p decoding are static and often task-agnostic, leading to suboptimal or inconsistent…

Computation and Language · Computer Science 2026-03-20 Asmita Bhardwaj, Yuya Jeremy Ong, Eelaaf Zahid, Basel Shbita

Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models

Recently, Large Language Models (LLMs) have shown impressive abilities in code generation. However, existing LLMs' decoding strategies are designed for Natural Language (NL) generation, overlooking the differences between NL and programming…

Software Engineering · Computer Science 2023-12-29 Yuqi Zhu, Jia Li, Ge Li, YunFei Zhao, Jia Li, Zhi Jin, Hong Mei

Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

Computationally intensive decoding procedures--including search, reranking, and self-critique--can improve the quality of language model (LM) outputs in problems spanning code generation, numerical reasoning, and dialog. Existing work…

Machine Learning · Computer Science 2024-10-08 Mehul Damani, Idan Shenfeld, Andi Peng, Andreea Bobu, Jacob Andreas

Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning

Speculative decoding accelerates large language model (LLM) inference by using a small draft model to generate candidate tokens for a larger target model to verify. The efficacy of this technique hinges on the trade-off between the time…

Computation and Language · Computer Science 2026-03-03 Jiebin Zhang, Zhenghan Yu, Liang Wang, Nan Yang, Eugene J. Yu, Zheng Li, Yifan Song, Dawei Zhu, Xingxing Zhang, Furu Wei, Sujian Li

Adaptive Decoding via Latent Preference Optimization

During language model decoding, it is known that using higher temperature sampling gives more creative responses, while lower temperatures are more factually accurate. However, such models are commonly applied to general instruction…

Computation and Language · Computer Science 2024-11-15 Shehzaad Dhuliawala, Ilia Kulikov, Ping Yu, Asli Celikyilmaz, Jason Weston, Sainbayar Sukhbaatar, Jack Lanchantin

Adaptive Draft-Verification for Efficient Large Language Model Decoding

Large language model (LLM) decoding involves generating a sequence of tokens based on a given context, where each token is predicted one at a time using the model's learned probabilities. The typical autoregressive decoding method requires…

Computation and Language · Computer Science 2024-08-20 Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu

The Power of Adaptation: Boosting In-Context Learning through Adaptive Prompting

Large Language Models (LLMs) have demonstrated exceptional abilities across a broad range of language-related tasks, including generating solutions to complex reasoning problems. An effective technique to enhance LLM performance is…

Computation and Language · Computer Science 2024-12-25 Shuzhang Cai, Twumasi Mensah-Boateng, Xander Kuksov, Jing Yuan, Shaojie Tang

AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference

Large language models (LLMs) have achieved remarkable performance across a wide range of tasks, but their increasing parameter sizes significantly slow down inference. Speculative decoding mitigates this issue by leveraging a smaller draft…

Computation and Language · Computer Science 2026-05-27 Kuan-Wei Lu, Ding-Yong Hong, Pangfeng Liu, Jan-Jan Wu

DeMPT: Decoding-enhanced Multi-phase Prompt Tuning for Making LLMs Be Better Context-aware Translators

Generally, the decoder-only large language models (LLMs) are adapted to context-aware neural machine translation (NMT) in a concatenating way, where LLMs take the concatenation of the source sentence (i.e., intra-sentence context) and the…

Computation and Language · Computer Science 2024-09-24 Xinglin Lyu, Junhui Li, Yanqing Zhao, Min Zhang, Daimeng Wei, Shimin Tao, Hao Yang, Min Zhang

Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models

Despite their widespread adoption, large language models (LLMs) remain prohibitive to use under resource constraints, with their ever growing sizes only increasing the barrier for use. One noted issue is the high latency associated with…

Machine Learning · Computer Science 2024-12-17 Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Sarath Chandar

Dynamic Compressing Prompts for Efficient Inference of Large Language Models

Large Language Models (LLMs) have shown outstanding performance across a variety of tasks, partly due to advanced prompting techniques. However, these techniques often require lengthy prompts, which increase computational costs and can…

Computation and Language · Computer Science 2025-04-16 Jinwu Hu, Wei Zhang, Yufeng Wang, Yu Hu, Bin Xiao, Mingkui Tan, Qing Du

DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference

Recent reasoning Large Language Models (LLMs) demonstrate remarkable problem-solving abilities but often generate long thinking traces whose utility is unclear. Our work aims to improve their efficiency, enabling them to reach high…

Computation and Language · Computer Science 2026-05-11 Xiang Liu, Xuming Hu, Xiaowen Chu, Eunsol Choi

Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding

Autoregressive decoding in large language models (LLMs) requires $\mathcal{O}(n)$ sequential steps for $n$ tokens, fundamentally limiting inference throughput. Recent diffusion-based LLMs (dLLMs) enable parallel token generation through…

Computation and Language · Computer Science 2025-10-06 Wenrui Bao, Zhiben Chen, Dan Xu, Yuzhang Shang

Lossless Prompt Compression via Dictionary-Encoding and In-Context Learning: Enabling Cost-Effective LLM Analysis of Repetitive Data

In-context learning has established itself as an important learning paradigm for Large Language Models (LLMs). In this paper, we demonstrate that LLMs can learn encoding keys in-context and perform analysis directly on encoded…

Computation and Language · Computer Science 2026-04-16 Andresa Rodrigues de Campos, David Lee, Imry Kissos, Piyush Paritosh

Think Beyond Size: Adaptive Prompting for More Effective Reasoning

Pretrained large language models (LLMs) are increasingly utilized across a wide range of natural language processing (NLP) tasks due to their impressive capabilities as few-shot learners. Recent techniques, such as chain-of-thought (CoT)…

Machine Learning · Computer Science 2024-12-02 Kamesh R

From Efficiency to Adaptivity: A Deeper Look at Adaptive Reasoning in Large Language Models

Recent advances in large language models (LLMs) have made reasoning a central benchmark for evaluating intelligence. While prior surveys focus on efficiency by examining how to shorten reasoning chains or reduce computation, this view…

Artificial Intelligence · Computer Science 2026-04-01 Chao Wu, Baoheng Li, Mingchen Gao, Yu Tian, Zhenyi Wang

Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models

To bridge the gap between vision and language modalities, Multimodal Large Language Models (MLLMs) usually learn an adapter that converts visual inputs to understandable tokens for Large Language Models (LLMs). However, most adapters…

Computer Vision and Pattern Recognition · Computer Science 2024-05-27 Yue Zhang, Hehe Fan, Yi Yang

Sample Smart, Not Hard: Correctness-First Decoding for Better Reasoning in LLMs

Large Language Models (LLMs) are increasingly applied to complex tasks that require extended reasoning. In such settings, models often benefit from diverse chains-of-thought to arrive at multiple candidate solutions. This requires two…

Machine Learning · Computer Science 2025-10-08 Xueyan Li, Guinan Su, Mrinmaya Sachan, Jonas Geiping

Conformal Thinking: Risk Control for Reasoning on a Compute Budget

Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases, motivating adaptive reasoning -- spending tokens when they improve reliability and stopping early when…

Artificial Intelligence · Computer Science 2026-05-15 Xi Wang, Anushri Suresh, Alvin Zhang, Rishi More, William Jurayj, Benjamin Van Durme, Mehrdad Farajtabar, Daniel Khashabi, Eric Nalisnick

Adaptive Linear Programming Decoding

Detectability of failures of linear programming (LP) decoding and its potential for improvement by adding new constraints motivate the use of an adaptive approach in selecting the constraints for the LP problem. In this paper, we make a…

Information Theory · Computer Science 2007-07-13 Mohammad H. Taghavi N., Paul H. Siegel