Related papers: Learning Adaptive LLM Decoding

200 papers

Decoding strategies largely determine the quality of Large Language Model (LLM) outputs, yet widely used heuristics such as greedy or fixed temperature/top-p decoding are static and often task-agnostic, leading to suboptimal or inconsistent…

Computation and Language · Computer Science 2026-03-20 Asmita Bhardwaj, Yuya Jeremy Ong, Eelaaf Zahid, Basel Shbita

Recently, Large Language Models (LLMs) have shown impressive abilities in code generation. However, existing LLMs' decoding strategies are designed for Natural Language (NL) generation, overlooking the differences between NL and programming…

Software Engineering · Computer Science 2023-12-29 Yuqi Zhu, Jia Li, Ge Li, YunFei Zhao, Jia Li, Zhi Jin, Hong Mei

Computationally intensive decoding procedures--including search, reranking, and self-critique--can improve the quality of language model (LM) outputs in problems spanning code generation, numerical reasoning, and dialog. Existing work…

Machine Learning · Computer Science 2024-10-08 Mehul Damani, Idan Shenfeld, Andi Peng, Andreea Bobu, Jacob Andreas

Speculative decoding accelerates large language model (LLM) inference by using a small draft model to generate candidate tokens for a larger target model to verify. The efficacy of this technique hinges on the trade-off between the time…

Computation and Language · Computer Science 2026-03-03 Jiebin Zhang, Zhenghan Yu, Liang Wang, Nan Yang, Eugene J. Yu, Zheng Li, Yifan Song, Dawei Zhu, Xingxing Zhang, Furu Wei, Sujian Li

During language model decoding, it is known that using higher temperature sampling gives more creative responses, while lower temperatures are more factually accurate. However, such models are commonly applied to general instruction…

Computation and Language · Computer Science 2024-11-15 Shehzaad Dhuliawala, Ilia Kulikov, Ping Yu, Asli Celikyilmaz, Jason Weston, Sainbayar Sukhbaatar, Jack Lanchantin

Large language model (LLM) decoding involves generating a sequence of tokens based on a given context, where each token is predicted one at a time using the model's learned probabilities. The typical autoregressive decoding method requires…

Computation and Language · Computer Science 2024-08-20 Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu

Large Language Models (LLMs) have demonstrated exceptional abilities across a broad range of language-related tasks, including generating solutions to complex reasoning problems. An effective technique to enhance LLM performance is…

Computation and Language · Computer Science 2024-12-25 Shuzhang Cai, Twumasi Mensah-Boateng, Xander Kuksov, Jing Yuan, Shaojie Tang

Large language models (LLMs) have achieved remarkable performance across a wide range of tasks, but their increasing parameter sizes significantly slow down inference. Speculative decoding mitigates this issue by leveraging a smaller draft…

Computation and Language · Computer Science 2026-05-27 Kuan-Wei Lu, Ding-Yong Hong, Pangfeng Liu, Jan-Jan Wu

Generally, the decoder-only large language models (LLMs) are adapted to context-aware neural machine translation (NMT) in a concatenating way, where LLMs take the concatenation of the source sentence (i.e., intra-sentence context) and the…

Computation and Language · Computer Science 2024-09-24 Xinglin Lyu, Junhui Li, Yanqing Zhao, Min Zhang, Daimeng Wei, Shimin Tao, Hao Yang, Min Zhang

Despite their widespread adoption, large language models (LLMs) remain prohibitive to use under resource constraints, with their ever growing sizes only increasing the barrier for use. One noted issue is the high latency associated with…

Machine Learning · Computer Science 2024-12-17 Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Sarath Chandar

Large Language Models (LLMs) have shown outstanding performance across a variety of tasks, partly due to advanced prompting techniques. However, these techniques often require lengthy prompts, which increase computational costs and can…

Computation and Language · Computer Science 2025-04-16 Jinwu Hu, Wei Zhang, Yufeng Wang, Yu Hu, Bin Xiao, Mingkui Tan, Qing Du

Recent reasoning Large Language Models (LLMs) demonstrate remarkable problem-solving abilities but often generate long thinking traces whose utility is unclear. Our work aims to improve their efficiency, enabling them to reach high…

Computation and Language · Computer Science 2026-05-11 Xiang Liu, Xuming Hu, Xiaowen Chu, Eunsol Choi

Autoregressive decoding in large language models (LLMs) requires $\mathcal{O}(n)$ sequential steps for $n$ tokens, fundamentally limiting inference throughput. Recent diffusion-based LLMs (dLLMs) enable parallel token generation through…

Computation and Language · Computer Science 2025-10-06 Wenrui Bao, Zhiben Chen, Dan Xu, Yuzhang Shang

In-context learning has established itself as an important learning paradigm for Large Language Models (LLMs). In this paper, we demonstrate that LLMs can learn encoding keys in-context and perform analysis directly on encoded…

Computation and Language · Computer Science 2026-04-16 Andresa Rodrigues de Campos, David Lee, Imry Kissos, Piyush Paritosh

Pretrained large language models (LLMs) are increasingly utilized across a wide range of natural language processing (NLP) tasks due to their impressive capabilities as few-shot learners. Recent techniques, such as chain-of-thought (CoT)…

Machine Learning · Computer Science 2024-12-02 Kamesh R

Recent advances in large language models (LLMs) have made reasoning a central benchmark for evaluating intelligence. While prior surveys focus on efficiency by examining how to shorten reasoning chains or reduce computation, this view…

Artificial Intelligence · Computer Science 2026-04-01 Chao Wu, Baoheng Li, Mingchen Gao, Yu Tian, Zhenyi Wang

To bridge the gap between vision and language modalities, Multimodal Large Language Models (MLLMs) usually learn an adapter that converts visual inputs to understandable tokens for Large Language Models (LLMs). However, most adapters…

Computer Vision and Pattern Recognition · Computer Science 2024-05-27 Yue Zhang, Hehe Fan, Yi Yang

Large Language Models (LLMs) are increasingly applied to complex tasks that require extended reasoning. In such settings, models often benefit from diverse chains-of-thought to arrive at multiple candidate solutions. This requires two…

Machine Learning · Computer Science 2025-10-08 Xueyan Li, Guinan Su, Mrinmaya Sachan, Jonas Geiping

Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases, motivating adaptive reasoning -- spending tokens when they improve reliability and stopping early when…

Artificial Intelligence · Computer Science 2026-05-15 Xi Wang, Anushri Suresh, Alvin Zhang, Rishi More, William Jurayj, Benjamin Van Durme, Mehrdad Farajtabar, Daniel Khashabi, Eric Nalisnick

Detectability of failures of linear programming (LP) decoding and its potential for improvement by adding new constraints motivate the use of an adaptive approach in selecting the constraints for the LP problem. In this paper, we make a…

Information Theory · Computer Science 2007-07-13 Mohammad H. Taghavi N., Paul H. Siegel