Related papers: Entropy-informed Decoding: Adaptive Information-Dr…

Think Twice Before You Write -- an Entropy-based Decoding Strategy to Enhance LLM Reasoning

Decoding strategies play a central role in shaping the reasoning ability of large language models (LLMs). Traditional methods such as greedy decoding and beam search often suffer from error propagation, while sampling-based approaches…

Computation and Language · Computer Science 2026-04-02 Jiashu He, Meizhu Liu, Olaitan P Olaleye, Amit Agarwal, M. Avendi, Yassi Abbasi, Matthew Rowe, Hitesh Laxmichand Patel, Paul Li, Tao Sheng, Sujith Ravi, Dan Roth

Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference

We present Entropy Adaptive Decoding (EAD), a novel approach for efficient language model inference that dynamically switches between different-sized models based on prediction uncertainty. By monitoring rolling entropy in model logit…

Machine Learning · Computer Science 2025-02-12 Toby Simonds

Entropy-Tree: Tree-Based Decoding with Entropy-Guided Exploration

Large language models achieve strong reasoning performance, yet existing decoding strategies either explore blindly (random sampling) or redundantly (independent multi-sampling). We propose Entropy-Tree, a tree-based decoding method that…

Computation and Language · Computer Science 2026-01-23 Longxuan Wei, Yubo Zhang, Zijiao Zhang, Zhihu Wang, Shiwan Zhao, Tianyu Huang, Huiting Zhao, Chenfei Liu, Shenao Zhang, Junchi Yan

Entropy-Aware Speculative Decoding Toward Improved LLM Reasoning

Speculative decoding (SD) accelerates large language model (LLM) reasoning by using a small draft model to generate candidate tokens, which the target LLM either accepts directly or regenerates upon rejection. However, excessive alignment…

Computation and Language · Computer Science 2026-01-01 Tiancheng Su, Meicong Zhang, Guoxiu He

Entropy-Based Decoding for Retrieval-Augmented Large Language Models

Augmenting Large Language Models (LLMs) with retrieved external knowledge has proven effective for improving the factual accuracy of generated responses. Despite their success, retrieval-augmented LLMs still face the distractibility issue,…

Computation and Language · Computer Science 2025-02-18 Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King

Entropy-Aligned Decoding of LMs for Better Writing and Reasoning

Language models (LMs) are trained on billions of tokens in an attempt to recover the true language distribution. Still, vanilla random sampling from LMs yields low quality generations. Decoding algorithms attempt to restrict the LM…

Machine Learning · Computer Science 2026-01-06 Kareem Ahmed, Sameer Singh

Entropy-Gated Branching for Efficient Test-Time Reasoning

Test-time compute methods can significantly improve the reasoning capabilities and problem-solving accuracy of large language models (LLMs). However, these approaches require substantially more computational resources, with most compute…

Computation and Language · Computer Science 2026-01-28 Xianzhi Li, Ethan Callanan, Abdellah Ghassel, Xiaodan Zhu

Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding

Recent advancements in multimodal large reasoning models (MLRMs) have significantly improved performance in visual question answering. However, we observe that transition words (e.g., because, however, and wait) are closely associated with…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Zhongxing Xu, Zhonghua Wang, Zhe Qian, Dachuan Shi, Feilong Tang, Ming Hu, Shiyan Su, Xiaocheng Zou, Wei Feng, Dwarikanath Mahapatra, Yifan Peng, Mingquan Lin, Zongyuan Ge

Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts

Most efforts to improve the reasoning capabilities of large language models (LLMs) involve either scaling the number of parameters and the size of training data, or scaling inference computation by letting models generate complex chains of…

Machine Learning · Computer Science 2025-10-10 Yeskendir Koishekenov, Aldo Lipani, Nicola Cancedda

Entropic-Time Inference: Self-Organizing Large Language Model Decoding Beyond Attention

Modern large language model (LLM) inference engines optimize throughput and latency under fixed decoding rules, treating generation as a linear progression in token time. We propose a fundamentally different paradigm: entropic-time…

Computation and Language · Computer Science 2026-03-05 Andrew Kiruluta

SED: Self-Evaluation Decoding Enhances Large Language Models for Better Generation

Existing Large Language Models (LLMs) generate text through unidirectional autoregressive decoding methods to respond to various user queries. These methods tend to consider token selection in a simple sequential manner, making it easy to…

Computation and Language · Computer Science 2024-05-28 Ziqin Luo, Haixia Han, Haokun Zhao, Guochao Jiang, Chengyu Du, Tingyun Li, Jiaqing Liang, Deqing Yang, Yanghua Xiao

Learning Adaptive LLM Decoding

Decoding from large language models (LLMs) typically relies on fixed sampling hyperparameters (e.g., temperature, top-p), despite substantial variation in task difficulty and uncertainty across prompts and individual decoding steps. We…

Machine Learning · Computer Science 2026-03-17 Chloe H. Su, Zhe Ye, Samuel Tenka, Aidan Yang, Soonho Kong, Udaya Ghai

Entropy-based Guidance of Deep Neural Networks for Accelerated Convergence and Improved Performance

Neural networks have dramatically increased our capacity to learn from large, high-dimensional datasets across innumerable disciplines. However, their decisions are not easily interpretable, their computational costs are high, and building…

Computer Vision and Pattern Recognition · Computer Science 2024-07-08 Mackenzie J. Meni, Ryan T. White, Michael Mayo, Kevin Pilkiewicz

Adaptive Draft-Verification for Efficient Large Language Model Decoding

Large language model (LLM) decoding involves generating a sequence of tokens based on a given context, where each token is predicted one at a time using the model's learned probabilities. The typical autoregressive decoding method requires…

Computation and Language · Computer Science 2024-08-20 Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu

EntropyCache: Decoded Token Entropy Guided KV Caching for Diffusion Language Models

Diffusion-based large language models (dLLMs) rely on bidirectional attention, which prevents lossless KV caching and requires a full forward pass at every denoising step. Existing approximate KV caching methods reduce this cost by…

Computation and Language · Computer Science 2026-03-20 Minsoo Cheong, Donghyun Son, Woosang Lim, Sungjoo Yoo

EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling

Recently, Large Language Models (LLMs) have demonstrated outstanding performance across a wide range of downstream language tasks. Temperature sampling is a commonly used decoding strategy for LLMs' generation process. However, a fixed…

Computation and Language · Computer Science 2024-04-04 Shimao Zhang, Yu Bao, Shujian Huang

Entropy-Based Block Pruning for Efficient Large Language Models

As large language models continue to scale, their growing computational and storage demands pose significant challenges for real-world deployment. In this work, we investigate redundancy within Transformer-based models and propose an…

Computation and Language · Computer Science 2025-04-08 Liangwei Yang, Yuhui Xu, Juntao Tan, Doyen Sahoo, Silvio Savarese, Caiming Xiong, Huan Wang, Shelby Heinecke

Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration

Large Language Models (LLMs) struggle with complex reasoning due to limited diversity and inefficient search. We propose Soft Reasoning, an embedding-based search framework that optimises the embedding of the first token to guide…

Computation and Language · Computer Science 2025-09-16 Qinglin Zhu, Runcong Zhao, Hanqi Yan, Yulan He, Yudong Chen, Lin Gui

Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models

Large language models (LLMs) exhibit impressive natural language capabilities but suffer from hallucination -- generating content ungrounded in the realities of training data. Recent work has focused on decoding techniques to improve…

Computation and Language · Computer Science 2024-04-16 Souvik Das, Lifeng Jin, Linfeng Song, Haitao Mi, Baolin Peng, Dong Yu

Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models

Despite their impressive capacities, Large language models (LLMs) often struggle with the hallucination issue of generating inaccurate or fabricated content even when they possess correct knowledge. In this paper, we extend the exploration…

Computation and Language · Computer Science 2025-02-06 Jialiang Wu, Yi Shen, Sijia Liu, Yi Tang, Sen Song, Xiaoyi Wang, Longjun Cai