Related papers: Entropy-informed Decoding: Adaptive Information-Dr…

200 papers

Decoding strategies play a central role in shaping the reasoning ability of large language models (LLMs). Traditional methods such as greedy decoding and beam search often suffer from error propagation, while sampling-based approaches…

We present Entropy Adaptive Decoding (EAD), a novel approach for efficient language model inference that dynamically switches between different-sized models based on prediction uncertainty. By monitoring rolling entropy in model logit…

Machine Learning · Computer Science 2025-02-12 Toby Simonds

Large language models achieve strong reasoning performance, yet existing decoding strategies either explore blindly (random sampling) or redundantly (independent multi-sampling). We propose Entropy-Tree, a tree-based decoding method that…

Computation and Language · Computer Science 2026-01-23 Longxuan Wei, Yubo Zhang, Zijiao Zhang, Zhihu Wang, Shiwan Zhao, Tianyu Huang, Huiting Zhao, Chenfei Liu, Shenao Zhang, Junchi Yan

Speculative decoding (SD) accelerates large language model (LLM) reasoning by using a small draft model to generate candidate tokens, which the target LLM either accepts directly or regenerates upon rejection. However, excessive alignment…

Computation and Language · Computer Science 2026-01-01 Tiancheng Su, Meicong Zhang, Guoxiu He

Augmenting Large Language Models (LLMs) with retrieved external knowledge has proven effective for improving the factual accuracy of generated responses. Despite their success, retrieval-augmented LLMs still face the distractibility issue,…

Computation and Language · Computer Science 2025-02-18 Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King

Language models (LMs) are trained on billions of tokens in an attempt to recover the true language distribution. Still, vanilla random sampling from LMs yields low quality generations. Decoding algorithms attempt to restrict the LM…

Machine Learning · Computer Science 2026-01-06 Kareem Ahmed, Sameer Singh

Test-time compute methods can significantly improve the reasoning capabilities and problem-solving accuracy of large language models (LLMs). However, these approaches require substantially more computational resources, with most compute…

Computation and Language · Computer Science 2026-01-28 Xianzhi Li, Ethan Callanan, Abdellah Ghassel, Xiaodan Zhu

Recent advancements in multimodal large reasoning models (MLRMs) have significantly improved performance in visual question answering. However, we observe that transition words (e.g., because, however, and wait) are closely associated with…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Zhongxing Xu, Zhonghua Wang, Zhe Qian, Dachuan Shi, Feilong Tang, Ming Hu, Shiyan Su, Xiaocheng Zou, Wei Feng, Dwarikanath Mahapatra, Yifan Peng, Mingquan Lin, Zongyuan Ge

Most efforts to improve the reasoning capabilities of large language models (LLMs) involve either scaling the number of parameters and the size of training data, or scaling inference computation by letting models generate complex chains of…

Machine Learning · Computer Science 2025-10-10 Yeskendir Koishekenov, Aldo Lipani, Nicola Cancedda

Modern large language model (LLM) inference engines optimize throughput and latency under fixed decoding rules, treating generation as a linear progression in token time. We propose a fundamentally different paradigm: entropic-time…

Computation and Language · Computer Science 2026-03-05 Andrew Kiruluta

Existing Large Language Models (LLMs) generate text through unidirectional autoregressive decoding methods to respond to various user queries. These methods tend to consider token selection in a simple sequential manner, making it easy to…

Computation and Language · Computer Science 2024-05-28 Ziqin Luo, Haixia Han, Haokun Zhao, Guochao Jiang, Chengyu Du, Tingyun Li, Jiaqing Liang, Deqing Yang, Yanghua Xiao

Decoding from large language models (LLMs) typically relies on fixed sampling hyperparameters (e.g., temperature, top-p), despite substantial variation in task difficulty and uncertainty across prompts and individual decoding steps. We…

Machine Learning · Computer Science 2026-03-17 Chloe H. Su, Zhe Ye, Samuel Tenka, Aidan Yang, Soonho Kong, Udaya Ghai

Neural networks have dramatically increased our capacity to learn from large, high-dimensional datasets across innumerable disciplines. However, their decisions are not easily interpretable, their computational costs are high, and building…

Computer Vision and Pattern Recognition · Computer Science 2024-07-08 Mackenzie J. Meni, Ryan T. White, Michael Mayo, Kevin Pilkiewicz

Large language model (LLM) decoding involves generating a sequence of tokens based on a given context, where each token is predicted one at a time using the model's learned probabilities. The typical autoregressive decoding method requires…

Computation and Language · Computer Science 2024-08-20 Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu

Diffusion-based large language models (dLLMs) rely on bidirectional attention, which prevents lossless KV caching and requires a full forward pass at every denoising step. Existing approximate KV caching methods reduce this cost by…

Computation and Language · Computer Science 2026-03-20 Minsoo Cheong, Donghyun Son, Woosang Lim, Sungjoo Yoo

Recently, Large Language Models (LLMs) have demonstrated outstanding performance across a wide range of downstream language tasks. Temperature sampling is a commonly used decoding strategy for LLMs' generation process. However, a fixed…

Computation and Language · Computer Science 2024-04-04 Shimao Zhang, Yu Bao, Shujian Huang

As large language models continue to scale, their growing computational and storage demands pose significant challenges for real-world deployment. In this work, we investigate redundancy within Transformer-based models and propose an…

Computation and Language · Computer Science 2025-04-08 Liangwei Yang, Yuhui Xu, Juntao Tan, Doyen Sahoo, Silvio Savarese, Caiming Xiong, Huan Wang, Shelby Heinecke

Large Language Models (LLMs) struggle with complex reasoning due to limited diversity and inefficient search. We propose Soft Reasoning, an embedding-based search framework that optimises the embedding of the first token to guide…

Computation and Language · Computer Science 2025-09-16 Qinglin Zhu, Runcong Zhao, Hanqi Yan, Yulan He, Yudong Chen, Lin Gui

Large language models (LLMs) exhibit impressive natural language capabilities but suffer from hallucination -- generating content ungrounded in the realities of training data. Recent work has focused on decoding techniques to improve…

Computation and Language · Computer Science 2024-04-16 Souvik Das, Lifeng Jin, Linfeng Song, Haitao Mi, Baolin Peng, Dong Yu

Despite their impressive capacities, large language models (LLMs) often struggle with the hallucination issue of generating inaccurate or fabricated content even when they possess correct knowledge. In this paper, we extend the exploration…

Computation and Language · Computer Science 2025-02-06 Jialiang Wu, Yi Shen, Sijia Liu, Yi Tang, Sen Song, Xiaoyi Wang, Longjun Cai