Related papers: Language Model Decoding as Likelihood-Utility Alig…

Decoding Decoded: Understanding Hyperparameter Effects in Open-Ended Text Generation

Decoding strategies for generative large language models (LLMs) are a critical but often underexplored aspect of text generation tasks. Guided by specific hyperparameters, these strategies aim to transform the raw probability distributions…

Computation and Language · Computer Science · 2024-12-17 · Esteban Garces Arias, Meimingwei Li, Christian Heumann, Matthias Aßenmacher

Decoding Uncertainty: The Impact of Decoding Strategies for Uncertainty Estimation in Large Language Models

Decoding strategies manipulate the probability distribution underlying the output of a language model and can therefore affect both generation quality and its uncertainty. In this study, we investigate the impact of decoding strategies on…

Computation and Language · Computer Science · 2025-09-23 · Wataru Hashimoto, Hidetaka Kamigaito, Taro Watanabe

Language Model Decoding as Direct Metrics Optimization

Despite the remarkable advances in language modeling, current mainstream decoding methods still struggle to generate texts that align with human texts across different aspects. In particular, sampling-based methods produce less-repetitive…

Computation and Language · Computer Science · 2024-06-06 · Haozhe Ji, Pei Ke, Hongning Wang, Minlie Huang

A Thorough Examination of Decoding Methods in the Era of LLMs

Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specific models, may not extend to the current…

Computation and Language · Computer Science · 2024-10-10 · Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam

Trading Off Diversity and Quality in Natural Language Generation

For open-ended language generation tasks such as storytelling and dialogue, choosing the right decoding algorithm is critical to controlling the tradeoff between generation quality and diversity. However, there presently exists no consensus…

Computation and Language · Computer Science · 2020-04-23 · Hugh Zhang, Daniel Duckworth, Daphne Ippolito, Arvind Neelakantan

Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models

Large Language Models (LLMs) have demonstrated remarkable capabilities across various applications, fundamentally reshaping the landscape of natural language processing (NLP) research. However, recent evaluation frameworks often rely on the…

Computation and Language · Computer Science · 2024-07-10 · Chenyang Lyu, Minghao Wu, Alham Fikri Aji

On Next-Token Prediction in LLMs: How End Goals Determine the Consistency of Decoding Algorithms

Probabilistic next-token prediction trained using cross-entropy loss is the basis of most large language models. Given a sequence of previous values, next-token prediction assigns a probability to each possible next value in the vocabulary.…

Machine Learning · Statistics · 2025-05-19 · Jacob Trauger, Ambuj Tewari

Decoding-Time Language Model Alignment with Multiple Objectives

Aligning language models (LMs) to human preferences has emerged as a critical pursuit, enabling these models to better serve diverse user needs. Existing methods primarily focus on optimizing LMs for a single reward function, limiting their…

Machine Learning · Computer Science · 2024-10-29 · Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon S. Du

DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation

Despite their growing capabilities, language models still frequently reproduce content from their training data, generate repetitive text, and favor common grammatical patterns and vocabulary. A possible cause is the decoding strategy: the…

Computation and Language · Computer Science · 2026-01-15 · Giorgio Franceschelli, Mirco Musolesi

Sample Smart, Not Hard: Correctness-First Decoding for Better Reasoning in LLMs

Large Language Models (LLMs) are increasingly applied to complex tasks that require extended reasoning. In such settings, models often benefit from diverse chains-of-thought to arrive at multiple candidate solutions. This requires two…

Machine Learning · Computer Science · 2025-10-08 · Xueyan Li, Guinan Su, Mrinmaya Sachan, Jonas Geiping

Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models

Although large language models (LLMs) have demonstrated their effectiveness in a wide range of applications, they have also been observed to perpetuate unwanted biases present in the training data, potentially leading to harm for…

Computation and Language · Computer Science · 2026-03-09 · Schrasing Tong, Eliott Zemour, Jessica Lu, Rawisara Lohanimit, Lalana Kagal

The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models

Masked diffusion language models (MDMs) uniquely support any-order generation, with confidence-based decoding currently serving as the de facto standard inference policy. To optimize for this, recent training schemes attempt to align…

Artificial Intelligence · Computer Science · 2026-05-29 · Dueun Kim, Albert No

Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling

Uncertainty decomposition refers to the task of decomposing the total uncertainty of a predictive model into aleatoric (data) uncertainty, resulting from inherent randomness in the data-generating process, and epistemic (model) uncertainty,…

Computation and Language · Computer Science · 2024-06-12 · Bairu Hou, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang, Yang Zhang

Decoding-Free Sampling Strategies for LLM Marginalization

Modern language models operate on subword-tokenized text in order to make a trade-off between model size, inference speed, and vocabulary coverage. A side effect of this is that, during inference, models are evaluated by measuring the…

Computation and Language · Computer Science · 2025-10-24 · David Pohl, Marco Cognetta, Junyoung Lee, Naoaki Okazaki

Reasoning Under Uncertainty: Exploring Probabilistic Reasoning Capabilities of LLMs

Despite widespread success in language understanding and generation, large language models (LLMs) exhibit unclear and often inconsistent behavior when faced with tasks that require probabilistic reasoning. In this work, we present the first…

Computation and Language · Computer Science · 2025-09-29 · Mobina Pournemat, Keivan Rezaei, Gaurang Sriramanan, Arman Zarei, Jiaxiang Fu, Yang Wang, Hamid Eghbalzadeh, Soheil Feizi

Scaling Laws for Multilingual Language Models

We propose a novel scaling law for general-purpose decoder-only language models (LMs) trained on multilingual data, tackling the problem of balancing languages during multilingual pretraining. A primary challenge in studying multilingual…

Computation and Language · Computer Science · 2024-12-05 · Yifei He, Alon Benhaim, Barun Patra, Praneetha Vaddamanu, Sanchit Ahuja, Parul Chopra, Vishrav Chaudhary, Han Zhao, Xia Song

On Decoding Strategies for Neural Text Generators

When generating text from probabilistic models, the chosen decoding strategy has a profound effect on the resulting text. Yet the properties elicited by various decoding strategies do not always transfer across natural language generation…

Computation and Language · Computer Science · 2022-03-30 · Gian Wiher, Clara Meister, Ryan Cotterell

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during…

Computation and Language · Computer Science · 2024-11-21 · Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui

Meta-Models: An Architecture for Decoding LLM Behaviors Through Interpreted Embeddings and Natural Language

As Large Language Models (LLMs) become increasingly integrated into our daily lives, the potential harms from deceptive behavior underlie the need for faithfully interpreting their decision-making. While traditional probing methods have…

Machine Learning · Computer Science · 2024-11-08 · Anthony Costarelli, Mat Allen, Severin Field

Task-Aware Calibration: Provably Optimal Decoding in LLMs

LLM decoding often relies on the model's predictive distribution to generate an output. Consequently, misalignment with respect to the true generating distribution leads to suboptimal decisions in practice. While a natural solution is to…

Machine Learning · Computer Science · 2026-05-12 · Tim Tomov, Dominik Fuchsgruber, Rajeev Verma, Stephan Günnemann