Related papers: Language Model Decoding as Likelihood-Utility Alig…

200 papers

Decoding strategies for generative large language models (LLMs) are a critical but often underexplored aspect of text generation tasks. Guided by specific hyperparameters, these strategies aim to transform the raw probability distributions…

Computation and Language · Computer Science 2024-12-17 Esteban Garces Arias, Meimingwei Li, Christian Heumann, Matthias Aßenmacher

Decoding strategies manipulate the probability distribution underlying the output of a language model and can therefore affect both generation quality and its uncertainty. In this study, we investigate the impact of decoding strategies on…

Computation and Language · Computer Science 2025-09-23 Wataru Hashimoto, Hidetaka Kamigaito, Taro Watanabe

Despite the remarkable advances in language modeling, current mainstream decoding methods still struggle to generate texts that align with human texts across different aspects. In particular, sampling-based methods produce less-repetitive…

Computation and Language · Computer Science 2024-06-06 Haozhe Ji, Pei Ke, Hongning Wang, Minlie Huang

Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specific models, may not extend to the current…

Computation and Language · Computer Science 2024-10-10 Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam

For open-ended language generation tasks such as storytelling and dialogue, choosing the right decoding algorithm is critical to controlling the tradeoff between generation quality and diversity. However, there presently exists no consensus…

Computation and Language · Computer Science 2020-04-23 Hugh Zhang, Daniel Duckworth, Daphne Ippolito, Arvind Neelakantan

Large Language Models (LLMs) have demonstrated remarkable capabilities across various applications, fundamentally reshaping the landscape of natural language processing (NLP) research. However, recent evaluation frameworks often rely on the…

Computation and Language · Computer Science 2024-07-10 Chenyang Lyu, Minghao Wu, Alham Fikri Aji

Probabilistic next-token prediction trained using cross-entropy loss is the basis of most large language models. Given a sequence of previous values, next-token prediction assigns a probability to each possible next value in the vocabulary.…

Machine Learning · Statistics 2025-05-19 Jacob Trauger, Ambuj Tewari

Aligning language models (LMs) to human preferences has emerged as a critical pursuit, enabling these models to better serve diverse user needs. Existing methods primarily focus on optimizing LMs for a single reward function, limiting their…

Machine Learning · Computer Science 2024-10-29 Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon S. Du

Despite their growing capabilities, language models still frequently reproduce content from their training data, generate repetitive text, and favor common grammatical patterns and vocabulary. A possible cause is the decoding strategy: the…

Computation and Language · Computer Science 2026-01-15 Giorgio Franceschelli, Mirco Musolesi

Large Language Models (LLMs) are increasingly applied to complex tasks that require extended reasoning. In such settings, models often benefit from diverse chains-of-thought to arrive at multiple candidate solutions. This requires two…

Machine Learning · Computer Science 2025-10-08 Xueyan Li, Guinan Su, Mrinmaya Sachan, Jonas Geiping

Although large language models (LLMs) have demonstrated their effectiveness in a wide range of applications, they have also been observed to perpetuate unwanted biases present in the training data, potentially leading to harm for…

Computation and Language · Computer Science 2026-03-09 Schrasing Tong, Eliott Zemour, Jessica Lu, Rawisara Lohanimit, Lalana Kagal

Masked diffusion language models (MDMs) uniquely support any-order generation, with confidence-based decoding currently serving as the de facto standard inference policy. To optimize for this, recent training schemes attempt to align…

Artificial Intelligence · Computer Science 2026-05-29 Dueun Kim, Albert No

Uncertainty decomposition refers to the task of decomposing the total uncertainty of a predictive model into aleatoric (data) uncertainty, resulting from inherent randomness in the data-generating process, and epistemic (model) uncertainty,…

Computation and Language · Computer Science 2024-06-12 Bairu Hou, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang, Yang Zhang

Modern language models operate on subword-tokenized text in order to make a trade-off between model size, inference speed, and vocabulary coverage. A side effect of this is that, during inference, models are evaluated by measuring the…

Computation and Language · Computer Science 2025-10-24 David Pohl, Marco Cognetta, Junyoung Lee, Naoaki Okazaki

Despite widespread success in language understanding and generation, large language models (LLMs) exhibit unclear and often inconsistent behavior when faced with tasks that require probabilistic reasoning. In this work, we present the first…

Computation and Language · Computer Science 2025-09-29 Mobina Pournemat, Keivan Rezaei, Gaurang Sriramanan, Arman Zarei, Jiaxiang Fu, Yang Wang, Hamid Eghbalzadeh, Soheil Feizi

We propose a novel scaling law for general-purpose decoder-only language models (LMs) trained on multilingual data, tackling the problem of balancing languages during multilingual pretraining. A primary challenge in studying multilingual…

Computation and Language · Computer Science 2024-12-05 Yifei He, Alon Benhaim, Barun Patra, Praneetha Vaddamanu, Sanchit Ahuja, Parul Chopra, Vishrav Chaudhary, Han Zhao, Xia Song

When generating text from probabilistic models, the chosen decoding strategy has a profound effect on the resulting text. Yet the properties elicited by various decoding strategies do not always transfer across natural language generation…

Computation and Language · Computer Science 2022-03-30 Gian Wiher, Clara Meister, Ryan Cotterell

One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during…

Computation and Language · Computer Science 2024-11-21 Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui

As Large Language Models (LLMs) become increasingly integrated into our daily lives, the potential harms from deceptive behavior underlie the need for faithfully interpreting their decision-making. While traditional probing methods have…

Machine Learning · Computer Science 2024-11-08 Anthony Costarelli, Mat Allen, Severin Field

LLM decoding often relies on the model's predictive distribution to generate an output. Consequently, misalignment with respect to the true generating distribution leads to suboptimal decisions in practice. While a natural solution is to…

Machine Learning · Computer Science 2026-05-12 Tim Tomov, Dominik Fuchsgruber, Rajeev Verma, Stephan Günnemann