Related papers: Language Model Decoding as Direct Metrics Optimiza…

200 papers

Despite their growing capabilities, language models still frequently reproduce content from their training data, generate repetitive text, and favor common grammatical patterns and vocabulary. A possible cause is the decoding strategy: the…

Computation and Language · Computer Science · 2026-01-15 · Giorgio Franceschelli, Mirco Musolesi

Advances in hardware and language model architecture have spurred a revolution in natural language generation. However, autoregressive models compute probability distributions over next-token choices, and sampling from these distributions,…

Computation and Language · Computer Science · 2025-09-10 · Tom Kempton, Stuart Burrell

Although current state-of-the-art language models have achieved impressive results in numerous natural language processing tasks, they still could not solve the problem of producing repetitive, dull and sometimes inconsistent text in…

Computation and Language · Computer Science · 2021-08-10 · An Nguyen

Decoding strategies for generative large language models (LLMs) are a critical but often underexplored aspect of text generation tasks. Guided by specific hyperparameters, these strategies aim to transform the raw probability distributions…

Computation and Language · Computer Science · 2024-12-17 · Esteban Garces Arias, Meimingwei Li, Christian Heumann, Matthias Aßenmacher

Modern language models operate on subword-tokenized text in order to make a trade-off between model size, inference speed, and vocabulary coverage. A side effect of this is that, during inference, models are evaluated by measuring the…

Computation and Language · Computer Science · 2025-10-24 · David Pohl, Marco Cognetta, Junyoung Lee, Naoaki Okazaki

Sampling is a common strategy for generating text from probabilistic models, yet standard ancestral sampling often results in text that is incoherent or ungrammatical. To alleviate this issue, various modifications to a model's sampling…

Computation and Language · Computer Science · 2024-01-08 · Clara Meister, Tiago Pimentel, Luca Malagutti, Ethan G. Wilcox, Ryan Cotterell

Today's probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics, e.g., perplexity. This discrepancy has puzzled the…

Computation and Language · Computer Science · 2025-06-06 · Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell

Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specific models, may not extend to the current…

Computation and Language · Computer Science · 2024-10-10 · Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam

We propose an alternate approach to quantifying how well language models learn natural language: we ask how well they match the statistical tendencies of natural language. To answer this question, we analyze whether text generated from…

Computation and Language · Computer Science · 2021-08-31 · Clara Meister, Ryan Cotterell

Tokenising continuous speech into sequences of discrete tokens and modelling them with language models (LMs) has led to significant success in text-to-speech (TTS) synthesis. Although these models can generate speech with high quality and…

Sound · Computer Science · 2024-08-30 · Zehai Tu, Guangyan Zhang, Yiting Lu, Adaeze Adigwe, Simon King, Yiwen Guo

Large language models have shown unprecedented abilities in generating linguistically coherent and syntactically correct natural language output. However, they often return incorrect and inconsistent answers to input questions. Due to the…

Databases · Computer Science · 2023-12-27 · Jasmin Mousavi, Arash Termehchy

Word embedding methods revolve around learning continuous distributed vector representations of words with neural networks, which can capture semantic and/or syntactic cues, and in turn be used to induce similarity measures among words,…

Computation and Language · Computer Science · 2016-07-25 · Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang, Hsin-Hsi Chen

Long samples of text from neural language models can be of poor quality. Truncation sampling algorithms, like top-$p$ or top-$k$, address this by setting some words' probabilities to zero at each step. This work provides framing for the…

Computation and Language · Computer Science · 2022-10-28 · John Hewitt, Christopher D. Manning, Percy Liang
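
The truncation idea in the abstract above (setting some words' probabilities to zero at each step) can be illustrated with a minimal Python sketch. It assumes a single next-token distribution given as a plain dictionary; the function names `truncate` and `sample` and the toy vocabulary are hypothetical, and the code shows standard top-$k$ and top-$p$ truncation followed by renormalization, not the method proposed in that paper.

```python
import random
from typing import Dict, Optional


def truncate(probs: Dict[str, float], top_k: Optional[int] = None,
             top_p: Optional[float] = None) -> Dict[str, float]:
    """Zero out low-probability tokens (top-k, then top-p), then renormalize."""
    # Sort tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        # Top-k: keep only the k most probable tokens.
        ranked = ranked[:top_k]
    if top_p is not None:
        # Top-p (nucleus): keep the smallest prefix whose mass reaches p.
        kept, mass = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            mass += p
            if mass >= top_p:
                break
        ranked = kept
    # Excluded tokens now have probability zero; survivors are rescaled to sum to 1.
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}


def sample(probs: Dict[str, float], rng: random.Random) -> str:
    """Ancestral sampling step: draw one token from a (truncated) distribution."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

For example, with `{"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}`, `top_k=2` keeps only "the" and "a" (rescaled to about 0.625 and 0.375), while `top_p=0.9` keeps "the", "a", and "cat", because the cumulative mass first reaches 0.9 at "cat".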

Despite considerable advancements with deep neural language models, the enigma of neural text degeneration persists when these models are tested as text generators. The counter-intuitive empirical observation is that even though the use of…

Computation and Language · Computer Science · 2020-02-18 · Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, Yejin Choi

As Large Language Models (LLMs) become increasingly integrated into our daily lives, the potential harms from deceptive behavior underlie the need for faithfully interpreting their decision-making. While traditional probing methods have…

Machine Learning · Computer Science · 2024-11-08 · Anthony Costarelli, Mat Allen, Severin Field

Code super-optimization is the task of transforming any given program to a more efficient version while preserving its input-output behaviour. In some sense, it is similar to the paraphrase problem from natural language processing where the…

Machine Learning · Computer Science · 2017-06-29 · Rudy Bunel, Alban Desmaison, M. Pawan Kumar, Philip H. S. Torr, Pushmeet Kohli

Diffusion language models have emerged as a promising approach for text generation. One would naturally expect this method to be an efficient replacement for autoregressive models since multiple tokens can be sampled in parallel during each…

Machine Learning · Computer Science · 2025-06-10 · Guhao Feng, Yihan Geng, Jian Guan, Wei Wu, Liwei Wang, Di He

Comprehensive evaluation of Large Language Models (LLMs) is an open research problem. Existing evaluations rely on deterministic point estimates generated via greedy decoding. However, we find that deterministic evaluations fail to capture…

Machine Learning · Computer Science · 2025-03-04 · Yan Scholten, Stephan Günnemann, Leo Schwinn

Large language models (LLMs) are often equipped with multi-sample decoding strategies. An LLM implicitly defines an arithmetic code book, facilitating efficient and embarrassingly parallelizable arithmetic sampling to produce…

Artificial Intelligence · Computer Science · 2025-04-29 · Aditya Parashar, Aditya Vikram Singh, Avinash Amballa, Jinlin Lai, Benjamin Rozonoyer
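
The abstract above describes an LLM as implicitly defining an arithmetic code book. The following is a minimal Python sketch of that idea, assuming a hypothetical toy model `toy_model` that returns next-token probabilities for a prefix. Each code point in $[0, 1)$ is mapped to one sequence by arithmetic decoding, and $n$ evenly spaced code points sharing one random offset yield $n$ samples that can be decoded independently, hence in parallel. The names `arithmetic_decode` and `arithmetic_sample`, and the alphabetical token order used to lay out the code book, are assumptions for illustration, not the authors' implementation.

```python
import random
from typing import Callable, Dict, List, Sequence

# A "model" maps a token prefix to a next-token distribution (assumed: all probabilities > 0).
NextDist = Callable[[Sequence[str]], Dict[str, float]]


def toy_model(prefix: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for an LLM: forces <eos> after two tokens."""
    if len(prefix) >= 2:
        return {"<eos>": 1.0}
    return {"a": 0.5, "b": 0.3, "<eos>": 0.2}


def arithmetic_decode(code: float, next_dist: NextDist, max_len: int,
                      eos: str = "<eos>") -> List[str]:
    """Map one code point in [0, 1) to a token sequence by arithmetic decoding."""
    seq: List[str] = []
    for _ in range(max_len):
        dist = next_dist(seq)
        tokens = sorted(dist)  # fixed token order defines the code book intervals
        low = 0.0
        for tok in tokens:
            p = dist[tok]
            # Pick the token whose interval [low, low + p) contains the code;
            # the last token absorbs any floating-point shortfall.
            if code < low + p or tok == tokens[-1]:
                break
            low += p
        seq.append(tok)
        if tok == eos:
            break
        # Rescale the code to [0, 1) relative to the chosen interval and continue.
        code = min(max((code - low) / p, 0.0), 1.0 - 1e-12)
    return seq


def arithmetic_sample(n: int, next_dist: NextDist, max_len: int,
                      rng: random.Random) -> List[List[str]]:
    """Decode n evenly spaced code points that share a single random offset."""
    offset = rng.random()
    codes = [(offset + i / n) % 1.0 for i in range(n)]
    # Each call is independent, so this loop is embarrassingly parallel.
    return [arithmetic_decode(c, next_dist, max_len) for c in codes]
```

With the toy model, the code points 0.1, 0.35, 0.6, and 0.85 decode to `["<eos>"]`, `["a", "a", "<eos>"]`, `["a", "b", "<eos>"]`, and `["b", "a", "<eos>"]`. Spreading the code points evenly over $[0, 1)$ is what pushes the samples into different regions of the code book, and because each one is decoded on its own, the final list comprehension could be handed to parallel workers without changing the result.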

Aligning language models (LMs) to human preferences has emerged as a critical pursuit, enabling these models to better serve diverse user needs. Existing methods primarily focus on optimizing LMs for a single reward function, limiting their…

Machine Learning · Computer Science · 2024-10-29 · Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon S. Du