English
Related papers

Related papers: Lazy-k: Decoding for Constrained Token Classificat…

200 papers

Top-$k$ decoding is a widely used method for sampling from LLMs: at each token, only the largest $k$ next-token-probabilities are kept, and the next token is sampled after re-normalizing them to sum to unity. Top-$k$ and other sampling…

Artificial Intelligence · Computer Science 2026-02-24 Georgy Noarov , Soham Mallick , Tao Wang , Sunay Joshi , Yan Sun , Yangxinyu Xie , Mengxin Yu , Edgar Dobriban

Decoding sits between a language model and everything we do with it, yet it is still treated as a heuristic knob-tuning exercise. We argue decoding should be understood as a principled optimisation layer: at each token, we solve a…

Machine Learning · Computer Science 2026-02-26 Xiaotong Ji , Rasul Tutunov , Matthieu Zimmer , Haitham Bou-Ammar

Large Language Models (LLMs) have demonstrated exceptional code generation capabilities, yet their token-level mechanisms remain underexplored, particularly in compressed models. Through systematic analysis of programming language token…

Software Engineering · Computer Science 2026-02-10 Viacheslav Siniaev , Iaroslav Chelombitko , Aleksey Komissarov

Constrained decoding enables Language Models (LMs) to produce samples that provably satisfy hard constraints. However, existing constrained-decoding approaches often distort the underlying model distribution, a limitation that is especially…

Artificial Intelligence · Computer Science 2025-06-09 Emmanuel Anaya Gonzalez , Sairam Vaidya , Kanghee Park , Ruyi Ji , Taylor Berg-Kirkpatrick , Loris D'Antoni

Detectability of failures of linear programming (LP) decoding and the potential for improvement by adding new constraints motivate the use of an adaptive approach in selecting the constraints for the underlying LP problem. In this paper, we…

Information Theory · Computer Science 2007-07-13 Mohammad H. Taghavi , Paul H. Siegel

Learning-based probabilistic models can be combined with an entropy coder for data compression. However, due to the high complexity of learning-based models, their practical application as text compressors has been largely overlooked. To…

Computation and Language · Computer Science 2024-12-25 Junxuan Zhang , Zhengxue Cheng , Yan Zhao , Shihao Wang , Dajiang Zhou , Guo Lu , Li Song

Modern language models operate on subword-tokenized text in order to make a trade-off between model size, inference speed, and vocabulary coverage. A side effect of this is that, during inference, models are evaluated by measuring the…

Computation and Language · Computer Science 2025-10-24 David Pohl , Marco Cognetta , Junyoung Lee , Naoaki Okazaki

Detectability of failures of linear programming (LP) decoding and its potential for improvement by adding new constraints motivate the use of an adaptive approach in selecting the constraints for the LP problem. In this paper, we make a…

Information Theory · Computer Science 2007-07-13 Mohammad H. Taghavi N. , Paul H. Siegel

How can small-scale large language models (LLMs) efficiently utilize the supervision of LLMs to improve their generative quality? This question has been well studied in scenarios where there is no restriction on the number of LLM…

Computation and Language · Computer Science 2024-10-04 Hyunjong Ok , Jegwang Ryu , Jaeho Lee

Latent class model (LCM), which is a finite mixture of different categorical distributions, is one of the most widely used models in statistics and machine learning fields. Because of its non-continuous nature and the flexibility in shape,…

Machine Learning · Statistics 2021-03-23 Hao Chen , Lanshan Han , Alvin Lim

Speculative decoding is a pivotal technique to accelerate the inference of large language models (LLMs) by employing a smaller draft model to predict the target model's outputs. However, its efficacy can be limited due to the low predictive…

Artificial Intelligence · Computer Science 2024-06-11 Xiaoxuan Liu , Lanxiang Hu , Peter Bailis , Alvin Cheung , Zhijie Deng , Ion Stoica , Hao Zhang

Efficient inference in large language models (LLMs) has become a critical focus as their scale and complexity grow. Traditional autoregressive decoding, while effective, suffers from computational inefficiencies due to its sequential token…

Computation and Language · Computer Science 2024-11-28 Hyun Ryu , Eric Kim

Since traditional tokenizers are isolated from a downstream task and model, they cannot output an appropriate tokenization depending on the task and model, although recent studies imply that the appropriate tokenization improves the…

Computation and Language · Computer Science 2021-05-27 Tatsuya Hiraoka , Sho Takase , Kei Uchiumi , Atsushi Keyaki , Naoaki Okazaki

Decoding strategies manipulate the probability distribution underlying the output of a language model and can therefore affect both generation quality and its uncertainty. In this study, we investigate the impact of decoding strategies on…

Computation and Language · Computer Science 2025-09-23 Wataru Hashimoto , Hidetaka Kamigaito , Taro Watanabe

Several previous works concluded that the largest part of generation capabilities of large language models (LLM) are learned (early) during pre-training. However, LLMs still require further alignment to adhere to downstream task…

Computation and Language · Computer Science 2026-01-27 Ayoub Hammal , Pierre Zweigenbaum , Caio Corro

Probabilistic models help us encode latent structures that both model the data and are ideally also useful for specific downstream tasks. Among these, mixture models and their time-series counterparts, hidden Markov models, identify…

Machine Learning · Computer Science 2021-10-29 Abhishek Sharma , Catherine Zeng , Sanjana Narayanan , Sonali Parbhoo , Finale Doshi-Velez

Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specific models, may not extend to the current…

Computation and Language · Computer Science 2024-10-10 Chufan Shi , Haoran Yang , Deng Cai , Zhisong Zhang , Yifan Wang , Yujiu Yang , Wai Lam

We propose a new model for multi-token prediction in transformers, aiming to enhance sampling efficiency without compromising accuracy. Motivated by recent work that predicts the probabilities of subsequent tokens using multiple heads, we…

Machine Learning · Computer Science 2025-02-11 Artem Basharin , Andrei Chertkov , Ivan Oseledets

The quality of text generated by large language models depends critically on the decoding sampling strategy. While mainstream methods such as Top-$k$, Top-$p$, and Min-$p$ achieve a balance between diversity and accuracy through…

Artificial Intelligence · Computer Science 2026-04-14 Yuanhao Ding , Meimingwei Li , Esteban Garces Arias , Matthias Aßenmacher , Christian Heumann , Chongsheng Zhang

To mitigate the high inference latency stemming from autoregressive decoding in Large Language Models (LLMs), Speculative Decoding has emerged as a novel decoding paradigm for LLM inference. In each decoding step, this method first drafts…

Computation and Language · Computer Science 2024-06-05 Heming Xia , Zhe Yang , Qingxiu Dong , Peiyi Wang , Yongqi Li , Tao Ge , Tianyu Liu , Wenjie Li , Zhifang Sui
‹ Prev 1 2 3 10 Next ›