Related papers: Cascade Reward Sampling for Efficient Decoding-Tim…

Constrained Adaptive Rejection Sampling

Language Models (LMs) are increasingly used in applications where generated outputs must satisfy strict semantic or syntactic constraints. Existing approaches to constrained generation fall along a spectrum: greedy constrained decoding…

Artificial Intelligence · Computer Science 2025-10-03 Paweł Parys, Sairam Vaidya, Taylor Berg-Kirkpatrick, Loris D'Antoni

A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning

Large Language Models (LLMs) have shown significant potential in designing reward functions for Reinforcement Learning (RL) tasks. However, obtaining high-quality reward code often involves human intervention, numerous LLM queries, or…

Machine Learning · Computer Science 2024-10-21 Shengjie Sun, Runze Liu, Jiafei Lyu, Jing-Wen Yang, Liangpeng Zhang, Xiu Li

Reward-Guided Speculative Decoding for Efficient LLM Reasoning

We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs). RSD synergistically combines a lightweight draft model with a more powerful target…

Computation and Language · Computer Science 2025-06-27 Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong

CARMO: Dynamic Criteria Generation for Context-Aware Reward Modelling

Reward modeling in large language models is susceptible to reward hacking, causing models to latch onto superficial features such as the tendency to generate lists or unnecessarily long responses. In reinforcement learning from human…

Computation and Language · Computer Science 2025-02-19 Taneesh Gupta, Shivam Shandilya, Xuchao Zhang, Rahul Madhavan, Supriyo Ghosh, Chetan Bansal, Huaxiu Yao, Saravan Rajmohan

Cascade-Aware Training of Language Models

Reducing serving cost and latency is a fundamental concern for the deployment of language models (LMs) in business applications. To address this, cascades of LMs offer an effective solution that conditionally employs smaller models for…

Computation and Language · Computer Science 2024-06-04 Congchao Wang, Sean Augenstein, Keith Rush, Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Aditya Krishna Menon, Alec Go

Value Augmented Sampling for Language Model Alignment and Personalization

Aligning Large Language Models (LLMs) to cater to different human preferences, learning new skills, and unlearning harmful behavior is an important problem. Search-based methods, such as Best-of-N or Monte-Carlo Tree Search, are performant,…

Machine Learning · Computer Science 2024-05-13 Seungwook Han, Idan Shenfeld, Akash Srivastava, Yoon Kim, Pulkit Agrawal

Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective

Reinforcement learning exhibits potential in enhancing the reasoning abilities of large language models, yet it is hard to scale for the low sample efficiency during the rollout phase. Existing methods attempt to improve efficiency by…

Machine Learning · Computer Science 2026-02-02 Deyang Kong, Qi Guo, Xiangyu Xi, Wei Wang, Jingang Wang, Xunliang Cai, Shikun Zhang, Wei Ye

CAMD: Coverage-Aware Multimodal Decoding for Efficient Reasoning of Multimodal Large Language Models

Recent advances in Multimodal Large Language Models (MLLMs) have shown impressive reasoning capabilities across vision-language tasks, yet still face the challenge of compute-difficulty mismatch. Through empirical analyses, we identify that…

Machine Learning · Computer Science 2026-03-17 Huijie Guo, Jingyao Wang, Lingyu Si, Jiahuan Zhou, Changwen Zheng, Wenwen Qiang

Robust Multi-Objective Controlled Decoding of Large Language Models

We introduce Robust Multi-Objective Decoding (RMOD), a novel inference-time algorithm that robustly aligns Large Language Models (LLMs) to multiple human objectives (e.g., instruction-following, helpfulness, safety) by maximizing the…

Machine Learning · Computer Science 2026-02-17 Seongho Son, William Bankes, Sangwoong Yoon, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic

Efficient Adaptive Rejection Sampling for Accelerating Speculative Decoding in Large Language Models

Speculative Decoding is a prominent technique for accelerating the autoregressive inference of large language models (LLMs) by employing a fast draft model to propose candidate token sequences and a large target model to verify them in…

Computation and Language · Computer Science 2025-12-18 Chendong Sun, Ali Mao, Lei Xu, Mingmin Chen

ARGS: Alignment as Reward-Guided Search

Aligning large language models with human objectives is paramount, yet common approaches including RLHF suffer from unstable and resource-intensive training. In response to this challenge, we introduce ARGS, Alignment as Reward-Guided…

Computation and Language · Computer Science 2024-02-06 Maxim Khanov, Jirayu Burapacheep, Yixuan Li

PAD: Personalized Alignment of LLMs at Decoding-Time

Aligning with personalized preferences, which vary significantly across cultural, educational, and political differences, poses a significant challenge due to the computational costs and data demands of traditional alignment methods. In…

Computation and Language · Computer Science 2025-03-14 Ruizhe Chen, Xiaotian Zhang, Meng Luo, Wenhao Chai, Zuozhu Liu

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

The dominant approach to generating from language models subject to some constraint is locally constrained decoding (LCD), incrementally sampling tokens at each time step such that the constraint is never violated. Typically, this is…

Computation and Language · Computer Science 2025-08-19 Benjamin Lipkin, Benjamin LeBrun, Jacob Hoover Vigly, João Loula, David R. MacIver, Li Du, Jason Eisner, Ryan Cotterell, Vikash Mansinghka, Timothy J. O'Donnell, Alexander K. Lew, Tim Vieira

Cascade Speculative Drafting for Even Faster LLM Inference

Introduced to enhance the efficiency of large language model (LLM) inference, speculative decoding operates by having a smaller model generate a draft. A larger target model then reviews this draft to align with its output, and any…

Machine Learning · Computer Science 2025-07-15 Ziyi Chen, Xiaocong Yang, Jiacheng Lin, Chenkai Sun, Kevin Chen-Chuan Chang, Jie Huang

CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration

Auto-regressive decoding in Large Language Models (LLMs) is inherently memory-bound: every generation step requires loading the model weights and intermediate results from memory (e.g., High-Bandwidth Memory (HBM) for GPU servers), making…

Machine Learning · Computer Science 2026-05-13 Yuning Han, Yangchenchen Jin, Dylan Zhao, Jingwei Sun

Learning Harmonized Representations for Speculative Sampling

Speculative sampling is a promising approach to accelerate the decoding stage for Large Language Models (LLMs). Recent advancements that leverage target LLM's contextual information, such as hidden states and KV cache, have shown…

Machine Learning · Computer Science 2025-02-27 Lefan Zhang, Xiaodan Wang, Yanhua Huang, Ruiwen Xu

Cassandra: Enabling Reasoning LLMs at Edge via Self-Speculative Decoding

Speculative decoding has emerged as a promising lossless approach for accelerating Large Language Models (LLMs). As reasoning LLMs increasingly suffer from decode-stage overhead and approximation-based methods degrade accuracy, lossless…

Hardware Architecture · Computer Science 2026-05-27 Soongyu Choi, Yuntae Kim, Muyoung Son, Joo-Young Kim

LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits

Reward Models (RMs) are crucial to aligning large language models (LLMs), but the degree to which an RM specialized to one task (e.g. writing) generalizes to new tasks (e.g. math) is often not known a priori, often making using only one…

Computation and Language · Computer Science 2025-10-23 Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

Retrieval-augmented Decoding for Improving Truthfulness in Open-ended Generation

Ensuring truthfulness in large language models (LLMs) remains a critical challenge for reliable text generation. While supervised fine-tuning and reinforcement learning with human feedback have shown promise, they require a substantial…

Machine Learning · Computer Science 2026-03-17 Manh Nguyen, Sunil Gupta, Hung Le

Adaptive Segment-level Reward: Bridging the Gap Between Action and Reward Space in Alignment

Reinforcement Learning (RL) has proven highly effective in aligning Large Language Models (LLMs) with human preferences. Typical RL methods optimize under an overall sequence reward, which can lead to a suboptimal learning process. This…

Machine Learning · Computer Science 2025-02-26 Yanshi Li, Shaopan Xiong, Gengru Chen, Xiaoyang Li, Yijia Luo, Xingyuan Bu, Yingshui Tan, Wenbo Su, Bo Zheng