Related papers: Cascade Reward Sampling for Efficient Decoding-Tim…

200 papers

Language Models (LMs) are increasingly used in applications where generated outputs must satisfy strict semantic or syntactic constraints. Existing approaches to constrained generation fall along a spectrum: greedy constrained decoding…

Artificial Intelligence · Computer Science 2025-10-03 Paweł Parys, Sairam Vaidya, Taylor Berg-Kirkpatrick, Loris D'Antoni

Large Language Models (LLMs) have shown significant potential in designing reward functions for Reinforcement Learning (RL) tasks. However, obtaining high-quality reward code often involves human intervention, numerous LLM queries, or…

Machine Learning · Computer Science 2024-10-21 Shengjie Sun, Runze Liu, Jiafei Lyu, Jing-Wen Yang, Liangpeng Zhang, Xiu Li

We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs). RSD synergistically combines a lightweight draft model with a more powerful target…

Computation and Language · Computer Science 2025-06-27 Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong

Reward modeling in large language models is susceptible to reward hacking, causing models to latch onto superficial features such as the tendency to generate lists or unnecessarily long responses. In reinforcement learning from human…

Computation and Language · Computer Science 2025-02-19 Taneesh Gupta, Shivam Shandilya, Xuchao Zhang, Rahul Madhavan, Supriyo Ghosh, Chetan Bansal, Huaxiu Yao, Saravan Rajmohan

Reducing serving cost and latency is a fundamental concern for the deployment of language models (LMs) in business applications. To address this, cascades of LMs offer an effective solution that conditionally employ smaller models for…

Aligning Large Language Models (LLMs) to cater to different human preferences, learning new skills, and unlearning harmful behavior is an important problem. Search-based methods, such as Best-of-N or Monte-Carlo Tree Search, are performant,…

Machine Learning · Computer Science 2024-05-13 Seungwook Han, Idan Shenfeld, Akash Srivastava, Yoon Kim, Pulkit Agrawal

Reinforcement learning exhibits potential in enhancing the reasoning abilities of large language models, yet it is hard to scale for the low sample efficiency during the rollout phase. Existing methods attempt to improve efficiency by…

Machine Learning · Computer Science 2026-02-02 Deyang Kong, Qi Guo, Xiangyu Xi, Wei Wang, Jingang Wang, Xunliang Cai, Shikun Zhang, Wei Ye

Recent advances in Multimodal Large Language Models (MLLMs) have shown impressive reasoning capabilities across vision-language tasks, yet still face the challenge of compute-difficulty mismatch. Through empirical analyses, we identify that…

Machine Learning · Computer Science 2026-03-17 Huijie Guo, Jingyao Wang, Lingyu Si, Jiahuan Zhou, Changwen Zheng, Wenwen Qiang

We introduce Robust Multi-Objective Decoding (RMOD), a novel inference-time algorithm that robustly aligns Large Language Models (LLMs) to multiple human objectives (e.g., instruction-following, helpfulness, safety) by maximizing the…

Machine Learning · Computer Science 2026-02-17 Seongho Son, William Bankes, Sangwoong Yoon, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic

Speculative Decoding is a prominent technique for accelerating the autoregressive inference of large language models (LLMs) by employing a fast draft model to propose candidate token sequences and a large target model to verify them in…

Computation and Language · Computer Science 2025-12-18 Chendong Sun, Ali Mao, Lei Xu, mingmin Chen

Aligning large language models with human objectives is paramount, yet common approaches including RLHF suffer from unstable and resource-intensive training. In response to this challenge, we introduce ARGS, Alignment as Reward-Guided…

Computation and Language · Computer Science 2024-02-06 Maxim Khanov, Jirayu Burapacheep, Yixuan Li

Aligning with personalized preferences, which vary significantly across cultural, educational, and political differences, poses a significant challenge due to the computational costs and data demands of traditional alignment methods. In…

Computation and Language · Computer Science 2025-03-14 Ruizhe Chen, Xiaotian Zhang, Meng Luo, Wenhao Chai, Zuozhu Liu

The dominant approach to generating from language models subject to some constraint is locally constrained decoding (LCD), incrementally sampling tokens at each time step such that the constraint is never violated. Typically, this is…

Introduced to enhance the efficiency of large language model (LLM) inference, speculative decoding operates by having a smaller model generate a draft. A larger target model then reviews this draft to align with its output, and any…

Machine Learning · Computer Science 2025-07-15 Ziyi Chen, Xiaocong Yang, Jiacheng Lin, Chenkai Sun, Kevin Chen-Chuan Chang, Jie Huang

Auto-regressive decoding in Large Language Models (LLMs) is inherently memory-bound: every generation step requires loading the model weights and intermediate results from memory (e.g., High-Bandwidth Memory (HBM) for GPU servers), making…

Machine Learning · Computer Science 2026-05-13 Yuning Han, Yangchenchen Jin, Dylan Zhao, Jingwei Sun

Speculative sampling is a promising approach to accelerate the decoding stage for Large Language Models (LLMs). Recent advancements that leverage target LLM's contextual information, such as hidden states and KV cache, have shown…

Machine Learning · Computer Science 2025-02-27 Lefan Zhang, Xiaodan Wang, Yanhua Huang, Ruiwen Xu

Speculative decoding has emerged as a promising lossless approach for accelerating Large Language Models (LLMs). As reasoning LLMs increasingly suffer from decode-stage overhead and approximation-based methods degrade accuracy, lossless…

Hardware Architecture · Computer Science 2026-05-27 Soongyu Choi, Yuntae Kim, Muyoung Son, Joo-Young Kim

Reward Models (RMs) are crucial to aligning large language models (LLMs), but the degree to which an RM specialized to one task (e.g. writing) generalizes to new tasks (e.g. math) is often not known a priori, often making using only one…

Computation and Language · Computer Science 2025-10-23 Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

Ensuring truthfulness in large language models (LLMs) remains a critical challenge for reliable text generation. While supervised fine-tuning and reinforcement learning with human feedback have shown promise, they require a substantial…

Machine Learning · Computer Science 2026-03-17 Manh Nguyen, Sunil Gupta, Hung Le

Reinforcement Learning (RL) has proven highly effective in aligning Large Language Models (LLMs) with human preferences. Typical RL methods optimize under an overall sequence reward, which can lead to a suboptimal learning process. This…

Machine Learning · Computer Science 2025-02-26 Yanshi Li, Shaopan Xiong, Gengru Chen, Xiaoyang Li, Yijia Luo, Xingyuan Bu, Yingshui Tan, Wenbo Su, Bo Zheng