Related papers: Adaptive Decoding via Latent Preference Optimizati…

200 papers

Recently, Large Language Models (LLMs) have shown impressive abilities in code generation. However, existing LLMs' decoding strategies are designed for Natural Language (NL) generation, overlooking the differences between NL and programming…

Software Engineering · Computer Science 2023-12-29 Yuqi Zhu, Jia Li, Ge Li, YunFei Zhao, Jia Li, Zhi Jin, Hong Mei

Decoding from large language models (LLMs) typically relies on fixed sampling hyperparameters (e.g., temperature, top-p), despite substantial variation in task difficulty and uncertainty across prompts and individual decoding steps. We…

Machine Learning · Computer Science 2026-03-17 Chloe H. Su, Zhe Ye, Samuel Tenka, Aidan Yang, Soonho Kong, Udaya Ghai

Temperature is a crucial hyperparameter in large language models (LLMs), controlling the trade-off between exploration and exploitation during text generation. High temperatures encourage diverse but noisy outputs, while low temperatures…

Machine Learning · Computer Science 2026-02-13 Haoran Dang, Cuiling Lan, Hai Wan, Xibin Zhao, Yan Lu

Diversity is an essential metric for evaluating the creativity of outputs generated by language models. Temperature-based sampling is a common strategy to increase diversity. However, for tasks that require high precision, e.g.,…

Machine Learning · Computer Science 2025-10-03 Sergey Troshin, Wafaa Mohammed, Yan Meng, Christof Monz, Antske Fokkens, Vlad Niculae

DPO (Direct Preference Optimization) has become a widely used offline preference optimization algorithm due to its simplicity and training stability. However, DPO is prone to overfitting and collapse. To address these challenges, we propose…

Machine Learning · Computer Science 2025-08-26 Rui Wang, Qianguo Sun, Chao Song, Junlong Wu, Tianrong Chen, Zhiyun Zeng, Yu Li

Preference optimization for diffusion models aims to align them with human preferences for images. Previous methods typically use Vision-Language Models (VLMs) as pixel-level reward models to approximate human preferences. However, when…

Computer Vision and Pattern Recognition · Computer Science 2025-10-03 Tao Zhang, Cheng Da, Kun Ding, Huan Yang, Kun Jin, Yan Li, Tingting Gao, Di Zhang, Shiming Xiang, Chunhong Pan

Large Language Models (LLMs) have demonstrated remarkable potential in automating software development tasks. While recent advances leverage Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to align models with human…

Software Engineering · Computer Science 2025-12-09 Xin Yin, Chao Ni, Xiaohu Yang

Reinforcement learning improves LLM reasoning, but PPO/GRPO typically use fixed clipping and decoding temperature, which makes training brittle and tuning-heavy. We propose Adaptive Group Policy Optimization (AGPO), a critic-free refinement…

Machine Learning · Computer Science 2026-05-21 Miaobo Hu, Shuhao Hu, Bokun Wang, Ruohan Wang, Xin Wang, Xiaobo Guo, Daren Zha, Jun Xiao

Temperature scaling has been widely used as an effective approach to control the smoothness of a distribution, which helps the model performance in various tasks. Current practices to apply temperature scaling assume either a fixed, or a…

Computation and Language · Computer Science 2020-12-29 Pei-Hsin Wang, Sheng-Iou Hsieh, Shih-Chieh Chang, Yu-Ting Chen, Jia-Yu Pan, Wei Wei, Da-Chang Juan

Large language models (LLMs) have shown great potential in natural language processing tasks, but their application to machine translation (MT) remains challenging due to pretraining on English-centric data and the complexity of…

Computation and Language · Computer Science 2025-01-24 Guofeng Cui, Pichao Wang, Yang Liu, Zemian Ke, Zhu Liu, Vimal Bhat

Multi-sample aggregation strategies, such as majority voting and best-of-N sampling, are widely used in contemporary large language models (LLMs) to enhance predictive accuracy across various tasks. A key challenge in this process is…

Machine Learning · Computer Science 2025-06-17 Weihua Du, Yiming Yang, Sean Welleck

While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing…

Machine Learning · Computer Science 2024-07-31 Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn

Large language models (LLMs) demonstrate impressive performance but lack the flexibility to adapt to human preferences quickly without retraining. In this work, we introduce Test-time Preference Optimization (TPO), a framework that aligns…

Computation and Language · Computer Science 2025-01-23 Yafu Li, Xuyang Hu, Xiaoye Qu, Linjie Li, Yu Cheng

In the field of large language models (LLMs), aligning models with the diverse preferences of users is a critical challenge. Direct Preference Optimization (DPO) has played a key role in this area. It works by using pairs of preferences…

Computation and Language · Computer Science 2024-05-29 Yueqin Yin, Zhendong Wang, Yi Gu, Hai Huang, Weizhu Chen, Mingyuan Zhou

The "end-to-end" label for LLMs is a misnomer. In practice, they depend on a non-differentiable decoding process that requires laborious hand-tuning of hyperparameters like temperature and top-p. This paper introduces AutoDeco, a novel…

Computation and Language · Computer Science 2025-11-03 Zhichao Wang, Dongyang Ma, Xinting Huang, Deng Cai, Tian Lan, Jiahao Xu, Haitao Mi, Xiaoying Tang, Yan Wang

Speculative decoding stands as a pivotal technique to expedite inference in autoregressive (large) language models. This method employs a smaller draft model to speculate a block of tokens, which the target model then evaluates for…

Computation and Language · Computer Science 2024-10-15 Siru Ouyang, Shuohang Wang, Minhao Jiang, Ming Zhong, Donghan Yu, Jiawei Han, Yelong Shen

Direct Preference Optimization (DPO) has gained attention as an efficient alternative to reinforcement learning from human feedback (RLHF) for aligning large language models (LLMs) with human preferences. Despite its advantages, DPO suffers…

Computation and Language · Computer Science 2025-02-21 Ruichen Shao, Bei Li, Gangao Liu, Yang Chen, Xiang Zhou, Jingang Wang, Xunliang Cai, Peng Li

As large language models (LLMs) are progressively deployed in various real-world applications, personalization of LLMs has become increasingly important. While various approaches to LLM personalization such as prompt-based and…

Computation and Language · Computer Science 2025-11-25 Hyungjune Bu, Chanjoo Jung, Minjae Kang, Jaehyung Kim

Preference learning extends the performance of Code LLMs beyond traditional supervised fine-tuning by leveraging relative quality comparisons. In existing approaches, a set of n candidate solutions is evaluated based on test case success…

Computation and Language · Computer Science 2025-10-10 Jie Wu, Haoling Li, Xin Zhang, Xiao Liu, Yangyu Huang, Jianwen Luo, Yizhen Zhang, Zuchao Li, Ruihang Chu, Yujiu Yang, Scarlett Li

Post-training of language models, either through reinforcement learning, preference optimization or supervised finetuning, tends to sharpen the output probability distribution and reduce the diversity of generated responses. This is…

Computation and Language · Computer Science 2025-05-23 Jack Lanchantin, Angelica Chen, Shehzaad Dhuliawala, Ping Yu, Jason Weston, Sainbayar Sukhbaatar, Ilia Kulikov