Related papers: Optimal Completion Distillation for Sequence Learn…

Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, existing studies largely attribute this advantage to denser and more stable supervision, while the parameter-level…

Computation and Language · Computer Science · 2026-05-22 · Yuchen Cai, Ding Cao, Liang Lin, Chunxi Luo, Xin Xu, Kai Yang, Weijie Liu, Saiyong Yang, Tianxiang Zhao, Guangzhong Sun, Guiquan Liu, Junfeng Fang

Fast and Effective On-policy Distillation from Reasoning Prefixes

On-policy distillation (OPD), which samples trajectories from the student model and supervises them with a teacher at the token level, avoids relying solely on verifiable terminal rewards and can yield better generalization than off-policy…

Machine Learning · Computer Science · 2026-02-18 · Dongxu Zhang, Zhichao Yang, Sepehr Janghorbani, Jun Han, Andrew Ressler, Qian Qian, Gregory D. Lyng, Sanjit Singh Batra, Robert E. Tillman

On-Policy Context Distillation for Language Models

Context distillation enables language models to internalize in-context knowledge into their parameters. In our work, we propose On-Policy Context Distillation (OPCD), a framework that bridges on-policy distillation with context distillation…

Computation and Language · Computer Science · 2026-03-24 · Tianzhu Ye, Li Dong, Xun Wu, Shaohan Huang, Furu Wei

Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling

Diffusion Probability Models (DPMs) have made impressive advancements in various machine learning domains. However, achieving high-quality synthetic samples typically involves performing a large number of sampling steps, which impedes the…

Machine Learning · Computer Science · 2024-12-16 · Shitong Shao, Xu Dai, Lujun Li, Huanran Chen, Yang Hu, Shouyi Yin

OVD: On-policy Verbal Distillation

Knowledge distillation offers a promising path to transfer reasoning capabilities from large teacher models to efficient student models; however, existing token-level on-policy distillation methods require token-level alignment between the…

Computation and Language · Computer Science · 2026-01-30 · Jing Xiong, Hui Shen, Shansan Gong, Yuxin Cheng, Jianghan Shen, Chaofan Tao, Haochen Tan, Haoli Bai, Lifeng Shang, Ngai Wong

Demystifying OPD: Length Inflation and Stabilization Strategies for Large Language Models

On-policy distillation (OPD) trains student models under their own induced distribution while leveraging supervision from stronger teachers. We identify a failure mode of OPD: as training progresses, on-policy rollouts can undergo abrupt…

Computation and Language · Computer Science · 2026-04-10 · Feng Luo, Yu-Neng Chuang, Guanchu Wang, Zicheng Xu, Xiaotian Han, Tianyi Zhang, Vladimir Braverman

OODD: Test-time Out-of-Distribution Detection with Dynamic Dictionary

Out-of-distribution (OOD) detection remains challenging for deep learning models, particularly when test-time OOD samples differ significantly from training outliers. We propose OODD, a novel test-time OOD detection method that dynamically…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-14 · Yifeng Yang, Lin Zhu, Zewen Sun, Hengyu Liu, Qinying Gu, Nanyang Ye

Self-distillation with Online Diffusion on Batch Manifolds Improves Deep Metric Learning

Recent deep metric learning (DML) methods typically leverage solely class labels to keep positive samples far away from negative ones. However, this type of method normally ignores the crucial knowledge hidden in the data (e.g., intra-class…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-15 · Zelong Zeng, Fan Yang, Hong Liu, Shin'ichi Satoh

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood. This paper provides a systematic investigation of OPD dynamics and mechanisms. We…

Machine Learning · Computer Science · 2026-04-16 · Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Cheng Qian, Tianyu Yu, Huan-ang Gao, Wenkai Yang, Zhiyuan Liu, Ning Ding

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes

On-policy distillation (OPD) is increasingly used in LLM post-training because it can leverage a teacher model to provide dense supervision on student rollouts. The standard implementation, however, usually reduces distribution matching to…

Machine Learning · Computer Science · 2026-04-28 · Yuqian Fu, Haohuan Huang, Kaiwen Jiang, Jiacai Liu, Zhuo Jiang, Yuanheng Zhu, Dongbin Zhao

EM-Network: Oracle Guided Self-distillation for Sequence Learning

We introduce EM-Network, a novel self-distillation approach that effectively leverages target information for supervised sequence-to-sequence (seq2seq) learning. In contrast to conventional methods, it is trained with oracle guidance, which…

Machine Learning · Computer Science · 2023-06-21 · Ji Won Yoon, Sunghwan Ahn, Hyeonseung Lee, Minchan Kim, Seok Min Kim, Nam Soo Kim

Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level

On-policy distillation (OPD) trains a student on its own trajectories with token-level teacher feedback and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its standard advantage weighted…

Machine Learning · Computer Science · 2026-05-14 · Nan Jia, Haojin Yang, Xing Ma, Jiesong Lian, Shuailiang Zhang, Weipeng Zhang, Ke Zeng, Xunliang Cai, Zequn Sun

Universal Trading for Order Execution with Oracle Policy Distillation

As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquirement, for a given instrument. Towards effective execution strategy, recent years have witnessed the…

Trading and Market Microstructure · Quantitative Finance · 2021-03-22 · Yuchen Fang, Kan Ren, Weiqing Liu, Dong Zhou, Weinan Zhang, Jiang Bian, Yong Yu, Tie-Yan Liu

Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation

Knowledge Distillation has been established as a highly promising approach for training compact and faster models by transferring knowledge from heavyweight and powerful models. However, KD in its conventional version constitutes an…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-27 · Maria Tzelepi, Anastasios Tefas

Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping

Latent Consistency Model (LCM) extends the Consistency Model to the latent space and leverages the guided consistency distillation technique to achieve impressive performance in accelerating text-to-image synthesis. However, we observed…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-16 · Jianbin Zheng, Minghui Hu, Zhongyi Fan, Chaoyue Wang, Changxing Ding, Dacheng Tao, Tat-Jen Cham

The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation

On-policy distillation (OPD) is an increasingly important paradigm for post-training language models. However, we identify a pervasive Scaling Law of Miscalibration: while OPD effectively improves task accuracy, it systematically traps…

Machine Learning · Computer Science · 2026-04-21 · Jiaxin Zhang, Xiangyu Peng, Qinglin Chen, Qinyuan Ye, Caiming Xiong, Chien-Sheng Wu

The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

On-policy distillation (OPD) and on-policy self-distillation (OPSD) have emerged as promising post-training methods for large language models, offering dense token-level supervision on trajectories sampled from the model's own policy.…

Artificial Intelligence · Computer Science · 2026-05-26 · Siqi Zhu, Xuyan Ye, Hongyu Lu, Weiye Shi, Ge Liu

Multi-Level Knowledge Distillation for Out-of-Distribution Detection in Text

Self-supervised representation learning has proved to be a valuable component for out-of-distribution (OoD) detection with only the texts of in-distribution (ID) examples. These approaches either train a language model from scratch or…

Computation and Language · Computer Science · 2023-06-05 · Qianhui Wu, Huiqiang Jiang, Haonan Yin, Börje F. Karlsson, Chin-Yew Lin

Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance

Consistency distillation methods have demonstrated significant success in accelerating generative tasks of diffusion models. However, since previous consistency distillation methods use simple and straightforward strategies in selecting…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-04 · Cunzheng Wang, Ziyuan Guo, Yuxuan Duan, Huaxia Li, Nemo Chen, Xu Tang, Yao Hu

Low-redundancy Distillation for Continual Learning

Continual learning (CL) aims to learn new tasks without erasing previous knowledge. However, current CL methods primarily emphasize improving accuracy while often neglecting training efficiency, which consequently restricts their practical…

Machine Learning · Computer Science · 2026-01-30 · RuiQi Liu, Boyu Diao, Libo Huang, Zijia An, Hangda Liu, Zhulin An, Yongjun Xu