
Related papers: Optimal Completion Distillation for Sequence Learn…

200 papers

On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, existing studies largely attribute this advantage to denser and more stable supervision, while the parameter-level…

Computation and Language · Computer Science · 2026-05-22 · Yuchen Cai, Ding Cao, Liang Lin, Chunxi Luo, Xin Xu, Kai Yang, Weijie Liu, Saiyong Yang, Tianxiang Zhao, Guangzhong Sun, Guiquan Liu, Junfeng Fang

On-policy distillation (OPD), which samples trajectories from the student model and supervises them with a teacher at the token level, avoids relying solely on verifiable terminal rewards and can yield better generalization than off-policy…

Context distillation enables language models to internalize in-context knowledge into their parameters. In our work, we propose On-Policy Context Distillation (OPCD), a framework that bridges on-policy distillation with context distillation…

Computation and Language · Computer Science · 2026-03-24 · Tianzhu Ye, Li Dong, Xun Wu, Shaohan Huang, Furu Wei

Diffusion Probability Models (DPMs) have made impressive advancements in various machine learning domains. However, achieving high-quality synthetic samples typically involves performing a large number of sampling steps, which impedes the…

Machine Learning · Computer Science · 2024-12-16 · Shitong Shao, Xu Dai, Lujun Li, Huanran Chen, Yang Hu, Shouyi Yin

Knowledge distillation offers a promising path to transfer reasoning capabilities from large teacher models to efficient student models; however, existing token-level on-policy distillation methods require token-level alignment between the…

Computation and Language · Computer Science · 2026-01-30 · Jing Xiong, Hui Shen, Shansan Gong, Yuxin Cheng, Jianghan Shen, Chaofan Tao, Haochen Tan, Haoli Bai, Lifeng Shang, Ngai Wong

On-policy distillation (OPD) trains student models under their own induced distribution while leveraging supervision from stronger teachers. We identify a failure mode of OPD: as training progresses, on-policy rollouts can undergo abrupt…

Computation and Language · Computer Science · 2026-04-10 · Feng Luo, Yu-Neng Chuang, Guanchu Wang, Zicheng Xu, Xiaotian Han, Tianyi Zhang, Vladimir Braverman

Out-of-distribution (OOD) detection remains challenging for deep learning models, particularly when test-time OOD samples differ significantly from training outliers. We propose OODD, a novel test-time OOD detection method that dynamically…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-14 · Yifeng Yang, Lin Zhu, Zewen Sun, Hengyu Liu, Qinying Gu, Nanyang Ye

Recent deep metric learning (DML) methods typically leverage solely class labels to keep positive samples far away from negative ones. However, this type of method normally ignores the crucial knowledge hidden in the data (e.g., intra-class…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-15 · Zelong Zeng, Fan Yang, Hong Liu, Shin'ichi Satoh

On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood. This paper provides a systematic investigation of OPD dynamics and mechanisms. We…

Machine Learning · Computer Science · 2026-04-16 · Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Cheng Qian, Tianyu Yu, Huan-ang Gao, Wenkai Yang, Zhiyuan Liu, Ning Ding

On-policy distillation (OPD) is increasingly used in LLM post-training because it can leverage a teacher model to provide dense supervision on student rollouts. The standard implementation, however, usually reduces distribution matching to…

Machine Learning · Computer Science · 2026-04-28 · Yuqian Fu, Haohuan Huang, Kaiwen Jiang, Jiacai Liu, Zhuo Jiang, Yuanheng Zhu, Dongbin Zhao

We introduce EM-Network, a novel self-distillation approach that effectively leverages target information for supervised sequence-to-sequence (seq2seq) learning. In contrast to conventional methods, it is trained with oracle guidance, which…

Machine Learning · Computer Science · 2023-06-21 · Ji Won Yoon, Sunghwan Ahn, Hyeonseung Lee, Minchan Kim, Seok Min Kim, Nam Soo Kim

On-policy distillation (OPD) trains a student on its own trajectories with token-level teacher feedback and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its standard advantage weighted…

Machine Learning · Computer Science · 2026-05-14 · Nan Jia, Haojin Yang, Xing Ma, Jiesong Lian, Shuailiang Zhang, Weipeng Zhang, Ke Zeng, Xunliang Cai, Zequn Sun

As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquirement, for a given instrument. Towards effective execution strategy, recent years have witnessed the…

Trading and Market Microstructure · Quantitative Finance · 2021-03-22 · Yuchen Fang, Kan Ren, Weiqing Liu, Dong Zhou, Weinan Zhang, Jiang Bian, Yong Yu, Tie-Yan Liu

Knowledge Distillation has been established as a highly promising approach for training compact and faster models by transferring knowledge from heavyweight and powerful models. However, KD in its conventional version constitutes an…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-27 · Maria Tzelepi, Anastasios Tefas

Latent Consistency Model (LCM) extends the Consistency Model to the latent space and leverages the guided consistency distillation technique to achieve impressive performance in accelerating text-to-image synthesis. However, we observed…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-16 · Jianbin Zheng, Minghui Hu, Zhongyi Fan, Chaoyue Wang, Changxing Ding, Dacheng Tao, Tat-Jen Cham

On-policy distillation (OPD) is an increasingly important paradigm for post-training language models. However, we identify a pervasive Scaling Law of Miscalibration: while OPD effectively improves task accuracy, it systematically traps…

Machine Learning · Computer Science · 2026-04-21 · Jiaxin Zhang, Xiangyu Peng, Qinglin Chen, Qinyuan Ye, Caiming Xiong, Chien-Sheng Wu

On-policy distillation (OPD) and on-policy self-distillation (OPSD) have emerged as promising post-training methods for large language models, offering dense token-level supervision on trajectories sampled from the model's own policy.…

Artificial Intelligence · Computer Science · 2026-05-26 · Siqi Zhu, Xuyan Ye, Hongyu Lu, Weiye Shi, Ge Liu

Self-supervised representation learning has proved to be a valuable component for out-of-distribution (OoD) detection with only the texts of in-distribution (ID) examples. These approaches either train a language model from scratch or…

Computation and Language · Computer Science · 2023-06-05 · Qianhui Wu, Huiqiang Jiang, Haonan Yin, Börje F. Karlsson, Chin-Yew Lin

Consistency distillation methods have demonstrated significant success in accelerating generative tasks of diffusion models. However, since previous consistency distillation methods use simple and straightforward strategies in selecting…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-04 · Cunzheng Wang, Ziyuan Guo, Yuxuan Duan, Huaxia Li, Nemo Chen, Xu Tang, Yao Hu

Continual learning (CL) aims to learn new tasks without erasing previous knowledge. However, current CL methods primarily emphasize improving accuracy while often neglecting training efficiency, which consequently restricts their practical…

Machine Learning · Computer Science · 2026-01-30 · RuiQi Liu, Boyu Diao, Libo Huang, Zijia An, Hangda Liu, Zhulin An, Yongjun Xu