Related papers: Logic Distillation: Learning from Code Function by…

Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation

Large Language Models (LLMs) demonstrate exceptional reasoning capabilities, often achieving state-of-the-art performance in various tasks. However, their substantial computational and memory demands, due to billions of parameters, hinder…

Computation and Language · Computer Science · 2024-11-25 · Xunyu Zhu, Jian Li, Can Ma, Weiping Wang

Mentor-KD: Making Small Language Models Better Multi-step Reasoners

Large Language Models (LLMs) have displayed remarkable performances across various complex tasks by leveraging Chain-of-Thought (CoT) prompting. Recently, studies have proposed a Knowledge Distillation (KD) approach, reasoning distillation,…

Computation and Language · Computer Science · 2024-10-14 · Hojae Lee, Junho Kim, SangKeun Lee

Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deployment of the LLMs in real-world applications can be challenging due to…

Computation and Language · Computer Science · 2023-10-31 · Minki Kang, Seanie Lee, Jinheon Baek, Kenji Kawaguchi, Sung Ju Hwang

MCC-KD: Multi-CoT Consistent Knowledge Distillation

Large language models (LLMs) have showcased remarkable capabilities in complex reasoning through chain of thought (CoT) prompting. Recently, there has been a growing interest in transferring these reasoning abilities from LLMs to smaller…

Computation and Language · Computer Science · 2023-12-21 · Hongzhan Chen, Siyue Wu, Xiaojun Quan, Rui Wang, Ming Yan, Ji Zhang

DDK: Distilling Domain Knowledge for Efficient Large Language Models

Despite the advanced intelligence abilities of large language models (LLMs) in various applications, they still face significant computational and storage demands. Knowledge Distillation (KD) has emerged as an effective strategy to improve…

Computation and Language · Computer Science · 2024-07-24 · Jiaheng Liu, Chenchen Zhang, Jinyang Guo, Yuanxing Zhang, Haoran Que, Ken Deng, Zhiqi Bai, Jie Liu, Ge Zhang, Jiakai Wang, Yanan Wu, Congnan Liu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

Effective Distillation of Table-based Reasoning Ability from LLMs

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks. However, their enormous parameter size and extremely high requirements for compute power pose challenges for…

Computation and Language · Computer Science · 2024-03-26 · Bohao Yang, Chen Tang, Kun Zhao, Chenghao Xiao, Chenghua Lin

Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application

Large Language Models (LLMs) have showcased exceptional capabilities in various domains, attracting significant interest from both academia and industry. Despite their impressive performance, the substantial size and computational demands…

Computation and Language · Computer Science · 2024-07-03 · Chuanpeng Yang, Wang Lu, Yao Zhu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, Yiqiang Chen

MiniLLM: On-Policy Distillation of Large Language Models

Knowledge Distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs). However, previous KD methods are primarily applied to white-box classification models or training small models…

Computation and Language · Computer Science · 2026-02-03 · Yuxian Gu, Li Dong, Furu Wei, Minlie Huang

Does Knowledge Distillation Matter for Large Language Model based Bundle Generation?

LLMs are increasingly explored for bundle generation, thanks to their reasoning capabilities and knowledge. However, deploying large-scale LLMs introduces significant efficiency challenges, primarily high computational costs during…

Computation and Language · Computer Science · 2025-04-25 · Kaidong Feng, Zhu Sun, Jie Yang, Hui Fang, Xinghua Qu, Wenyuan Liu

Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions

The exponential growth of Large Language Models (LLMs) continues to highlight the need for efficient strategies to meet ever-expanding computational and data demands. This survey provides a comprehensive analysis of two complementary…

Computation and Language · Computer Science · 2026-01-06 · Luyang Fang, Xiaowei Yu, Jiazhang Cai, Yongkai Chen, Shushan Wu, Zhengliang Liu, Zhenyuan Yang, Haoran Lu, Xilin Gong, Yufang Liu, Terry Ma, Wei Ruan, Ali Abbasi, Jing Zhang, Tao Wang, Ehsan Latif, Weihang You, Hanqi Jiang, Wei Liu, Wei Zhang, Soheil Kolouri, Xiaoming Zhai, Dajiang Zhu, Wenxuan Zhong, Tianming Liu, Ping Ma

A Survey on Knowledge Distillation of Large Language Models

In the era of Large Language Models (LLMs), Knowledge Distillation (KD) emerges as a pivotal methodology for transferring advanced capabilities from leading proprietary LLMs, such as GPT-4, to their open-source counterparts like LLaMA and…

Computation and Language · Computer Science · 2024-10-22 · Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, Tianyi Zhou

Active Large Language Model-based Knowledge Distillation for Session-based Recommendation

Large language models (LLMs) provide a promising way for accurate session-based recommendation (SBR), but they demand substantial computational time and memory. Knowledge distillation (KD)-based methods can alleviate these issues by…

Information Retrieval · Computer Science · 2025-02-25 · Yingpeng Du, Zhu Sun, Ziyan Wang, Haoyan Chua, Jie Zhang, Yew-Soon Ong

The Valley of Code Reasoning: Scaling Knowledge Distillation of Large Language Models

Distilling the thinking traces of a Large Language Model (LLM) with reasoning capabilities into a smaller model has been proven effective. Yet, there is a scarcity of work done on how model performances scale with the quantity of…

Computation and Language · Computer Science · 2025-10-08 · Muyu He, Muhammad Ali Shafique, Anand Kumar, Tsach Mackey, Nazneen Rajani

Mind's Mirror: Distilling Self-Evaluation Capability and Comprehensive Thinking from Large Language Models

Large language models (LLMs) have achieved remarkable advancements in natural language processing. However, the massive scale and computational demands of these models present formidable challenges when considering their practical…

Computation and Language · Computer Science · 2024-04-09 · Weize Liu, Guocong Li, Kai Zhang, Bang Du, Qiyuan Chen, Xuming Hu, Hongxia Xu, Jintai Chen, Jian Wu

Hybrid Policy Distillation for LLMs

Knowledge distillation (KD) is a powerful paradigm for compressing large language models (LLMs), whose effectiveness depends on intertwined choices of divergence direction, optimization strategy, and data regime. We break down the design of…

Computation and Language · Computer Science · 2026-04-23 · Wenhong Zhu, Ruobing Xie, Rui Wang, Pengfei Liu

SIKeD: Self-guided Iterative Knowledge Distillation for mathematical reasoning

Large Language Models (LLMs) can transfer their reasoning skills to smaller models by teaching them to generate the intermediate reasoning process required to solve multistep reasoning tasks. While LLMs can accurately solve reasoning tasks…

Artificial Intelligence · Computer Science · 2024-10-25 · Shivam Adarsh, Kumar Shridhar, Caglar Gulcehre, Nicholas Monath, Mrinmaya Sachan

Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL

Deploying accurate Text-to-SQL systems at the enterprise level faces a difficult trilemma involving cost, security and performance. Current solutions force enterprises to choose between expensive, proprietary Large Language Models (LLMs)…

Computation and Language · Computer Science · 2026-03-13 · Khushboo Thaker, Yony Bresler

Distilling LLM Agent into Small Models with Retrieval and Code Tools

Large language models (LLMs) excel at complex reasoning tasks but remain computationally expensive, limiting their practical deployment. To address this, recent works have focused on distilling reasoning capabilities into smaller language…

Computation and Language · Computer Science · 2025-11-06 · Minki Kang, Jongwon Jeong, Seanie Lee, Jaewoong Cho, Sung Ju Hwang

An Empirical Study of Knowledge Distillation for Code Understanding Tasks

Pre-trained language models (PLMs) have emerged as powerful tools for code understanding. However, deploying these PLMs in large-scale applications faces practical challenges due to their computational intensity and inference latency.…

Software Engineering · Computer Science · 2025-08-22 · Ruiqi Wang, Zezhou Yang, Cuiyun Gao, Xin Xia, Qing Liao

Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs

Large Language Models (LLMs) have recently made significant advances in code generation through the 'Chain-of-Thought' prompting technique. This technique empowers the model to autonomously devise "solution plans" to tackle intricate…

Software Engineering · Computer Science · 2024-03-21 · Zhihong Sun, Chen Lyu, Bolun Li, Yao Wan, Hongyu Zhang, Ge Li, Zhi Jin