Related papers: MCC-KD: Multi-CoT Consistent Knowledge Distillatio…

Mentor-KD: Making Small Language Models Better Multi-step Reasoners

Large Language Models (LLMs) have displayed remarkable performances across various complex tasks by leveraging Chain-of-Thought (CoT) prompting. Recently, studies have proposed a Knowledge Distillation (KD) approach, reasoning distillation,…

Computation and Language · Computer Science 2024-10-14 Hojae Lee , Junho Kim , SangKeun Lee

Small Models Struggle to Learn from Strong Reasoners

Large language models (LLMs) excel in complex reasoning tasks, and distilling their reasoning capabilities into smaller models has shown promise. However, we uncover an interesting phenomenon, which we term the Small Model Learnability Gap:…

Artificial Intelligence · Computer Science 2025-11-14 Yuetai Li , Xiang Yue , Zhangchen Xu , Fengqing Jiang , Luyao Niu , Bill Yuchen Lin , Bhaskar Ramasubramanian , Radha Poovendran

Logic Distillation: Learning from Code Function by Function for Decision-making Tasks

Large language models (LLMs) have garnered increasing attention owing to their powerful logical reasoning capabilities. Generally, larger LLMs (L-LLMs) that require paid interfaces exhibit significantly superior performance compared to…

Artificial Intelligence · Computer Science 2025-11-11 Dong Chen , Shilin Zhang , Fei Gao , Yueting Zhuang , Siliang Tang , Qidong Liu , Mingliang Xu

CoT2Align: Cross-Chain of Thought Distillation via Optimal Transport Alignment for Language Models with Different Tokenizers

Large Language Models (LLMs) achieve state-of-the-art performance across various NLP tasks but face deployment challenges due to high computational costs and memory constraints. Knowledge distillation (KD) is a promising solution,…

Computation and Language · Computer Science 2025-03-04 Anh Duc Le , Tu Vu , Nam Le Hai , Nguyen Thi Ngoc Diep , Linh Ngo Van , Trung Le , Thien Huu Nguyen

Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA

Large Language Models (LLMs) have shown outstanding performance across a wide range of downstream tasks. This competency is attributed to their substantial parameter size and pre-training on extensive corpora. Moreover, LLMs have exhibited…

Computation and Language · Computer Science 2023-08-10 Yuhan Ma , Haiqi Jiang , Chenyou Fan

Mixed Distillation Helps Smaller Language Model Better Reasoning

While large language models (LLMs) have demonstrated exceptional performance in recent natural language processing (NLP) tasks, their deployment poses substantial challenges due to high computational and memory demands in real-world…

Computation and Language · Computer Science 2024-02-27 Chenglin Li , Qianglong Chen , Liangyue Li , Caiyu Wang , Yicheng Li , Zulong Chen , Yin Zhang

Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions

The exponential growth of Large Language Models (LLMs) continues to highlight the need for efficient strategies to meet ever-expanding computational and data demands. This survey provides a comprehensive analysis of two complementary…

Computation and Language · Computer Science 2026-01-06 Luyang Fang , Xiaowei Yu , Jiazhang Cai , Yongkai Chen , Shushan Wu , Zhengliang Liu , Zhenyuan Yang , Haoran Lu , Xilin Gong , Yufang Liu , Terry Ma , Wei Ruan , Ali Abbasi , Jing Zhang , Tao Wang , Ehsan Latif , Weihang You , Hanqi Jiang , Wei Liu , Wei Zhang , Soheil Kolouri , Xiaoming Zhai , Dajiang Zhu , Wenxuan Zhong , Tianming Liu , Ping Ma

Effectiveness of Chain-of-Thought in Distilling Reasoning Capability from Large Language Models

Chain-of-Thought (CoT) prompting is a widely used method to improve the reasoning capability of Large Language Models (LLMs). More recently, CoT has been leveraged in Knowledge Distillation (KD) to transfer reasoning capability from a…

Computation and Language · Computer Science 2025-11-10 Cong-Thanh Do , Rama Doddipatla , Kate Knill

Enhancing Knowledge Distillation of Large Language Models through Efficient Multi-Modal Distribution Alignment

Knowledge distillation (KD) is an effective model compression method that can transfer the internal capabilities of large language models (LLMs) to smaller ones. However, the multi-modal probability distribution predicted by teacher LLMs…

Computation and Language · Computer Science 2024-12-19 Tianyu Peng , Jiajun Zhang

SCOTT: Self-Consistent Chain-of-Thought Distillation

Large language models (LMs) beyond a certain scale demonstrate the emergent capability of generating free-text rationales for their predictions via chain-of-thought (CoT) prompting. While CoT can yield dramatically improved performance,…

Computation and Language · Computer Science 2023-09-01 Peifeng Wang , Zhengyang Wang , Zheng Li , Yifan Gao , Bing Yin , Xiang Ren

Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering

Equipped with Chain-of-Thought (CoT), Large language models (LLMs) have shown impressive reasoning ability in various downstream tasks. Even so, suffering from hallucinations and the inability to access external knowledge, LLMs often come…

Computation and Language · Computer Science 2023-10-31 Keheng Wang , Feiyu Duan , Sirui Wang , Peiguang Li , Yunsen Xian , Chuantao Yin , Wenge Rong , Zhang Xiong

Thinking Broad, Acting Fast: Latent Reasoning Distillation from Multi-Perspective Chain-of-Thought for E-Commerce Relevance

Effective relevance modeling is crucial for e-commerce search, as it aligns search results with user intent and enhances customer experience. Recent work has leveraged large language models (LLMs) to address the limitations of traditional…

Information Retrieval · Computer Science 2026-01-30 Baopu Qiu , Hao Chen , Yuanrong Wu , Changtong Zan , Chao Wei , Weiru Zhang , Xiaoyi Zeng

mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models

Large language models (LLMs) with Chain-of-thought (CoT) have recently emerged as a powerful technique for eliciting reasoning to improve various downstream tasks. As most research mainly focuses on English, with few explorations in a…

Computation and Language · Computer Science 2024-07-11 Huiyuan Lai , Malvina Nissim

Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deployment of the LLMs in real-world applications can be challenging due to…

Computation and Language · Computer Science 2023-10-31 Minki Kang , Seanie Lee , Jinheon Baek , Kenji Kawaguchi , Sung Ju Hwang

Improve Student's Reasoning Generalizability through Cascading Decomposed CoTs Distillation

Large language models (LLMs) exhibit enhanced reasoning at larger scales, driving efforts to distill these capabilities into smaller models via teacher-student learning. Previous works simply fine-tune student models on teachers' generated…

Computation and Language · Computer Science 2024-05-31 Chengwei Dai , Kun Li , Wei Zhou , Songlin Hu

Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL

Deploying accurate Text-to-SQL systems at the enterprise level faces a difficult trilemma involving cost, security and performance. Current solutions force enterprises to choose between expensive, proprietary Large Language Models (LLMs)…

Computation and Language · Computer Science 2026-03-13 Khushboo Thaker , Yony Bresler

Efficient Long CoT Reasoning in Small Language Models

Recent large reasoning models such as DeepSeek-R1 exhibit strong complex problem-solving abilities by generating long chain-of-thought (CoT) reasoning steps. It is challenging to directly train small language models (SLMs) to emerge long…

Computation and Language · Computer Science 2025-06-19 Zhaoyang Wang , Jinqi Jiang , Tian Qiu , Hui Liu , Xianfeng Tang , Huaxiu Yao

Tailoring the Curriculum: Student-Centered Reasoning Distillation via Dynamic Data-Model Compatibility

Reasoning distillation transfers complex reasoning abilities from large language models (LLMs) to smaller ones, yet its success depends on how well the training data align with the student model. This paper introduces the Data-Model…

Artificial Intelligence · Computer Science 2026-05-29 Jiahao Huang , Fei Cheng , Junfeng Jiang , Akiko Aizawa

The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation

Data-centric distillation, including data augmentation, selection, and mixing, offers a promising path to creating smaller, more efficient student Large Language Models (LLMs) that retain strong reasoning abilities. However, there still…

Artificial Intelligence · Computer Science 2026-02-09 Ruichen Zhang , Rana Muhammad Shahroz Khan , Zhen Tan , Dawei Li , Song Wang , Tianlong Chen

Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning

Large Language Models (LLMs) excel in reasoning tasks through Chain-of-Thought (CoT) prompting. However, CoT prompting greatly increases computational demands, which has prompted growing interest in distilling CoT capabilities into Small…

Computation and Language · Computer Science 2025-05-28 Xinghao Chen , Zhijing Sun , Wenjin Guo , Miaoran Zhang , Yanjun Chen , Yirong Sun , Hui Su , Yijie Pan , Dietrich Klakow , Wenjie Li , Xiaoyu Shen