Related papers: MCC-KD: Multi-CoT Consistent Knowledge Distillatio…

200 papers

Large Language Models (LLMs) have displayed remarkable performance across various complex tasks by leveraging Chain-of-Thought (CoT) prompting. Recently, studies have proposed a Knowledge Distillation (KD) approach, reasoning distillation,…

Computation and Language · Computer Science · 2024-10-14 · Hojae Lee, Junho Kim, SangKeun Lee

Large language models (LLMs) excel in complex reasoning tasks, and distilling their reasoning capabilities into smaller models has shown promise. However, we uncover an interesting phenomenon, which we term the Small Model Learnability Gap:…

Artificial Intelligence · Computer Science · 2025-11-14 · Yuetai Li, Xiang Yue, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Bhaskar Ramasubramanian, Radha Poovendran

Large language models (LLMs) have garnered increasing attention owing to their powerful logical reasoning capabilities. Generally, larger LLMs (L-LLMs) that require paid interfaces exhibit significantly superior performance compared to…

Artificial Intelligence · Computer Science · 2025-11-11 · Dong Chen, Shilin Zhang, Fei Gao, Yueting Zhuang, Siliang Tang, Qidong Liu, Mingliang Xu

Large Language Models (LLMs) achieve state-of-the-art performance across various NLP tasks but face deployment challenges due to high computational costs and memory constraints. Knowledge distillation (KD) is a promising solution,…

Computation and Language · Computer Science · 2025-03-04 · Anh Duc Le, Tu Vu, Nam Le Hai, Nguyen Thi Ngoc Diep, Linh Ngo Van, Trung Le, Thien Huu Nguyen

Large Language Models (LLMs) have shown outstanding performance across a wide range of downstream tasks. This competency is attributed to their substantial parameter size and pre-training on an extensive corpus. Moreover, LLMs have exhibited…

Computation and Language · Computer Science · 2023-08-10 · Yuhan Ma, Haiqi Jiang, Chenyou Fan

While large language models (LLMs) have demonstrated exceptional performance in recent natural language processing (NLP) tasks, their deployment poses substantial challenges due to high computational and memory demands in real-world…

Computation and Language · Computer Science · 2024-02-27 · Chenglin Li, Qianglong Chen, Liangyue Li, Caiyu Wang, Yicheng Li, Zulong Chen, Yin Zhang

The exponential growth of Large Language Models (LLMs) continues to highlight the need for efficient strategies to meet ever-expanding computational and data demands. This survey provides a comprehensive analysis of two complementary…

Chain-of-Thought (CoT) prompting is a widely used method to improve the reasoning capability of Large Language Models (LLMs). More recently, CoT has been leveraged in Knowledge Distillation (KD) to transfer reasoning capability from a…

Computation and Language · Computer Science · 2025-11-10 · Cong-Thanh Do, Rama Doddipatla, Kate Knill

Knowledge distillation (KD) is an effective model compression method that can transfer the internal capabilities of large language models (LLMs) to smaller ones. However, the multi-modal probability distribution predicted by teacher LLMs…

Computation and Language · Computer Science · 2024-12-19 · Tianyu Peng, Jiajun Zhang

Large language models (LMs) beyond a certain scale demonstrate the emergent capability of generating free-text rationales for their predictions via chain-of-thought (CoT) prompting. While CoT can yield dramatically improved performance,…

Computation and Language · Computer Science · 2023-09-01 · Peifeng Wang, Zhengyang Wang, Zheng Li, Yifan Gao, Bing Yin, Xiang Ren

Equipped with Chain-of-Thought (CoT), Large language models (LLMs) have shown impressive reasoning ability in various downstream tasks. Even so, suffering from hallucinations and the inability to access external knowledge, LLMs often come…

Computation and Language · Computer Science · 2023-10-31 · Keheng Wang, Feiyu Duan, Sirui Wang, Peiguang Li, Yunsen Xian, Chuantao Yin, Wenge Rong, Zhang Xiong

Effective relevance modeling is crucial for e-commerce search, as it aligns search results with user intent and enhances customer experience. Recent work has leveraged large language models (LLMs) to address the limitations of traditional…

Information Retrieval · Computer Science · 2026-01-30 · Baopu Qiu, Hao Chen, Yuanrong Wu, Changtong Zan, Chao Wei, Weiru Zhang, Xiaoyi Zeng

Large language models (LLMs) with Chain-of-thought (CoT) have recently emerged as a powerful technique for eliciting reasoning to improve various downstream tasks. As most research mainly focuses on English, with few explorations in a…

Computation and Language · Computer Science · 2024-07-11 · Huiyuan Lai, Malvina Nissim

Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deployment of the LLMs in real-world applications can be challenging due to…

Computation and Language · Computer Science · 2023-10-31 · Minki Kang, Seanie Lee, Jinheon Baek, Kenji Kawaguchi, Sung Ju Hwang

Large language models (LLMs) exhibit enhanced reasoning at larger scales, driving efforts to distill these capabilities into smaller models via teacher-student learning. Previous works simply fine-tune student models on teachers' generated…

Computation and Language · Computer Science · 2024-05-31 · Chengwei Dai, Kun Li, Wei Zhou, Songlin Hu

Deploying accurate Text-to-SQL systems at the enterprise level faces a difficult trilemma involving cost, security and performance. Current solutions force enterprises to choose between expensive, proprietary Large Language Models (LLMs)…

Computation and Language · Computer Science · 2026-03-13 · Khushboo Thaker, Yony Bresler

Recent large reasoning models such as DeepSeek-R1 exhibit strong complex problem-solving abilities by generating long chain-of-thought (CoT) reasoning steps. It is challenging to directly train small language models (SLMs) to emerge long…

Computation and Language · Computer Science · 2025-06-19 · Zhaoyang Wang, Jinqi Jiang, Tian Qiu, Hui Liu, Xianfeng Tang, Huaxiu Yao

Reasoning distillation transfers complex reasoning abilities from large language models (LLMs) to smaller ones, yet its success depends on how well the training data align with the student model. This paper introduces the Data-Model…

Artificial Intelligence · Computer Science · 2026-05-29 · Jiahao Huang, Fei Cheng, Junfeng Jiang, Akiko Aizawa

Data-centric distillation, including data augmentation, selection, and mixing, offers a promising path to creating smaller, more efficient student Large Language Models (LLMs) that retain strong reasoning abilities. However, there still…

Artificial Intelligence · Computer Science · 2026-02-09 · Ruichen Zhang, Rana Muhammad Shahroz Khan, Zhen Tan, Dawei Li, Song Wang, Tianlong Chen

Large Language Models (LLMs) excel in reasoning tasks through Chain-of-Thought (CoT) prompting. However, CoT prompting greatly increases computational demands, which has prompted growing interest in distilling CoT capabilities into Small…

Computation and Language · Computer Science · 2025-05-28 · Xinghao Chen, Zhijing Sun, Wenjin Guo, Miaoran Zhang, Yanjun Chen, Yirong Sun, Hui Su, Yijie Pan, Dietrich Klakow, Wenjie Li, Xiaoyu Shen