Related papers: Scalable Syntax-Aware Language Models Using Knowle…

200 papers

A recent trend in Natural Language Processing is the exponential growth in Language Model (LM) size, which prevents research groups without a necessary hardware infrastructure from participating in the development process. This study…

Computation and Language · Computer Science 2023-01-31 Jan Philip Wahle

Knowledge Distillation (KD) is increasingly adopted to transfer capabilities from large language models to smaller ones, offering significant improvements in efficiency and utility while often surpassing standard fine-tuning. Beyond…

Textual representation learners trained on large amounts of data have achieved notable success on downstream tasks; intriguingly, they have also performed well on challenging tests of syntactic competence. Given this success, it remains an…

Computation and Language · Computer Science 2020-05-28 Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various…

Computation and Language · Computer Science 2025-04-21 Junjie Yang, Junhao Song, Xudong Han, Ziqian Bi, Tianyang Wang, Chia Xin Liang, Xinyuan Song, Yichao Zhang, Qian Niu, Benji Peng, Keyu Chen, Ming Liu

Pretrained language models have led to significant performance gains in many NLP tasks. However, the intensive computing resources to train such models remain an issue. Knowledge distillation alleviates this problem by learning a…

Computation and Language · Computer Science 2020-05-04 Linqing Liu, Huan Wang, Jimmy Lin, Richard Socher, Caiming Xiong

Knowledge distillation (KD) is a key technique for compressing large language models into smaller ones while preserving performance. Despite the recent traction of KD research, its effectiveness for smaller language models (LMs) and the…

Computation and Language · Computer Science 2025-08-05 Suhas Kamasetty Ramesh, Ayan Sengupta, Tanmoy Chakraborty

Knowledge Distillation (KD) is a well-known training paradigm in deep neural networks where knowledge acquired by a large teacher model is transferred to a small student. KD has proven to be an effective technique to significantly improve…

Computer Vision and Pattern Recognition · Computer Science 2022-11-24 Philip de Rijk, Lukas Schneider, Marius Cordts, Dariu M. Gavrila

Knowledge distillation is the process of transferring the knowledge from a large model to a small model. In this process, the small model learns the generalization ability of the large model and retains the performance close to that of the…

Machine Learning · Computer Science 2021-03-26 Zhenyan Hou, Wenxuan Fan

Knowledge distillation is a popular approach for enhancing the performance of "student" models, with lower representational capacity, by taking advantage of more powerful "teacher" models. Despite its apparent simplicity and widespread…

Machine Learning · Computer Science 2023-12-12 Mher Safaryan, Alexandra Peste, Dan Alistarh

The ability to learn new concepts sequentially is a major weakness for modern neural networks, which hinders their use in non-stationary environments. Their propensity to fit the current data distribution to the detriment of the past…

Audio and Speech Processing · Electrical Eng. & Systems 2023-08-02 Umberto Cappellazzo, Muqiao Yang, Daniele Falavigna, Alessio Brutti

Continual learning refers to a dynamical framework in which a model receives a stream of non-stationary data over time and must adapt to new data while preserving previously acquired knowledge. Unluckily, neural networks fail to meet these…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-24 Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti

Knowledge distillation (KD) is widely used for compressing a teacher model to a smaller student model, reducing its inference cost and memory footprint while preserving model capabilities. However, current KD methods for auto-regressive…

Computation and Language · Computer Science 2024-07-04 Jongwoo Ko, Sungnyun Kim, Tianyi Chen, Se-Young Yun

Sequence-level knowledge distillation (SLKD) is a model compression technique that leverages large, accurate teacher models to train smaller, under-parameterized student models. Why does pre-processing MT data with SLKD help us train…

Computation and Language · Computer Science 2019-12-10 Mitchell A. Gordon, Kevin Duh

Many recent breakthroughs in machine learning have been enabled by the pre-trained foundation models. By scaling up model parameters, training data, and computation resources, foundation models have significantly advanced the…

Artificial Intelligence · Computer Science 2023-10-06 Zhe Zhao, Qingyun Liu, Huan Gui, Bang An, Lichan Hong, Ed H. Chi

LLMs are increasingly explored for bundle generation, thanks to their reasoning capabilities and knowledge. However, deploying large-scale LLMs introduces significant efficiency challenges, primarily high computational costs during…

Computation and Language · Computer Science 2025-04-25 Kaidong Feng, Zhu Sun, Jie Yang, Hui Fang, Xinghua Qu, Wenyuan Liu

Deploying accurate Text-to-SQL systems at the enterprise level faces a difficult trilemma involving cost, security and performance. Current solutions force enterprises to choose between expensive, proprietary Large Language Models (LLMs)…

Computation and Language · Computer Science 2026-03-13 Khushboo Thaker, Yony Bresler

Large language models (LLMs) are known to memorize parts of their training data, raising important concerns around privacy and security. While previous research has focused on studying memorization in pre-trained models, much less is known…

Machine Learning · Computer Science 2025-08-19 Simardeep Singh

Knowledge distillation (KD) has been proved effective for compressing large-scale pre-trained language models. However, existing methods conduct KD statically, e.g., the student model aligns its output distribution to that of a selected…

Computation and Language · Computer Science 2021-09-24 Lei Li, Yankai Lin, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun

Despite the advanced intelligence abilities of large language models (LLMs) in various applications, they still face significant computational and storage demands. Knowledge Distillation (KD) has emerged as an effective strategy to improve…

Knowledge distillation (KD) is a well-known method for compressing neural models. However, works focusing on distilling knowledge from large multilingual neural machine translation (MNMT) models into smaller ones are practically…

Computation and Language · Computer Science 2023-04-20 Varun Gumma, Raj Dabre, Pratyush Kumar