Related papers: Simplified TinyBERT: Knowledge Distillation for Do…

Understanding BERT Rankers Under Distillation

Deep language models such as BERT, pre-trained on large corpora, have given a huge performance boost to state-of-the-art information retrieval ranking systems. Knowledge embedded in such models allows them to pick up complex matching…

Information Retrieval · Computer Science 2020-07-23 Luyu Gao, Zhuyun Dai, Jamie Callan

Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking

We present a novel approach for training small language models for reasoning-intensive document ranking that combines knowledge distillation with reinforcement learning optimization. While existing methods often rely on expensive human…

Information Retrieval · Computer Science 2025-07-01 Chris Samarinas, Hamed Zamani

TinyBERT: Distilling BERT for Natural Language Understanding

Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to efficiently…

Computation and Language · Computer Science 2020-10-19 Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

Improving Knowledge Distillation for BERT Models: Loss Functions, Mapping Methods, and Weight Tuning

The use of large transformer-based models such as BERT, GPT, and T5 has led to significant advancements in natural language processing. However, these models are computationally expensive, necessitating model compression techniques that…

Computation and Language · Computer Science 2023-08-29 Apoorv Dankar, Adeem Jassani, Kartikaeya Kumar

DocBERT: BERT for Document Classification

We present, to our knowledge, the first application of BERT to document classification. A few characteristics of the task might lead one to think that BERT is not the most appropriate model: syntactic structures matter less for content…

Computation and Language · Computer Science 2019-08-23 Ashutosh Adhikari, Achyudh Ram, Raphael Tang, Jimmy Lin

TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval

Pre-trained language models like BERT have achieved great success in a wide variety of NLP tasks, while the superior performance comes with high demand in computational resources, which hinders the application in low-latency IR systems. We…

Information Retrieval · Computer Science 2020-02-18 Wenhao Lu, Jian Jiao, Ruofei Zhang

An Empirical Study of Uniform-Architecture Knowledge Distillation in Document Ranking

Although BERT-based ranking models have been commonly used in commercial search engines, they are usually time-consuming for online ranking tasks. Knowledge distillation, which aims at learning a smaller model with comparable performance to…

Information Retrieval · Computer Science 2023-02-09 Xubo Qin, Xiyuan Liu, Xiongfeng Zheng, Jie Liu, Yutao Zhu

Improving Bi-encoder Document Ranking Models with Two Rankers and Multi-teacher Distillation

BERT-based Neural Ranking Models (NRMs) can be classified according to how the query and document are encoded through BERT's self-attention layers - bi-encoder versus cross-encoder. Bi-encoder models are highly efficient because all the…

Information Retrieval · Computer Science 2021-08-09 Jaekeol Choi, Euna Jung, Jangwon Suh, Wonjong Rhee

Data-Efficient Ranking Distillation for Image Retrieval

Recent advances in deep learning have led to rapid developments in the field of image retrieval. However, the best-performing architectures incur significant computational cost. Recent approaches tackle this issue using knowledge…

Computer Vision and Pattern Recognition · Computer Science 2020-07-14 Zakaria Laskar, Juho Kannala

Distilling Dense Representations for Ranking using Tightly-Coupled Teachers

We present an approach to ranking with dense representations that applies knowledge distillation to improve the recently proposed late-interaction ColBERT model. Specifically, we distill the knowledge from ColBERT's expressive MaxSim…

Information Retrieval · Computer Science 2020-10-23 Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin

Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation

Although pre-trained language models such as BERT have achieved appealing performance in a wide range of natural language processing tasks, they are computationally expensive to deploy in real-time applications. A typical method is to…

Computation and Language · Computer Science 2021-06-22 Lingyun Feng, Minghui Qiu, Yaliang Li, Hai-Tao Zheng, Ying Shen

Knowledge Distillation in Document Retrieval

Complex deep learning models now achieve state-of-the-art performance for many document retrieval tasks. The best models process the query or claim jointly with the document. However, for fast scalable search it is desirable to have document…

Information Retrieval · Computer Science 2019-11-26 Siamak Shakeri, Abhinav Sethy, Cheng Cheng

DiPair: Fast and Accurate Distillation for Trillion-Scale Text Matching and Pair Modeling

Pre-trained models like BERT (Devlin et al., 2018) have dominated NLP / IR applications such as single sentence classification, text pair classification, and question answering. However, deploying these models in real systems is highly…

Computation and Language · Computer Science 2021-05-06 Jiecao Chen, Liu Yang, Karthik Raman, Michael Bendersky, Jung-Jung Yeh, Yun Zhou, Marc Najork, Danyang Cai, Ehsan Emadzadeh

Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation

Retrieval and ranking models are the backbone of many applications such as web search, open domain QA, or text-based recommender systems. The latency of neural ranking models at query time is largely dependent on the architecture and…

Information Retrieval · Computer Science 2021-01-25 Sebastian Hofstätter, Sophia Althammer, Michael Schröder, Mete Sertkan, Allan Hanbury

Larger models yield better results? Streamlined severity classification of ADHD-related concerns using BERT-based knowledge distillation

This work focuses on the efficiency of the knowledge distillation approach in generating a lightweight yet powerful BERT based model for natural language processing applications. After the model creation, we applied the resulting model,…

Computation and Language · Computer Science 2024-11-04 Ahmed Akib Jawad Karim, Kazi Hafiz Md. Asad, Md. Golam Rabiul Alam

MLKD-BERT: Multi-level Knowledge Distillation for Pre-trained Language Models

Knowledge distillation is an effective technique for pre-trained language model compression. Although existing knowledge distillation methods perform well for the most typical model BERT, they could be further improved in two aspects: the…

Computation and Language · Computer Science 2024-07-04 Ying Zhang, Ziheng Yang, Shufan Ji

Confidence Preservation Property in Knowledge Distillation Abstractions

Social media platforms prevent malicious activities by detecting harmful content of posts and comments. To that end, they employ large-scale deep neural network language models for sentiment analysis and content understanding. Some models,…

Computation and Language · Computer Science 2024-01-23 Dmitry Vengertsev, Elena Sherman

TernaryBERT: Distillation-aware Ultra-low Bit BERT

Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are both computation and memory expensive, hindering their deployment to…

Computation and Language · Computer Science 2020-10-13 Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu

BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search

Relevance has a significant impact on user experience and business profit for e-commerce search platforms. In this work, we propose a data-driven framework for search relevance prediction, by distilling knowledge from BERT and related…

Machine Learning · Computer Science 2020-10-21 Yunjiang Jiang, Yue Shang, Ziyang Liu, Hongwei Shen, Yun Xiao, Wei Xiong, Sulong Xu, Weipeng Yan, Di Jin

RefBERT: Compressing BERT by Referencing to Pre-computed Representations

Recently developed large pre-trained language models, e.g., BERT, have achieved remarkable performance in many downstream natural language processing applications. These pre-trained language models often contain hundreds of millions of…

Computation and Language · Computer Science 2021-06-17 Xinyi Wang, Haiqin Yang, Liang Zhao, Yang Mo, Jianping Shen