Related papers: TopicBERT for Energy Efficient Document Classifica…

ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT

In this paper, we present ExtremeBERT, a toolkit for accelerating and customizing BERT pretraining. Our goal is to provide an easy-to-use BERT pretraining toolkit for the research community and industry. Thus, the pretraining of popular…

Computation and Language · Computer Science · 2022-12-01 · Rui Pan, Shizhe Diao, Jianlin Chen, Tong Zhang

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer…

Computation and Language · Computer Science · 2020-02-11 · Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

DocBERT: BERT for Document Classification

We present, to our knowledge, the first application of BERT to document classification. A few characteristics of the task might lead one to think that BERT is not the most appropriate model: syntactic structures matter less for content…

Computation and Language · Computer Science · 2019-08-23 · Ashutosh Adhikari, Achyudh Ram, Raphael Tang, Jimmy Lin

FastBERT: a Self-distilling BERT with Adaptive Inference Time

Pre-trained language models like BERT have proven to be highly performant. However, they are often computationally expensive in many practical scenarios, for such heavy models can hardly be readily implemented with limited resources. To…

Computation and Language · Computer Science · 2020-04-30 · Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju

Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification

Transformer-based models have achieved dominant performance in numerous NLP tasks. Despite their remarkable successes, pre-trained transformers such as BERT suffer from a computationally expensive self-attention mechanism that interacts…

Computation and Language · Computer Science · 2024-06-04 · Jungmin Yun, Mihyeon Kim, Youngbin Kim

Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length

Limited computational budgets often prevent transformers from being used in production and from having their high accuracy utilized. TinyBERT addresses the computational efficiency by self-distilling BERT into a smaller transformer…

Computation and Language · Computer Science · 2021-11-19 · Shira Guskin, Moshe Wasserblat, Ke Ding, Gyuwan Kim

Simplified TinyBERT: Knowledge Distillation for Document Retrieval

Despite the effectiveness of utilizing the BERT model for document ranking, the high computational cost of such approaches limits their uses. To this end, this paper first empirically investigates the effectiveness of two knowledge…

Information Retrieval · Computer Science · 2023-05-05 · Xuanang Chen, Ben He, Kai Hui, Le Sun, Yingfei Sun

Efficient Fine-Tuning of Compressed Language Models with Learners

Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many prior works aim to improve inference efficiency via compression techniques, e.g., pruning, these works do not explicitly address the…

Computation and Language · Computer Science · 2022-08-04 · Danilo Vucetic, Mohammadreza Tayaranian, Maryam Ziaeefard, James J. Clark, Brett H. Meyer, Warren J. Gross

Sensi-BERT: Towards Sensitivity Driven Fine-Tuning for Parameter-Efficient BERT

Large pre-trained language models have recently gained significant traction due to their improved performance on various down-stream tasks like text classification and question answering, requiring only few epochs of fine-tuning. However,…

Computation and Language · Computer Science · 2023-09-01 · Souvik Kundu, Sharath Nittur Sridhar, Maciej Szankin, Sairam Sundaresan

Blockwise Self-Attention for Long Document Understanding

We present BlockBERT, a lightweight and efficient BERT model for better modeling long-distance dependencies. Our model extends BERT by introducing sparse block structures into the attention matrix to reduce both memory consumption and…

Computation and Language · Computer Science · 2020-11-03 · Jiezhong Qiu, Hao Ma, Omer Levy, Scott Wen-tau Yih, Sinong Wang, Jie Tang

TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference

Existing pre-trained language models (PLMs) are often computationally expensive in inference, making them impractical in various resource-limited real-world applications. To address this issue, we propose a dynamic token reduction approach…

Computation and Language · Computer Science · 2021-05-26 · Deming Ye, Yankai Lin, Yufei Huang, Maosong Sun

EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets

Heavily overparameterized language models such as BERT, XLNet and T5 have achieved impressive success in many NLP tasks. However, their high model complexity requires enormous computation resources and extremely long training time for both…

Computation and Language · Computer Science · 2021-06-09 · Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu

Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking

An emerging recipe for achieving state-of-the-art effectiveness in neural document re-ranking involves utilizing large pre-trained language models - e.g., BERT - to evaluate all individual passages in the document and then aggregating the…

Information Retrieval · Computer Science · 2021-05-21 · Sebastian Hofstätter, Bhaskar Mitra, Hamed Zamani, Nick Craswell, Allan Hanbury

Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT

Transformer-based models, specifically BERT, have propelled research in various NLP tasks. However, these models are limited to a maximum token limit of 512 tokens. Consequently, this makes it non-trivial to apply it in a practical setting…

Computation and Language · Computer Science · 2023-11-01 · Aman Jaiswal, Evangelos Milios

Profitable Trade-Off Between Memory and Performance In Multi-Domain Chatbot Architectures

Text classification problem is a very broad field of study in the field of natural language processing. In short, the text classification problem is to determine which of the previously determined classes the given text belongs to.…

Computation and Language · Computer Science · 2021-12-28 · D. Emre Taşar, Şükrü Ozan, M. Fatih Akca, Oğuzhan Ölmez, Semih Gülüm, Seçilay Kutal, Ceren Belhan

AdapLeR: Speeding up Inference by Adaptive Length Reduction

Pre-trained language models have shown stellar performance in various downstream tasks. But, this usually comes at the cost of high latency and computation, hindering their usage in resource-limited settings. In this work, we propose a…

Computation and Language · Computer Science · 2022-03-18 · Ali Modarressi, Hosein Mohebbi, Mohammad Taher Pilehvar

Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification

This work evaluates Sentence-BERT for a multi-label code comment classification task seeking to maximize the classification performance while controlling efficiency constraints during inference. Using a dataset of 13,216 labeled comment…

Software Engineering · Computer Science · 2025-06-16 · Fabian C. Peña, Steffen Herbold

TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval

Pre-trained language models like BERT have achieved great success in a wide variety of NLP tasks, while the superior performance comes with high demand in computational resources, which hinders the application in low-latency IR systems. We…

Information Retrieval · Computer Science · 2020-02-18 · Wenhao Lu, Jian Jiao, Ruofei Zhang

Cluster & Tune: Boost Cold Start Performance in Text Classification

In real-world scenarios, a text classification task often begins with a cold start, when labeled data is scarce. In such cases, the common practice of fine-tuning pre-trained models, such as BERT, for a target classification task, is prone…

Computation and Language · Computer Science · 2022-03-22 · Eyal Shnarch, Ariel Gera, Alon Halfon, Lena Dankin, Leshem Choshen, Ranit Aharonov, Noam Slonim

Diet Code Is Healthy: Simplifying Programs for Pre-trained Models of Code

Pre-trained code representation models such as CodeBERT have demonstrated superior performance in a variety of software engineering tasks, yet they are often heavy in complexity, quadratically with the length of the input sequence. Our…

Software Engineering · Computer Science · 2022-11-22 · Zhaowei Zhang, Hongyu Zhang, Beijun Shen, Xiaodong Gu