Related papers: A Flexible Multi-Task Model for BERT Serving

BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used…

Machine Learning · Computer Science 2019-05-16 Asa Cooper Stickland, Iain Murray

TrimBERT: Tailoring BERT for Trade-offs

Models based on BERT have been extremely successful in solving a variety of natural language processing (NLP) tasks. Unfortunately, many of these large models require a great deal of computational resources and/or time for pre-training and…

Computation and Language · Computer Science 2022-02-28 Sharath Nittur Sridhar, Anthony Sarah, Sairam Sundaresan

Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning

Recently, leveraging pre-trained Transformer-based language models in downstream, task-specific models has advanced state-of-the-art results in natural language understanding tasks. However, only a little research has explored the…

Computation and Language · Computer Science 2020-12-07 Daniel Grießhaber, Johannes Maucher, Ngoc Thang Vu

MT-Clinical BERT: Scaling Clinical Information Extraction with Multitask Learning

Clinical notes contain an abundance of important but not readily accessible information about patients. Systems to automatically extract this information rely on large amounts of training data for which there exist limited resources to…

Computation and Language · Computer Science 2020-04-23 Andriy Mulyar, Bridget T. McInnes

Incorporating BERT into Neural Machine Translation

The recently proposed BERT has shown great power on a variety of natural language understanding tasks, such as text classification, reading comprehension, etc. However, how to effectively apply BERT to neural machine translation (NMT) lacks…

Computation and Language · Computer Science 2020-02-18 Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

Progressively Stacking 2.0: A Multi-stage Layerwise Training Method for BERT Training Speedup

Pre-trained language models, such as BERT, have achieved significant accuracy gains in many natural language processing tasks. Despite its effectiveness, the huge number of parameters makes training a BERT model computationally very…

Computation and Language · Computer Science 2020-11-30 Cheng Yang, Shengnan Wang, Chao Yang, Yuechuan Li, Ru He, Jingqiao Zhang

Sensi-BERT: Towards Sensitivity Driven Fine-Tuning for Parameter-Efficient BERT

Large pre-trained language models have recently gained significant traction due to their improved performance on various downstream tasks like text classification and question answering, requiring only a few epochs of fine-tuning. However,…

Computation and Language · Computer Science 2023-09-01 Souvik Kundu, Sharath Nittur Sridhar, Maciej Szankin, Sairam Sundaresan

BERT4beam: Large AI Model Enabled Generalized Beamforming Optimization

Artificial intelligence (AI) is anticipated to emerge as a pivotal enabler for the forthcoming sixth-generation (6G) wireless communication systems. However, current research efforts regarding large AI models for wireless communications…

Systems and Control · Electrical Eng. & Systems 2025-09-16 Yuhang Li, Yang Lu, Wei Chen, Bo Ai, Zhiguo Ding, Dusit Niyato

Investigating Transferability in Pretrained Language Models

How does language model pretraining help transfer learning? We consider a simple ablation technique for determining the impact of each pretrained layer on transfer task performance. This method, partial reinitialization, involves replacing…

Computation and Language · Computer Science 2020-11-11 Alex Tamkin, Trisha Singh, Davide Giovanardi, Noah Goodman

An Empirical Study of Multi-Task Learning on BERT for Biomedical Text Mining

Multi-task learning (MTL) has achieved remarkable success in natural language processing applications. In this work, we study a multi-task learning model with multiple decoders on varieties of biomedical and clinical natural language…

Computation and Language · Computer Science 2020-05-07 Yifan Peng, Qingyu Chen, Zhiyong Lu

Optimizing Multi-Task Learning for Enhanced Performance in Large Language Models

This study aims to explore the performance improvement method of large language models based on GPT-4 under the multi-task learning framework and conducts experiments on two tasks: text classification and automatic summary generation.…

Computation and Language · Computer Science 2024-12-10 Zhen Qi, Jiajing Chen, Shuo Wang, Bingying Liu, Hongye Zheng, Chihang Wang

Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT

This paper presents new state-of-the-art models for three tasks, part-of-speech tagging, syntactic parsing, and semantic parsing, using the cutting-edge contextualized embedding framework known as BERT. For each task, we first replicate and…

Computation and Language · Computer Science 2020-05-26 Han He, Jinho D. Choi

ALL-IN-ONE: Multi-Task Learning BERT models for Evaluating Peer Assessments

Peer assessment has been widely applied across diverse academic fields over the last few decades and has demonstrated its effectiveness. However, the advantages of peer assessment can only be achieved with high-quality peer reviews.…

Computation and Language · Computer Science 2021-10-11 Qinjin Jia, Jialin Cui, Yunkai Xiao, Chengyuan Liu, Parvez Rashid, Edward F. Gehringer

An Efficient Split Fine-tuning Framework for Edge and Cloud Collaborative Learning

To enable the pre-trained models to be fine-tuned with local data on edge devices without sharing data with the cloud, we design an efficient split fine-tuning (SFT) framework for edge and cloud collaborative learning. We propose three…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-01 Shaohuai Shi, Qing Yang, Yang Xiang, Shuhan Qi, Xuan Wang

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

Pre-trained language models have shown remarkable results on various NLP tasks. Nevertheless, due to their bulky size and slow inference speed, it is hard to deploy them on edge devices. In this paper, we have a critical insight that…

Computation and Language · Computer Science 2021-09-17 Chenhe Dong, Guangrun Wang, Hang Xu, Jiefeng Peng, Xiaozhe Ren, Xiaodan Liang

DPBERT: Efficient Inference for BERT based on Dynamic Planning

Large-scale pre-trained language models such as BERT have contributed significantly to the development of NLP. However, those models require large computational resources, making them difficult to apply to mobile devices where computing…

Computation and Language · Computer Science 2023-08-02 Weixin Wu, Hankz Hankui Zhuo

A Multi-Task BERT Model for Schema-Guided Dialogue State Tracking

Task-oriented dialogue systems often employ a Dialogue State Tracker (DST) to successfully complete conversations. Recent state-of-the-art DST implementations rely on schemata of diverse services to improve model robustness and handle…

Computation and Language · Computer Science 2022-07-05 Eleftherios Kapelonis, Efthymios Georgiou, Alexandros Potamianos

Fisher Mask Nodes for Language Model Merging

Fine-tuning pre-trained models provides significant advantages in downstream performance. The ubiquitous nature of pre-trained models such as BERT and its derivatives in natural language processing has also led to a proliferation of…

Computation and Language · Computer Science 2024-05-06 Thennal D K, Ganesh Nathan, Suchithra M S

An Automated Knowledge Mining and Document Classification System with Multi-model Transfer Learning

Service manual documents are crucial to the engineering company as they provide guidelines and knowledge to service engineers. However, it has become inconvenient and inefficient for service engineers to retrieve specific knowledge from…

Computation and Language · Computer Science 2021-06-25 Jia Wei Chong, Zhiyuan Chen, Mei Shin Oh

Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling

Ensembling BERT models often significantly improves accuracy, but at the cost of significantly more computation and memory footprint. In this work, we propose Multi-CLS BERT, a novel ensembling method for CLS-based prediction tasks that is…

Computation and Language · Computer Science 2023-05-23 Haw-Shiuan Chang, Ruei-Yao Sun, Kathryn Ricci, Andrew McCallum