Related papers: AutoBERT-Zero: Evolving BERT Backbone from Scratch

Structural Pruning of Pre-trained Language Models via Neural Architecture Search

Pre-trained language models (PLM), for example BERT or RoBERTa, mark the state-of-the-art for natural language understanding tasks when fine-tuned on labeled data. However, their large size poses challenges in deploying them for inference in…

Machine Learning · Computer Science 2024-08-27 Aaron Klein, Jacek Golebiowski, Xingchen Ma, Valerio Perrone, Cedric Archambeau

AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models

Pre-trained language models (PLMs) have achieved great success in natural language processing. Most PLMs follow the default setting of architecture hyper-parameters (e.g., the hidden dimension is a quarter of the intermediate dimension…

Computation and Language · Computer Science 2021-07-30 Yichun Yin, Cheng Chen, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

LV-BERT: Exploiting Layer Variety for BERT

Modern pre-trained language models are mostly built upon backbones stacking self-attention and feed-forward layers in an interleaved order. In this paper, beyond this stereotyped layer pattern, we aim to improve pre-trained models by…

Computation and Language · Computer Science 2021-06-28 Weihao Yu, Zihang Jiang, Fei Chen, Qibin Hou, Jiashi Feng

NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search

While pre-trained language models (e.g., BERT) have achieved impressive results on different natural language processing tasks, they have large numbers of parameters and suffer from big computational and memory costs, which make them…

Computation and Language · Computer Science 2021-06-01 Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Jian Li, Tao Qin, Tie-Yan Liu

ConvBERT: Improving BERT with Span-based Dynamic Convolution

Pre-trained language models like BERT and its variants have recently achieved impressive performance in various natural language understanding tasks. However, BERT heavily relies on the global self-attention block and thus suffers large…

Computation and Language · Computer Science 2021-02-03 Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

AutoRC: Improving BERT Based Relation Classification Models via Architecture Search

Although BERT based relation classification (RC) models have achieved significant improvements over the traditional deep learning models, it seems that no consensus can be reached on what is the optimal architecture. Firstly, there are…

Computation and Language · Computer Science 2020-09-29 Wei Zhu, Xipeng Qiu, Yuan Ni, Guotong Xie

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

Machine learning research has advanced in multiple aspects, including model structures and learning methods. The effort to automate such research, known as AutoML, has also made significant progress. However, this progress has largely…

Machine Learning · Computer Science 2020-07-01 Esteban Real, Chen Liang, David R. So, Quoc V. Le

TrimBERT: Tailoring BERT for Trade-offs

Models based on BERT have been extremely successful in solving a variety of natural language processing (NLP) tasks. Unfortunately, many of these large models require a great deal of computational resources and/or time for pre-training and…

Computation and Language · Computer Science 2022-02-28 Sharath Nittur Sridhar, Anthony Sarah, Sairam Sundaresan

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Recently, the pre-trained language model, BERT (and its robustly optimized version RoBERTa), has attracted a lot of attention in natural language understanding (NLU), and achieved state-of-the-art accuracy in various NLU tasks, such as…

Computation and Language · Computer Science 2019-09-30 Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, Luo Si

FlexiBERT: Are Current Transformer Architectures too Homogeneous and Rigid?

The existence of a plethora of language models makes the problem of selecting the best one for a custom task challenging. Most state-of-the-art methods leverage transformer-based models (e.g., BERT) or their variants. Training such models…

Machine Learning · Computer Science 2022-05-25 Shikhar Tuli, Bhishma Dedhia, Shreshth Tuli, Niraj K. Jha

SetBERT: Enhancing Retrieval Performance for Boolean Logic and Set Operation Queries

We introduce SetBERT, a fine-tuned BERT-based model designed to enhance query embeddings for set operations and Boolean logic queries, such as Intersection (AND), Difference (NOT), and Union (OR). SetBERT significantly improves retrieval…

Computation and Language · Computer Science 2024-06-27 Quan Mai, Susan Gauch, Douglas Adams

Self-supervised Machine Learning Based Approach to Orbit Modelling Applied to Space Traffic Management

This paper presents a novel methodology for improving the performance of machine learning based space traffic management tasks through the use of a pre-trained orbit model. Taking inspiration from BERT-like self-supervised language models…

Space Physics · Physics 2023-12-13 Emma Stevenson, Victor Rodriguez-Fernandez, Hodei Urrutxua, Vincent Morand, David Camacho

On the Effectiveness of Transfer Learning for Code Search

The Transformer architecture and transfer learning have marked a quantum leap in natural language processing, improving the state of the art across a range of text-based tasks. This paper examines how these advancements can be applied to…

Software Engineering · Computer Science 2022-08-29 Pasquale Salza, Christoph Schwizer, Jian Gu, Harald C. Gall

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

We present CodeBERT, a bimodal pre-trained model for programming language (PL) and natural language (NL). CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language code search, code…

Computation and Language · Computer Science 2020-09-21 Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou

Optimizing small BERTs trained for German NER

Currently, the most widespread neural network architecture for training language models is the so-called BERT, which led to improvements in various Natural Language Processing (NLP) tasks. In general, the larger the number of parameters in a…

Computation and Language · Computer Science 2021-11-02 Jochen Zöllner, Konrad Sperfeld, Christoph Wick, Roger Labahn

Neural Architecture Search for Sentence Classification with BERT

Pre-training of language models on large text corpora is common practice in Natural Language Processing. These models are then fine-tuned to achieve the best results on a variety of tasks. In this paper we question the…

Artificial Intelligence · Computer Science 2024-03-28 Philip Kenneweg, Sarah Schröder, Barbara Hammer

DPBERT: Efficient Inference for BERT based on Dynamic Planning

Large-scale pre-trained language models such as BERT have contributed significantly to the development of NLP. However, those models require large computational resources, making them difficult to apply to mobile devices where computing…

Computation and Language · Computer Science 2023-08-02 Weixin Wu, Hankz Hankui Zhuo

Visualizing and Understanding the Effectiveness of BERT

Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks. However, it is unclear why the pre-training-then-fine-tuning paradigm can improve performance and generalization capability across different…

Computation and Language · Computer Science 2019-08-16 Yaru Hao, Li Dong, Furu Wei, Ke Xu

Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment

Pretrained language models like BERT and T5 serve as crucial backbone encoders for dense retrieval. However, these models often exhibit limited generalization capabilities and face challenges in improving in-domain accuracy. Recent research…

Computation and Language · Computer Science 2024-08-26 Kun Luo, Minghao Qin, Zheng Liu, Shitao Xiao, Jun Zhao, Kang Liu

Neural Architecture Search for Parameter-Efficient Fine-tuning of Large Pre-trained Language Models

Parameter-efficient tuning (PET) methods fit pre-trained language models (PLMs) to downstream tasks by either computing a small compressed update for a subset of model parameters, or appending and fine-tuning a small number of new model…

Computation and Language · Computer Science 2023-05-29 Neal Lawton, Anoop Kumar, Govind Thattai, Aram Galstyan, Greg Ver Steeg