Related papers: Small and Practical BERT Models for Sequence Label…

Are All Languages Created Equal in Multilingual BERT?

Multilingual BERT (mBERT) trained on 104 languages has shown surprisingly good cross-lingual performance on several NLP tasks, even without explicit cross-lingual signals. However, these evaluations have focused on cross-lingual transfer…

Computation and Language · Computer Science 2020-10-02 Shijie Wu , Mark Dredze

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer…

Computation and Language · Computer Science 2020-02-11 Zhenzhong Lan , Mingda Chen , Sebastian Goodman , Kevin Gimpel , Piyush Sharma , Radu Soricut

Language-agnostic BERT Sentence Embedding

While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding based transfer learning (Reimers and Gurevych, 2019), BERT based cross-lingual sentence embeddings have yet to be explored.…

Computation and Language · Computer Science 2022-03-09 Fangxiaoyu Feng , Yinfei Yang , Daniel Cer , Naveen Arivazhagan , Wei Wang

Load What You Need: Smaller Versions of Multilingual BERT

Pre-trained Transformer-based models are achieving state-of-the-art results on a variety of Natural Language Processing data sets. However, the size of these models is often a drawback for their deployment in real production applications.…

Computation and Language · Computer Science 2020-10-13 Amine Abdaoui , Camille Pradel , Grégoire Sigel

Few-shot learning for sentence pair classification and its applications in software engineering

Few-shot learning-the ability to train models with access to limited data-has become increasingly popular in the natural language processing (NLP) domain, as large language models such as GPT and T0 have been empirically shown to achieve…

Software Engineering · Computer Science 2023-06-16 Robert Kraig Helmeczi , Mucahit Cevik , Savas Yıldırım

MicroBERT: Effective Training of Low-resource Monolingual BERTs through Parameter Reduction and Multitask Learning

Transformer language models (TLMs) are critical for most NLP tasks, but they are difficult to create for low-resource languages because of how much pretraining data they require. In this work, we investigate two techniques for training…

Computation and Language · Computer Science 2023-01-06 Luke Gessler , Amir Zeldes

Optimizing small BERTs trained for German NER

Currently, the most widespread neural network architecture for training language models is the so called BERT which led to improvements in various Natural Language Processing (NLP) tasks. In general, the larger the number of parameters in a…

Computation and Language · Computer Science 2021-11-02 Jochen Zöllner , Konrad Sperfeld , Christoph Wick , Roger Labahn

Is Multilingual BERT Fluent in Language Generation?

The multilingual BERT model is trained on 104 languages and meant to serve as a universal language model and tool for encoding sentences. We explore how well the model performs on several languages across several tasks: a diagnostic…

Computation and Language · Computer Science 2019-10-10 Samuel Rönnqvist , Jenna Kanerva , Tapio Salakoski , Filip Ginter

Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank

Pretrained multilingual contextual representations have shown great success, but due to the limits of their pretraining data, their benefits do not apply equally to all language varieties. This presents a challenge for language varieties…

Computation and Language · Computer Science 2022-06-22 Ethan C. Chau , Lucy H. Lin , Noah A. Smith

LegaLMFiT: Efficient Short Legal Text Classification with LSTM Language Model Pre-Training

Large Transformer-based language models such as BERT have led to broad performance improvements on many NLP tasks. Domain-specific variants of these models have demonstrated excellent performance on a variety of specialised tasks. In legal…

Computation and Language · Computer Science 2021-09-16 Benjamin Clavié , Akshita Gheewala , Paul Briton , Marc Alphonsus , Rym Laabiyad , Francesco Piccoli

Adapting Monolingual Models: Data can be Scarce when Language Similarity is High

For many (minority) languages, the resources needed to train large models are not available. We investigate the performance of zero-shot transfer learning with as little data as possible, and the influence of language similarity in this…

Computation and Language · Computer Science 2021-08-03 Wietse de Vries , Martijn Bartelds , Malvina Nissim , Martijn Wieling

Sequence-based Multi-lingual Low Resource Speech Recognition

Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are…

Computation and Language · Computer Science 2018-03-08 Siddharth Dalmia , Ramon Sanabria , Florian Metze , Alan W. Black

Real-Time Execution of Large-scale Language Models on Mobile

Pre-trained large-scale language models have increasingly demonstrated high accuracy on many natural language processing (NLP) tasks. However, the limited weight storage and computational speed on hardware platforms have impeded the…

Computation and Language · Computer Science 2020-10-23 Wei Niu , Zhenglun Kong , Geng Yuan , Weiwen Jiang , Jiexiong Guan , Caiwen Ding , Pu Zhao , Sijia Liu , Bin Ren , Yanzhi Wang

Effectiveness of Pre-training for Few-shot Intent Classification

This paper investigates the effectiveness of pre-training for few-shot intent classification. While existing paradigms commonly further pre-train language models such as BERT on a vast amount of unlabeled corpus, we find it highly effective…

Computation and Language · Computer Science 2024-09-17 Haode Zhang , Yuwei Zhang , Li-Ming Zhan , Jiaxin Chen , Guangyuan Shi , Albert Y. S. Lam , Xiao-Ming Wu

Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT

This paper presents new state-of-the-art models for three tasks, part-of-speech tagging, syntactic parsing, and semantic parsing, using the cutting-edge contextualized embedding framework known as BERT. For each task, we first replicate and…

Computation and Language · Computer Science 2020-05-26 Han He , Jinho D. Choi

bert2BERT: Towards Reusable Pretrained Language Models

In recent years, researchers tend to pre-train ever-larger language models to explore the upper limit of deep models. However, large language model pre-training costs intensive computational resources and most of the models are trained from…

Computation and Language · Computer Science 2021-10-15 Cheng Chen , Yichun Yin , Lifeng Shang , Xin Jiang , Yujia Qin , Fengyu Wang , Zhi Wang , Xiao Chen , Zhiyuan Liu , Qun Liu

Enhancing Multilingual Language Models for Code-Switched Input Data

Code-switching, or alternating between languages within a single conversation, presents challenges for multilingual language models on NLP tasks. This research investigates if pre-training Multilingual BERT (mBERT) on code-switched datasets…

Computation and Language · Computer Science 2025-03-12 Katherine Xie , Nitya Babbar , Vicky Chen , Yoanna Turura

LightMBERT: A Simple Yet Effective Method for Multilingual BERT Distillation

The multilingual pre-trained language models (e.g, mBERT, XLM and XLM-R) have shown impressive performance on cross-lingual natural language understanding tasks. However, these models are computationally intensive and difficult to be…

Computation and Language · Computer Science 2021-03-12 Xiaoqi Jiao , Yichun Yin , Lifeng Shang , Xin Jiang , Xiao Chen , Linlin Li , Fang Wang , Qun Liu

Identifying Necessary Elements for BERT's Multilinguality

It has been shown that multilingual BERT (mBERT) yields high quality multilingual representations and enables effective zero-shot transfer. This is surprising given that mBERT does not use any crosslingual signal during training. While…

Computation and Language · Computer Science 2021-02-09 Philipp Dufter , Hinrich Schütze

How to Fine-Tune BERT for Text Classification?

Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing…

Computation and Language · Computer Science 2020-02-06 Chi Sun , Xipeng Qiu , Yige Xu , Xuanjing Huang