Related papers: BERTGEN: Multi-task Generation through BERT

What BERT Sees: Cross-Modal Transfer for Visual Question Generation

Pre-trained language models have recently contributed to significant advances in NLP tasks. Recently, multi-modal versions of BERT have been developed, using heavy pre-training relying on vast corpora of aligned textual and image data,…

Computation and Language · Computer Science 2020-12-17 Thomas Scialom, Patrick Bordes, Paul-Alexis Dray, Jacopo Staiano, Patrick Gallinari

Is Multilingual BERT Fluent in Language Generation?

The multilingual BERT model is trained on 104 languages and meant to serve as a universal language model and tool for encoding sentences. We explore how well the model performs on several languages across several tasks: a diagnostic…

Computation and Language · Computer Science 2019-10-10 Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski, Filip Ginter

First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT

Multilingual pretrained language models have demonstrated remarkable zero-shot cross-lingual transfer capabilities. Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language, not seen…

Computation and Language · Computer Science 2021-01-28 Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional…

Computation and Language · Computer Science 2019-05-28 Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

BERTje: A Dutch BERT Model

The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks. Using the same architecture and parameters, we developed and evaluated a monolingual…

Computation and Language · Computer Science 2019-12-23 Wietse de Vries, Andreas van Cranenburgh, Arianna Bisazza, Tommaso Caselli, Gertjan van Noord, Malvina Nissim

lamBERT: Language and Action Learning Using Multimodal BERT

Recently, the bidirectional encoder representations from transformers (BERT) model has attracted much attention in the field of natural language processing, owing to its high performance in language understanding-related tasks. The BERT…

Machine Learning · Computer Science 2020-04-16 Kazuki Miyazawa, Tatsuya Aoki, Takato Horii, Takayuki Nagai

Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT

Pretrained contextual representation models (Peters et al., 2018; Devlin et al., 2018) have pushed forward the state-of-the-art on many NLP tasks. A new release of BERT (Devlin, 2018) includes a model simultaneously pretrained on 104…

Computation and Language · Computer Science 2019-10-04 Shijie Wu, Mark Dredze

Feature Aggregation in Zero-Shot Cross-Lingual Transfer Using Multilingual BERT

Multilingual BERT (mBERT), a language model pre-trained on large multilingual corpora, has impressive zero-shot cross-lingual transfer capabilities and performs surprisingly well on zero-shot POS tagging and Named Entity Recognition (NER),…

Computation and Language · Computer Science 2022-05-18 Beiduo Chen, Wu Guo, Quan Liu, Kun Tao

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing…

Computer Vision and Pattern Recognition · Computer Science 2019-08-07 Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee

Distilling Knowledge Learned in BERT for Text Generation

Large-scale pre-trained language models such as BERT have achieved great success in language understanding tasks. However, it remains an open question how to utilize BERT for language generation. In this paper, we present a novel approach,…

Computation and Language · Computer Science 2020-07-21 Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu

Multilingual is not enough: BERT for Finnish

Deep learning-based language models pretrained on large unannotated text corpora have been demonstrated to allow efficient transfer learning for natural language processing, with recent approaches such as the transformer-based BERT model…

Computation and Language · Computer Science 2019-12-17 Antti Virtanen, Jenna Kanerva, Rami Ilo, Jouni Luoma, Juhani Luotolahti, Tapio Salakoski, Filip Ginter, Sampo Pyysalo

Multitask Fine-Tuning and Generative Adversarial Learning for Improved Auxiliary Classification

In this study, we implement a novel BERT architecture for multitask fine-tuning on three downstream tasks: sentiment classification, paraphrase detection, and semantic textual similarity prediction. Our model, Multitask BERT, incorporates…

Computation and Language · Computer Science 2024-08-29 Christopher Sun, Abishek Satish

How multilingual is Multilingual BERT?

In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in…

Computation and Language · Computer Science 2019-06-05 Telmo Pires, Eva Schlinger, Dan Garrette

mmBERT: A Modern Multilingual Encoder with Annealed Language Learning

Encoder-only language models are frequently used for a variety of standard machine learning tasks, including classification and retrieval. However, there has been a lack of recent research for encoder models, especially with respect to…

Computation and Language · Computer Science 2025-09-09 Marc Marone, Orion Weller, William Fleshman, Eugene Yang, Dawn Lawrie, Benjamin Van Durme

An Empirical Study of Multi-Task Learning on BERT for Biomedical Text Mining

Multi-task learning (MTL) has achieved remarkable success in natural language processing applications. In this work, we study a multi-task learning model with multiple decoders on varieties of biomedical and clinical natural language…

Computation and Language · Computer Science 2020-05-07 Yifan Peng, Qingyu Chen, Zhiyong Lu

What the [MASK]? Making Sense of Language-Specific BERT Models

Recently, Natural Language Processing (NLP) has witnessed impressive progress in many areas, due to the advent of novel, pretrained contextual representation models. In particular, Devlin et al. (2019) proposed a model, called BERT…

Computation and Language · Computer Science 2020-03-09 Debora Nozza, Federico Bianchi, Dirk Hovy

Multi-Task Learning of Generation and Classification for Emotion-Aware Dialogue Response Generation

For a computer to naturally interact with a human, it needs to be human-like. In this paper, we propose a neural response generation model with multi-task learning of generation and classification, focusing on emotion. Our model based on…

Computation and Language · Computer Science 2021-05-26 Tatsuya Ide, Daisuke Kawahara

DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models

We present DiffusionBERT, a new generative masked language model based on discrete diffusion models. Diffusion models and many pre-trained language models have a shared training objective, i.e., denoising, making it possible to combine the…

Computation and Language · Computer Science 2022-12-02 Zhengfu He, Tianxiang Sun, Kuanning Wang, Xuanjing Huang, Xipeng Qiu

InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining

Multi-modal pretraining for learning high-level multi-modal representation is a further step towards deep learning and artificial intelligence. In this work, we propose a novel model, namely InterBERT (BERT for Interaction), which is the…

Computation and Language · Computer Science 2021-04-23 Junyang Lin, An Yang, Yichang Zhang, Jie Liu, Jingren Zhou, Hongxia Yang

Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study

Large pre-trained language models help to achieve state of the art on a variety of natural language processing (NLP) tasks; nevertheless, they still suffer from forgetting when incrementally learning a sequence of tasks. To alleviate this…

Computation and Language · Computer Science 2023-03-03 Mingxu Tao, Yansong Feng, Dongyan Zhao