Related papers: Multilingual BERT Post-Pretraining Alignment

Multilingual Alignment of Contextual Word Representations

We propose procedures for evaluating and strengthening contextual embedding alignment and show that they are useful in analyzing and improving multilingual BERT. In particular, after our proposed alignment procedure, BERT exhibits…

Computation and Language · Computer Science 2020-02-14 Steven Cao, Nikita Kitaev, Dan Klein

Cross-lingual Alignment Methods for Multilingual BERT: A Comparative Study

Multilingual BERT (mBERT) has shown reasonable capability for zero-shot cross-lingual transfer when fine-tuned on downstream tasks. Since mBERT is not pre-trained with explicit cross-lingual supervision, transfer performance can further be…

Computation and Language · Computer Science 2020-10-01 Saurabh Kulshreshtha, José Luis Redondo-García, Ching-Yun Chang

The Impact of Cross-Lingual Adjustment of Contextual Word Representations on Zero-Shot Transfer

Large multilingual language models such as mBERT or XLM-R enable zero-shot cross-lingual transfer in various IR and NLP tasks. Cao et al. (2020) proposed a data- and compute-efficient method for cross-lingual adjustment of mBERT that uses a…

Computation and Language · Computer Science 2023-11-01 Pavel Efimov, Leonid Boytsov, Elena Arslanova, Pavel Braslavski

Multi-Level Contrastive Learning for Cross-Lingual Alignment

Cross-language pre-trained models such as multilingual BERT (mBERT) have achieved significant performance in various cross-lingual downstream NLP tasks. This paper proposes a multi-level contrastive learning (ML-CTL) framework to further…

Computation and Language · Computer Science 2022-03-01 Beiduo Chen, Wu Guo, Bin Gu, Quan Liu, Yongchao Wang

Bilingual Alignment Pre-Training for Zero-Shot Cross-Lingual Transfer

Multilingual pre-trained models have achieved remarkable performance on cross-lingual transfer learning. Some multilingual models, such as mBERT, have been pre-trained on unlabeled corpora, therefore the embeddings of different languages in…

Computation and Language · Computer Science 2021-11-29 Ziqing Yang, Wentao Ma, Yiming Cui, Jiani Ye, Wanxiang Che, Shijin Wang

On the Language Neutrality of Pre-trained Multilingual Representations

Multilingual contextual embeddings, such as multilingual BERT and XLM-RoBERTa, have proved useful for many multi-lingual tasks. Previous work probed the cross-linguality of the representations indirectly using zero-shot transfer learning on…

Computation and Language · Computer Science 2020-10-01 Jindřich Libovický, Rudolf Rosa, Alexander Fraser

Enhancing Multilingual Embeddings via Multi-Way Parallel Text Alignment

Multilingual pretraining typically lacks explicit alignment signals, leading to suboptimal cross-lingual alignment in the representation space. In this work, we show that training standard pretrained models for cross-lingual alignment with…

Computation and Language · Computer Science 2026-02-26 Barah Fazili, Koustava Goswami

CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP

Multi-lingual contextualized embeddings, such as multilingual-BERT (mBERT), have shown success in a variety of zero-shot cross-lingual tasks. However, these models are limited by having inconsistent contextualized representations of…

Computation and Language · Computer Science 2020-07-14 Libo Qin, Minheng Ni, Yue Zhang, Wanxiang Che

Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training

Pre-trained multilingual language encoders, such as multilingual BERT and XLM-R, show great potential for zero-shot cross-lingual transfer. However, these multilingual encoders do not precisely align words and phrases across languages.…

Computation and Language · Computer Science 2021-09-13 Kuan-Hao Huang, Wasi Uddin Ahmad, Nanyun Peng, Kai-Wei Chang

A Primer on Pretrained Multilingual Language Models

Multilingual Language Models (MLLMs) such as mBERT, XLM, XLM-R, etc. have emerged as a viable option for bringing the power of pretraining to a large number of languages. Given their success in zero-shot transfer learning, there…

Computation and Language · Computer Science 2021-12-24 Sumanth Doddapaneni, Gowtham Ramesh, Mitesh M. Khapra, Anoop Kunchukuttan, Pratyush Kumar

Language-agnostic BERT Sentence Embedding

While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding based transfer learning (Reimers and Gurevych, 2019), BERT based cross-lingual sentence embeddings have yet to be explored.…

Computation and Language · Computer Science 2022-03-09 Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, Wei Wang

Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing

This paper investigates the problem of learning cross-lingual representations in a contextual space. We propose Cross-Lingual BERT Transformation (CLBT), a simple and efficient approach to generate cross-lingual contextualized word…

Computation and Language · Computer Science 2019-09-17 Yuxuan Wang, Wanxiang Che, Jiang Guo, Yijia Liu, Ting Liu

A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning

Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries, which are expensive and impractical for low-resource languages. To disengage from these dependencies, researchers have explored training…

Computation and Language · Computer Science 2022-10-19 Kunbo Ding, Weijie Liu, Yuejian Fang, Weiquan Mao, Zhe Zhao, Tao Zhu, Haoyan Liu, Rong Tian, Yiren Chen

Can Monolingual Pretrained Models Help Cross-Lingual Classification?

Multilingual pretrained language models (such as multilingual BERT) have achieved impressive results for cross-lingual transfer. However, due to the constant model capacity, multilingual pre-training usually lags behind the monolingual…

Computation and Language · Computer Science 2019-11-12 Zewen Chi, Li Dong, Furu Wei, Xian-Ling Mao, Heyan Huang

MML: Maximal Multiverse Learning for Robust Fine-Tuning of Language Models

Recent state-of-the-art language models utilize a two-phase training procedure comprised of (i) unsupervised pre-training on unlabeled text, and (ii) fine-tuning for a specific supervised task. More recently, many studies have been focused…

Computation and Language · Computer Science 2019-11-15 Itzik Malkiel, Lior Wolf

ALIGN-MLM: Word Embedding Alignment is Crucial for Multilingual Pre-training

Multilingual pre-trained models exhibit zero-shot cross-lingual transfer, where a model fine-tuned on a source language achieves surprisingly good performance on a target language. While studies have attempted to understand transfer, they…

Computation and Language · Computer Science 2022-11-17 Henry Tang, Ameet Deshpande, Karthik Narasimhan

Towards Multi-Sense Cross-Lingual Alignment of Contextual Embeddings

Cross-lingual word embeddings (CLWE) have been proven useful in many cross-lingual tasks. However, most existing approaches to learn CLWE including the ones with contextual embeddings are sense agnostic. In this work, we propose a novel…

Computation and Language · Computer Science 2022-09-16 Linlin Liu, Thien Hai Nguyen, Shafiq Joty, Lidong Bing, Luo Si

Hierarchical Multitask Learning Approach for BERT

Recent works show that learning contextualized embeddings for words is beneficial for downstream tasks. BERT is one successful example of this approach. It learns embeddings by solving two tasks, which are masked language model (masked LM)…

Computation and Language · Computer Science 2020-11-10 Çağla Aksoy, Alper Ahmetoğlu, Tunga Güngör

Lingua Custodi's participation at the WMT 2025 Terminology shared task

While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding based transfer learning, BERT based cross-lingual sentence embeddings have yet to be explored. We systematically investigate…

Computation and Language · Computer Science 2025-10-21 Jingshu Liu, Raheel Qader, Gaëtan Caillaut, Mariam Nakhlé

Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning

Recently, fine-tuning pre-trained language models (e.g., multilingual BERT) to downstream cross-lingual tasks has shown promising results. However, the fine-tuning process inevitably changes the parameters of the pre-trained model and…

Computation and Language · Computer Science 2020-10-06 Zihan Liu, Genta Indra Winata, Andrea Madotto, Pascale Fung