Related papers: Continual Learning Under Language Shift

An Efficient Approach for Studying Cross-Lingual Transfer in Multilingual Language Models

The capacity and effectiveness of pre-trained multilingual models (MLMs) for zero-shot cross-lingual transfer is well established. However, phenomena of positive or negative transfer, and the effect of language choice still need to be fully…

Computation and Language · Computer Science · 2024-04-01 · Fahim Faisal, Antonios Anastasopoulos

Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning

Most Transformer language models are primarily pretrained on English text, limiting their use for other languages. As the model sizes grow, the performance gap between English and other languages with fewer compute and data resources…

Computation and Language · Computer Science · 2023-01-24 · Malte Ostendorff, Georg Rehm

Vocabulary shapes cross-lingual variation of word-order learnability in language models

Why do some languages like Czech permit free word order, while others like English do not? We address this question by pretraining transformer language models on a spectrum of synthetic word-order variants of natural languages. We observe…

Computation and Language · Computer Science · 2026-03-23 · Jonas Mayer Martins, Jaap Jumelet, Viola Priesemann, Lisa Beinborn

When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models

Transfer learning based on pretraining language models on a large amount of raw data has become a new norm to reach state-of-the-art performance in NLP. Still, it remains unclear how this approach should be applied for unseen languages that…

Computation and Language · Computer Science · 2021-04-20 · Benjamin Muller, Antonis Anastasopoulos, Benoît Sagot, Djamé Seddah

Cross-lingual Transfer of Monolingual Models

Recent studies in zero-shot cross-lingual learning using multilingual models have falsified the previous hypothesis that shared vocabulary and joint pre-training are the keys to cross-lingual generalization. Inspired by this advancement, we…

Computation and Language · Computer Science · 2022-05-20 · Evangelia Gogoulou, Ariel Ekgren, Tim Isbister, Magnus Sahlgren

Language Contamination Helps Explain the Cross-lingual Capabilities of English Pretrained Models

English pretrained language models, which make up the backbone of many modern NLP systems, require huge amounts of unlabeled training data. These models are generally presented as being trained only on English text but have been found to…

Computation and Language · Computer Science · 2022-11-18 · Terra Blevins, Luke Zettlemoyer

Measuring Cross-lingual Transfer in Bytes

Multilingual pretraining has been a successful solution to the challenges posed by the lack of resources for languages. These models can transfer knowledge to target languages with minimal or no examples. Recent research suggests that…

Computation and Language · Computer Science · 2024-04-15 · Leandro Rodrigues de Souza, Thales Sales Almeida, Roberto Lotufo, Rodrigo Nogueira

Match the Script, Adapt if Multilingual: Analyzing the Effect of Multilingual Pretraining on Cross-lingual Transferability

Pretrained multilingual models enable zero-shot learning even for unseen languages, and that performance can be further improved via adaptation prior to finetuning. However, it is unclear how the number of pretraining languages influences a…

Computation and Language · Computer Science · 2022-03-22 · Yoshinari Fujinuma, Jordan Boyd-Graber, Katharina Kann

Is It Worth the (Environmental) Cost? Limited Evidence for Temporal Adaptation via Continuous Training

Language is constantly changing and evolving, leaving language models to become quickly outdated. Consequently, we should continuously update our models with new data to expose them to new events and facts. However, that requires additional…

Computation and Language · Computer Science · 2023-05-05 · Giuseppe Attanasio, Debora Nozza, Federico Bianchi, Dirk Hovy

On the Acquisition of Shared Grammatical Representations in Bilingual Language Models

Crosslingual transfer is crucial to contemporary language models' multilingual capabilities, but how it occurs is not well understood. We ask what happens to a monolingual language model when it begins to be trained on a second language.…

Computation and Language · Computer Science · 2025-06-05 · Catherine Arnett, Tyler A. Chang, James A. Michaelov, Benjamin K. Bergen

Cross-lingual Transfer Can Worsen Bias in Sentiment Analysis

Sentiment analysis (SA) systems are widely deployed in many of the world's languages, and there is well-documented evidence of demographic bias in these systems. In languages beyond English, scarcer training data is often supplemented with…

Computation and Language · Computer Science · 2023-05-23 · Seraphina Goldfarb-Tarrant, Björn Ross, Adam Lopez

Data-Efficient Cross-Lingual Transfer with Language-Specific Subnetworks

Large multilingual language models typically share their parameters across all languages, which enables cross-lingual task transfer, but learning can also be hindered when training updates from different languages are in conflict. In this…

Computation and Language · Computer Science · 2022-11-02 · Rochelle Choenni, Dan Garrette, Ekaterina Shutova

Investigating Continual Pretraining in Large Language Models: Insights and Implications

Continual learning (CL) in large language models (LLMs) is an evolving domain that focuses on developing efficient and sustainable training strategies to adapt models to emerging knowledge and achieve robustness in dynamic environments. Our…

Computation and Language · Computer Science · 2025-02-13 · Çağatay Yıldız, Nishaanth Kanna Ravichandran, Nitin Sharma, Matthias Bethge, Beyza Ermis

Towards Robust and Efficient Continual Language Learning

As the application space of language models continues to evolve, a natural question to ask is how we can quickly adapt models to new tasks. We approach this classic question from a continual learning perspective, in which we aim to continue…

Computation and Language · Computer Science · 2023-07-13 · Adam Fisch, Amal Rannen-Triki, Razvan Pascanu, Jörg Bornschein, Angeliki Lazaridou, Elena Gribovskaya, Marc'Aurelio Ranzato

A Post-trainer's Guide to Multilingual Training Data: Uncovering Cross-lingual Transfer Dynamics

In order for large language models to be useful across the globe, they are fine-tuned to follow instructions on multilingual data. Despite the ubiquity of such post-training, a clear understanding of the dynamics that enable cross-lingual…

Computation and Language · Computer Science · 2025-04-24 · Luisa Shimabucoro, Ahmet Ustun, Marzieh Fadaee, Sebastian Ruder

Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?

Multilingual large language models are designed, claimed, and expected to cater to speakers of varied languages. We hypothesise that the current practices of fine-tuning and evaluating these models may not perfectly align with this…

Computation and Language · Computer Science · 2024-09-27 · Pinzhen Chen, Simon Yu, Zhicheng Guo, Barry Haddow

Understanding Data Temporality Impact on Large Language Models Pre-training

Large language models (LLMs) are typically trained on shuffled corpora, yielding models whose knowledge is frozen at train time and whose temporal grounding remains poorly understood. In this work, we study the impact of pre-training…

Computation and Language · Computer Science · 2026-05-26 · Hippolyte Pilchen, Romain Fabre, Franck Signe Talla, Patrick Perez, Edouard Grave

Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases

Pretraining language models on formal language can improve their acquisition of natural language. Which features of the formal language impart an inductive bias that leads to effective transfer? Drawing on insights from linguistics and…

Computation and Language · Computer Science · 2025-05-28 · Michael Y. Hu, Jackson Petty, Chuan Shi, William Merrill, Tal Linzen

Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models

The emergent cross-lingual transfer seen in multilingual pretrained models has sparked significant interest in studying their behavior. However, because these analyses have focused on fully trained multilingual models, little is known about…

Computation and Language · Computer Science · 2022-10-25 · Terra Blevins, Hila Gonen, Luke Zettlemoyer

Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies

When we transfer a pretrained language model to a new language, there are many axes of variation that change at once. To disentangle the impact of different factors like syntactic similarity and vocabulary similarity, we propose a set of…

Computation and Language · Computer Science · 2024-01-25 · Zhengxuan Wu, Alex Tamkin, Isabel Papadimitriou