Related papers: Curriculum learning for language modeling

Influence-driven Curriculum Learning for Pre-training on Limited Data

Curriculum learning, a training technique where data is presented to the model in order of example difficulty (e.g., from simpler to more complex documents), has shown limited success for pre-training language models. In this work, we…

Computation and Language · Computer Science 2025-09-29 Loris Schoenegger, Lukas Thoma, Terra Blevins, Benjamin Roth

Your Pretrained Model Tells the Difficulty Itself: A Self-Adaptive Curriculum Learning Paradigm for Natural Language Understanding

Curriculum learning is a widely adopted training strategy in natural language processing (NLP), where models are exposed to examples organized by increasing difficulty to enhance learning efficiency and performance. However, most existing…

Computation and Language · Computer Science 2025-07-15 Qi Feng, Yihong Liu, Hinrich Schütze

Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum Learning

Curriculum learning, organizing training data from easy to hard, has improved efficiency across machine learning domains, yet remains underexplored for language model pretraining. We present the first systematic investigation of curriculum…

Computation and Language · Computer Science 2026-01-29 Yang Zhang, Amr Mohamed, Hadi Abdine, Guokan Shang, Michalis Vazirgiannis

An Empirical Exploration of Curriculum Learning for Neural Machine Translation

Machine translation systems based on deep neural networks are expensive to train. Curriculum learning aims to address this issue by choosing the order in which samples are presented during training to help train better models faster. We…

Computation and Language · Computer Science 2018-11-05 Xuan Zhang, Gaurav Kumar, Huda Khayrallah, Kenton Murray, Jeremy Gwinnup, Marianna J Martindale, Paul McNamee, Kevin Duh, Marine Carpuat

A Survey on Curriculum Learning

Curriculum learning (CL) is a training strategy that trains a machine learning model from easier data to harder data, which imitates the meaningful learning order in human curricula. As an easy-to-use plug-in, the CL strategy has…

Machine Learning · Computer Science 2021-03-26 Xin Wang, Yudong Chen, Wenwu Zhu

Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks

Pretraining sentence encoders with language modeling and related unsupervised tasks has recently been shown to be very effective for language understanding tasks. By supplementing language model-style pretraining with further training on…

Computation and Language · Computer Science 2019-03-01 Jason Phang, Thibault Févry, Samuel R. Bowman

Cup Curriculum: Curriculum Learning on Model Capacity

Curriculum learning (CL) aims to increase the performance of a learner on a given task by applying a specialized learning strategy. This strategy focuses on either the dataset, the task, or the model. There is little to no work analysing…

Machine Learning · Computer Science 2023-11-08 Luca Scharr, Vanessa Toborek

An Analytical Theory of Curriculum Learning in Teacher-Student Networks

In humans and animals, curriculum learning -- presenting data in a curated order -- is critical to rapid learning and effective pedagogy. Yet in machine learning, curricula are not widely used and empirically often yield only moderate…

Machine Learning · Computer Science 2022-12-07 Luca Saglietti, Stefano Sarao Mannelli, Andrew Saxe

Visualizing and Understanding Curriculum Learning for Long Short-Term Memory Networks

Curriculum Learning emphasizes the order of training instances in a computational learning setup. The core hypothesis is that simpler instances should be learned early as building blocks to learn more complex ones. Despite its usefulness,…

Computation and Language · Computer Science 2016-11-21 Volkan Cirik, Eduard Hovy, Louis-Philippe Morency

Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling

Natural language understanding has recently seen a surge of progress with the use of sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which are pretrained on variants of language modeling. We conduct the…

Computation and Language · Computer Science 2019-07-24 Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman

Curriculum Learning for Small Code Language Models

Code language models have emerged as useful tools for various programming tasks, yet they often struggle when it comes to complex ones. In this paper, we explore the potential of curriculum learning in enhancing the performance of these…

Machine Learning · Computer Science 2024-07-16 Marwa Naïr, Kamel Yamani, Lynda Said Lhadj, Riyadh Baghdadi

Comparison and Analysis of New Curriculum Criteria for End-to-End ASR

It is common knowledge that the quantity and quality of the training data play a significant role in the creation of a good machine learning model. In this paper, we take it one step further and demonstrate that the way the training…

Audio and Speech Processing · Electrical Eng. & Systems 2022-08-12 Georgios Karakasidis, Tamás Grósz, Mikko Kurimo

CLewR: Curriculum Learning with Restarts for Machine Translation Preference Learning

Large language models (LLMs) have demonstrated competitive performance in zero-shot multilingual machine translation (MT). Some follow-up works further improved MT performance via preference optimization, but they leave a key aspect largely…

Computation and Language · Computer Science 2026-04-20 Alexandra Dragomir, Florin Brad, Radu Tudor Ionescu

Training Dynamics for Curriculum Learning: A Study on Monolingual and Cross-lingual NLU

Curriculum Learning (CL) is a technique of training models via ranking examples in a typically increasing difficulty trend with the aim of accelerating convergence and improving generalisability. Current approaches for Natural Language…

Computation and Language · Computer Science 2022-11-28 Fenia Christopoulou, Gerasimos Lampouras, Ignacio Iacobacci

What Kind of Language is Easy to Language-Model Under Curriculum Learning?

Many of the thousands of attested languages share common configurations of features, creating a spectrum from typologically very rare (e.g., object-verb-subject word order) or impossible languages to very common combinations of features…

Computation and Language · Computer Science 2026-04-30 Nadine El-Naggar, Tatsuki Kuribayashi, Ted Briscoe

Curriculum Learning Strategies for IR: An Empirical Study on Conversation Response Ranking

Neural ranking models are traditionally trained on a series of random batches, sampled uniformly from the entire training set. Curriculum learning has recently been shown to improve neural models' effectiveness by sampling batches…

Information Retrieval · Computer Science 2019-12-19 Gustavo Penha, Claudia Hauff

Exploring Curriculum Learning for Vision-Language Tasks: A Study on Small-Scale Multimodal Training

For specialized domains, there is often not a wealth of data with which to train large machine learning models. In such limited data / compute settings, various methods exist aiming to "do more with less", such as finetuning from a…

Machine Learning · Computer Science 2024-10-22 Rohan Saha, Abrar Fahim, Alona Fyshe, Alex Murphy

Curriculum Learning: A Survey

Training machine learning models in a meaningful order, from the easy samples to the hard ones, using curriculum learning can provide performance improvements over the standard training approach based on random data shuffling, without any…

Machine Learning · Computer Science 2022-04-12 Petru Soviany, Radu Tudor Ionescu, Paolo Rota, Nicu Sebe

Beyond Shallow Heuristics: Leveraging Human Intuition for Curriculum Learning

Curriculum learning (CL) aims to improve training by presenting data from "easy" to "hard", yet defining and measuring linguistic difficulty remains an open challenge. We investigate whether human-curated simple language can serve as an…

Computation and Language · Computer Science 2025-08-28 Vanessa Toborek, Sebastian Müller, Tim Selbach, Tamás Horváth, Christian Bauckhage

Scaling LLM Pre-training with Vocabulary Curriculum

Modern language models rely on static vocabularies, fixed before pretraining, in contrast to the adaptive vocabulary acquisition observed in human language learning. To bridge this gap, we introduce vocabulary curriculum learning, an…

Computation and Language · Computer Science 2025-02-26 Fangyuan Yu