Related papers: Structure Inducing Pre-Training

Self-Improving Pretraining: using post-trained models to pretrain better models

Large language models are classically trained in stages: pretraining on raw text followed by post-training for instruction following and reasoning. However, this separation creates a fundamental limitation: many desirable behaviors such as…

Computation and Language · Computer Science 2026-04-07 Ellen Xiaoqing Tan, Jack Lanchantin, Shehzaad Dhuliawala, Danwei Li, Thao Nguyen, Jing Xu, Ping Yu, Ilia Kulikov, Sainbayar Sukhbaatar, Jason Weston, Xian Li, Olga Golovneva

Amuro and Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models

The development of large language models leads to the formation of a pre-train-then-align paradigm, in which the model is typically pre-trained on a large text corpus and undergoes a tuning stage to align the model with human preference or…

Computation and Language · Computer Science 2025-03-19 Kaiser Sun, Mark Dredze

The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design

Pretraining Neural Language Models (NLMs) over a large corpus involves chunking the text into training examples, which are contiguous text segments of sizes processable by the neural architecture. We highlight a bias introduced by this…

Computation and Language · Computer Science 2022-03-22 Yoav Levine, Noam Wies, Daniel Jannai, Dan Navon, Yedid Hoshen, Amnon Shashua

How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases

Accurate syntactic representations are essential for robust generalization in natural language. Recent work has found that pre-training can teach language models to rely on hierarchical syntactic features - as opposed to incorrect linear…

Computation and Language · Computer Science 2023-06-01 Aaron Mueller, Tal Linzen

Examining the Effect of Pre-training on Time Series Classification

Although the pre-training followed by fine-tuning paradigm is used extensively in many fields, there is still some controversy surrounding the impact of pre-training on the fine-tuning process. Currently, experimental findings based on text…

Machine Learning · Computer Science 2023-09-12 Jiashu Pu, Shiwei Zhao, Ling Cheng, Yongzhu Chang, Runze Wu, Tangjie Lv, Rongsheng Zhang

Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities

Transformers have theoretical limitations in modeling certain sequence-to-sequence tasks, yet it remains largely unclear if these limitations play a role in large-scale pretrained LLMs, or whether LLMs might effectively overcome these…

Machine Learning · Computer Science 2025-10-24 Mayank Jobanputra, Yana Veitsman, Yash Sarrof, Aleksandra Bakalova, Vera Demberg, Ellie Pavlick, Michael Hahn

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Recent reinforcement learning (RL) techniques have yielded impressive reasoning improvements in language models, yet it remains unclear whether post-training truly extends a model's reasoning ability beyond what it acquires during…

Computation and Language · Computer Science 2025-12-09 Charlie Zhang, Graham Neubig, Xiang Yue

Continual Pre-Training Mitigates Forgetting in Language and Vision

Pre-trained models are nowadays a fundamental component of machine learning research. In continual learning, they are commonly used to initialize the model before training on the stream of non-stationary data. However, pre-training is…

Machine Learning · Computer Science 2022-05-20 Andrea Cossu, Tinne Tuytelaars, Antonio Carta, Lucia Passaro, Vincenzo Lomonaco, Davide Bacciu

Predictions For Pre-training Language Models

Language model pre-training has proven to be useful in many language understanding tasks. In this paper, we investigate whether it is still helpful to add the self-training method in the pre-training step and the fine-tuning step. Towards…

Computation and Language · Computer Science 2023-02-17 Tong Guo

Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases

Pretraining language models on formal language can improve their acquisition of natural language. Which features of the formal language impart an inductive bias that leads to effective transfer? Drawing on insights from linguistics and…

Computation and Language · Computer Science 2025-05-28 Michael Y. Hu, Jackson Petty, Chuan Shi, William Merrill, Tal Linzen

DeepStruct: Pretraining of Language Models for Structure Prediction

We introduce a method for improving the structural understanding abilities of language models. Unlike previous approaches that finetune the models with task-specific augmentation, we pretrain language models on a collection of task-agnostic…

Computation and Language · Computer Science 2023-03-07 Chenguang Wang, Xiao Liu, Zui Chen, Haoyun Hong, Jie Tang, Dawn Song

Pre-Training a Language Model Without Human Language

In this paper, we study how the intrinsic nature of pre-training data contributes to the fine-tuned downstream performance. To this end, we pre-train different transformer-based masked language models on several corpora with certain…

Computation and Language · Computer Science 2020-12-23 Cheng-Han Chiang, Hung-yi Lee

Injecting structural hints: Using language models to study inductive biases in language learning

Both humans and large language models are able to learn language without explicit structural supervision. What inductive biases make this learning possible? We address this fundamental cognitive question by leveraging transformer language…

Computation and Language · Computer Science 2023-10-31 Isabel Papadimitriou, Dan Jurafsky

A Comparative Study of Pre-training and Self-training

Pre-training and self-training are two approaches to semi-supervised learning. The comparison between pre-training and self-training has been explored. However, the previous works led to confusing findings: self-training outperforms…

Computation and Language · Computer Science 2024-09-05 Yiheng Wang, Jiayu Lin, Zuoquan Lin

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code

Recently, many pre-trained language models for source code have been proposed to model the context of code and serve as a basis for downstream code intelligence tasks such as code completion, code search, and code summarization. These…

Software Engineering · Computer Science 2022-02-15 Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin

The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities

Large Language Models (LLMs), trained on extensive web-scale corpora, have demonstrated remarkable abilities across diverse tasks, especially as they are scaled up. Nevertheless, even state-of-the-art models struggle in certain cases,…

Computation and Language · Computer Science 2025-01-16 Irina Bigoulaeva, Harish Tayyar Madabushi, Iryna Gurevych

Behavior Injection: Preparing Language Models for Reinforcement Learning

Reinforcement learning (RL) has emerged as a powerful post-training technique to incentivize the reasoning ability of large language models (LLMs). However, LLMs can respond very inconsistently to RL finetuning: some show substantial…

Machine Learning · Computer Science 2025-10-07 Zhepeng Cen, Yihang Yao, William Han, Zuxin Liu, Ding Zhao

Meet in the Middle: A New Pre-training Paradigm

Most language models (LMs) are trained and applied in an autoregressive left-to-right fashion, assuming that the next token only depends on the preceding ones. However, this assumption ignores the potential benefits of using the full…

Computation and Language · Computer Science 2023-03-14 Anh Nguyen, Nikos Karampatziakis, Weizhu Chen

In-context Pretraining: Language Modeling Beyond Document Boundaries

Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining…

Computation and Language · Computer Science 2024-06-25 Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Gergely Szilvasy, Rich James, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis

Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning

Pretraining on large, semantically rich datasets is key for developing language models. Surprisingly, recent studies have shown that even synthetic data, generated procedurally through simple semantic-free algorithms, can yield some of the…

Machine Learning · Computer Science 2025-05-29 Zachary Shinnick, Liangze Jiang, Hemanth Saratchandran, Anton van den Hengel, Damien Teney