Related papers: Structure Inducing Pre-Training

200 papers

Large language models are classically trained in stages: pretraining on raw text followed by post-training for instruction following and reasoning. However, this separation creates a fundamental limitation: many desirable behaviors such as…

The development of large language models leads to the formation of a pre-train-then-align paradigm, in which the model is typically pre-trained on a large text corpus and undergoes a tuning stage to align the model with human preference or…

Computation and Language · Computer Science 2025-03-19 Kaiser Sun, Mark Dredze

Pretraining Neural Language Models (NLMs) over a large corpus involves chunking the text into training examples, which are contiguous text segments of sizes processable by the neural architecture. We highlight a bias introduced by this…

Computation and Language · Computer Science 2022-03-22 Yoav Levine, Noam Wies, Daniel Jannai, Dan Navon, Yedid Hoshen, Amnon Shashua

Accurate syntactic representations are essential for robust generalization in natural language. Recent work has found that pre-training can teach language models to rely on hierarchical syntactic features - as opposed to incorrect linear…

Computation and Language · Computer Science 2023-06-01 Aaron Mueller, Tal Linzen

Although the pre-training followed by fine-tuning paradigm is used extensively in many fields, there is still some controversy surrounding the impact of pre-training on the fine-tuning process. Currently, experimental findings based on text…

Machine Learning · Computer Science 2023-09-12 Jiashu Pu, Shiwei Zhao, Ling Cheng, Yongzhu Chang, Runze Wu, Tangjie Lv, Rongsheng Zhang

Transformers have theoretical limitations in modeling certain sequence-to-sequence tasks, yet it remains largely unclear if these limitations play a role in large-scale pretrained LLMs, or whether LLMs might effectively overcome these…

Machine Learning · Computer Science 2025-10-24 Mayank Jobanputra, Yana Veitsman, Yash Sarrof, Aleksandra Bakalova, Vera Demberg, Ellie Pavlick, Michael Hahn

Recent reinforcement learning (RL) techniques have yielded impressive reasoning improvements in language models, yet it remains unclear whether post-training truly extends a model's reasoning ability beyond what it acquires during…

Computation and Language · Computer Science 2025-12-09 Charlie Zhang, Graham Neubig, Xiang Yue

Pre-trained models are nowadays a fundamental component of machine learning research. In continual learning, they are commonly used to initialize the model before training on the stream of non-stationary data. However, pre-training is…

Machine Learning · Computer Science 2022-05-20 Andrea Cossu, Tinne Tuytelaars, Antonio Carta, Lucia Passaro, Vincenzo Lomonaco, Davide Bacciu

Language model pre-training has proven to be useful in many language understanding tasks. In this paper, we investigate whether it is still helpful to add the self-training method in the pre-training step and the fine-tuning step. Towards…

Computation and Language · Computer Science 2023-02-17 Tong Guo

Pretraining language models on formal language can improve their acquisition of natural language. Which features of the formal language impart an inductive bias that leads to effective transfer? Drawing on insights from linguistics and…

Computation and Language · Computer Science 2025-05-28 Michael Y. Hu, Jackson Petty, Chuan Shi, William Merrill, Tal Linzen

We introduce a method for improving the structural understanding abilities of language models. Unlike previous approaches that finetune the models with task-specific augmentation, we pretrain language models on a collection of task-agnostic…

Computation and Language · Computer Science 2023-03-07 Chenguang Wang, Xiao Liu, Zui Chen, Haoyun Hong, Jie Tang, Dawn Song

In this paper, we study how the intrinsic nature of pre-training data contributes to the fine-tuned downstream performance. To this end, we pre-train different transformer-based masked language models on several corpora with certain…

Computation and Language · Computer Science 2020-12-23 Cheng-Han Chiang, Hung-yi Lee

Both humans and large language models are able to learn language without explicit structural supervision. What inductive biases make this learning possible? We address this fundamental cognitive question by leveraging transformer language…

Computation and Language · Computer Science 2023-10-31 Isabel Papadimitriou, Dan Jurafsky

Pre-training and self-training are two approaches to semi-supervised learning. The comparison between pre-training and self-training has been explored. However, the previous works led to confusing findings: self-training outperforms…

Computation and Language · Computer Science 2024-09-05 Yiheng Wang, Jiayu Lin, Zuoquan Lin

Recently, many pre-trained language models for source code have been proposed to model the context of code and serve as a basis for downstream code intelligence tasks such as code completion, code search, and code summarization. These…

Software Engineering · Computer Science 2022-02-15 Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin

Large Language Models (LLMs), trained on extensive web-scale corpora, have demonstrated remarkable abilities across diverse tasks, especially as they are scaled up. Nevertheless, even state-of-the-art models struggle in certain cases,…

Computation and Language · Computer Science 2025-01-16 Irina Bigoulaeva, Harish Tayyar Madabushi, Iryna Gurevych

Reinforcement learning (RL) has emerged as a powerful post-training technique to incentivize the reasoning ability of large language models (LLMs). However, LLMs can respond very inconsistently to RL finetuning: some show substantial…

Machine Learning · Computer Science 2025-10-07 Zhepeng Cen, Yihang Yao, William Han, Zuxin Liu, Ding Zhao

Most language models (LMs) are trained and applied in an autoregressive left-to-right fashion, assuming that the next token only depends on the preceding ones. However, this assumption ignores the potential benefits of using the full…

Computation and Language · Computer Science 2023-03-14 Anh Nguyen, Nikos Karampatziakis, Weizhu Chen

Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining…

Pretraining on large, semantically rich datasets is key for developing language models. Surprisingly, recent studies have shown that even synthetic data, generated procedurally through simple semantic-free algorithms, can yield some of the…

Machine Learning · Computer Science 2025-05-29 Zachary Shinnick, Liangze Jiang, Hemanth Saratchandran, Anton van den Hengel, Damien Teney