Related papers: reStructured Pre-training

200 papers

The dominant paradigm for training large reasoning models starts with pre-training using next-token prediction loss on vast amounts of data. Reinforcement learning, while powerful in scaling reasoning, is introduced only as the very last…

In recent years, large language models (LLMs) have achieved remarkable success across a variety of tasks. However, their potential in the domain of Automated Essay Scoring (AES) remains largely underexplored. Moreover, compared to English data,…

Computation and Language · Computer Science 2025-04-09 Yida Cai, Kun Liang, Sanwoo Lee, Qinghan Wang, Yunfang Wu

The fine-tuning of pre-trained language models has achieved great success in many NLP fields. Yet, it is strikingly vulnerable to adversarial examples, e.g., word substitution attacks using only synonyms can easily fool a BERT-based sentiment…

Computation and Language · Computer Science 2021-12-23 Xinshuai Dong, Luu Anh Tuan, Min Lin, Shuicheng Yan, Hanwang Zhang

The growing disparity between the exponential scaling of computational resources and the finite growth of high-quality text data now constrains conventional scaling approaches for large language models (LLMs). To address this challenge, we…

Pre-trained language models achieve outstanding performance in NLP tasks. Various knowledge distillation methods have been proposed to reduce the heavy computation and storage requirements of pre-trained language models. However, from our…

Computation and Language · Computer Science 2021-06-08 Xin Guo, Jianlei Yang, Haoyi Zhou, Xucheng Ye, Jianxin Li

Pre-training of Large Language Models is often prohibitively expensive and inefficient at scale, requiring complex and invasive modifications in order to achieve high data throughput. In this work, we present Token-Superposition Training…

Computation and Language · Computer Science 2026-05-20 Bowen Peng, Théo Gigant, Jeffrey Quesnelle

Recent progress in NLP witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on Transformer (Vaswani et al. 2017), and in a range of end tasks, such models have achieved state-of-the-art…

Computation and Language · Computer Science 2019-11-12 Pengxiang Cheng, Katrin Erk

Continual post-training (CPT) is a popular and effective technique for adapting foundation models like multimodal large language models to specific and ever-evolving downstream tasks. While existing research has primarily concentrated on…

Machine Learning · Computer Science 2026-01-22 Song Lai, Haohan Zhao, Rong Feng, Changyi Ma, Wenzhuo Liu, Hongbo Zhao, Xi Lin, Dong Yi, Qingfu Zhang, Hongbin Liu, Gaofeng Meng, Fei Zhu

Recent developments in unsupervised representation learning have successfully established the concept of transfer learning in NLP. Mainly three forces are driving the improvements in this area of research: More elaborated architectures are…

Computation and Language · Computer Science 2020-07-22 Matthias Aßenmacher, Christian Heumann

Recently, the pre-trained language model, BERT (and its robustly optimized version RoBERTa), has attracted a lot of attention in natural language understanding (NLU), and achieved state-of-the-art accuracy in various NLU tasks, such as…

Computation and Language · Computer Science 2019-09-30 Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, Luo Si

Semi-supervised learning (SSL) is a popular setting aiming to effectively utilize unlabelled data to improve model performance in downstream natural language processing (NLP) tasks. Currently, there are two popular approaches to make use of…

Computation and Language · Computer Science 2023-05-23 Zhengxiang Shi, Francesco Tonolini, Nikolaos Aletras, Emine Yilmaz, Gabriella Kazai, Yunlong Jiao

While language models have shown remarkable performance across diverse tasks, they still encounter challenges in complex reasoning scenarios. Recent research suggests that language models trained on linearized search traces toward…

Artificial Intelligence · Computer Science 2025-10-28 Seungyong Moon, Bumsoo Park, Hyun Oh Song

Understanding the relationships between protein sequence, structure and function is a long-standing biological challenge with manifold implications from drug design to our understanding of evolution. Recently, protein language models have…

Quantitative Methods · Quantitative Biology 2024-01-29 Dexiong Chen, Philip Hartout, Paolo Pellizzoni, Carlos Oliver, Karsten Borgwardt

Large Language Models (LLMs) are increasingly relied upon for complex workflows, yet their ability to maintain flow of instructions remains underexplored. Existing benchmarks conflate task complexity with structural ordering, making it…

Artificial Intelligence · Computer Science 2026-01-28 Andrew Jaffe, Noah Reicin, Jinho D. Choi

The prevailing paradigm for training large reasoning models--combining Supervised Fine-Tuning (SFT) with Reinforcement Learning with Verifiable Rewards (RLVR)--is fundamentally constrained by its reliance on high-quality, human-annotated…

Machine Learning · Computer Science 2026-03-24 Yuanfu Wang, Zhixuan Liu, Xiangtian Li, Chaochao Lu, Chao Yang

Structured, procedural reasoning is essential for Large Language Models (LLMs), especially in mathematics. While post-training methods have improved LLM performance, they still fall short in capturing deep procedural logic on complex tasks.…

Artificial Intelligence · Computer Science 2025-08-27 Zhichao Yang, Zhaoxin Fan, Gen Li, Yuanze Hu, Xinyu Wang, Ye Qiu, Xin Wang, Yifan Sun, Wenjun Wu

With the success of downstream tasks using English pre-trained language models, a pre-trained Chinese language model is also necessary to achieve better performance on Chinese NLP tasks. Unlike the English language, Chinese has its special…

Computation and Language · Computer Science 2022-02-24 Chao Lv, Han Zhang, XinKai Du, Yunhao Zhang, Ying Huang, Wenhao Li, Jia Han, Shanshan Gu

Deep Research agents tackle knowledge-intensive tasks through multi-round retrieval and decision-oriented generation. While reinforcement learning (RL) has been shown to improve performance in this paradigm, its contributions remain…

Computation and Language · Computer Science 2026-02-24 Yinuo Xu, Shuo Lu, Jianjie Cheng, Meng Wang, Qianlong Xie, Xingxing Wang, Ran He, Jian Liang

In this work, we introduce Reinforcement Pre-Training (RPT) as a new scaling paradigm for large language models and reinforcement learning (RL). Specifically, we reframe next-token prediction as a reasoning task trained using RL, where it…

Computation and Language · Computer Science 2025-06-10 Qingxiu Dong, Li Dong, Yao Tang, Tianzhu Ye, Yutao Sun, Zhifang Sui, Furu Wei

Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves…
