Related papers: Thinking Augmented Pre-training

200 papers

Large Language Models (LLMs) are pivotal in advancing natural language processing but often struggle with complex reasoning tasks due to inefficient attention distributions. In this paper, we explore the effect of increased computed tokens…

Computation and Language · Computer Science 2024-06-25 Bingli Liao, Danilo Vasconcellos Vargas

The increasing size and complexity of pre-trained language models have demonstrated superior performance in many applications, but they usually require large training datasets to be adequately trained. Insufficient training sets could…

Computation and Language · Computer Science 2025-02-03 Yaping Chai, Haoran Xie, Joe S. Qin

Optimizing training performance in large language models (LLMs) remains an essential challenge, particularly in improving model performance while maintaining computational costs. This work challenges the conventional approach of training…

Computation and Language · Computer Science 2025-11-04 Chun-Hao Yang, Bo-Han Feng, Tzu-Yuan Lai, Yan Yu Chen, Yin-Kai Dean Huang, Shou-De Lin

The growing disparity between the exponential scaling of computational resources and the finite growth of high-quality text data now constrains conventional scaling approaches for large language models (LLMs). To address this challenge, we…

Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought (CoT) reasoning. However, most of the existing approaches to enhance this ability rely…

Computation and Language · Computer Science 2024-08-08 Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni

Large Language Models (LLMs) have demonstrated significant improvements in reasoning capabilities through supervised fine-tuning and reinforcement learning. However, when training reasoning models, these approaches are primarily applicable…

Computation and Language · Computer Science 2025-05-16 Yoichi Ishibashi, Taro Yano, Masafumi Oyamada

Scaling large language models by increasing parameters and training data is increasingly constrained by limited high-quality corpora and rising communication costs. This work explores an alternative axis: increasing per-token computation…

Computation and Language · Computer Science 2026-03-11 Boyi Zeng, Yiqin Hao, He Li, Shixiang Song, Feichen Song, Zitong Wang, Siyuan Huang, Yi Xu, ZiWei He, Xinbing Wang, Zhouhan Lin

In this work, we introduce Reinforcement Pre-Training (RPT) as a new scaling paradigm for large language models and reinforcement learning (RL). Specifically, we reframe next-token prediction as a reasoning task trained using RL, where it…

Computation and Language · Computer Science 2025-06-10 Qingxiu Dong, Li Dong, Yao Tang, Tianzhu Ye, Yutao Sun, Zhifang Sui, Furu Wei

Test-Time Scaling (TTS) improves the reasoning performance of Large Language Models (LLMs) by allocating additional compute during inference. We conduct a structured survey of TTS methods and categorize them into sampling-based,…

Computation and Language · Computer Science 2025-06-06 Ho-Lam Chung, Teng-Yun Hsiao, Hsiao-Ying Huang, Chunerh Cho, Jian-Ren Lin, Zhang Ziwei, Yun-Nung Chen

The development of state-of-the-art large language models is commonly understood as a two-stage process involving pre-training and post-training. We point out the need for an additional intermediate stage called reinforcement mid-training…

Computation and Language · Computer Science 2025-09-30 Yijun Tian, Shaoyu Chen, Zhichao Xu, Yawei Wang, Jinhe Bi, Peng Han, Wei Wang

Large language models (LLMs) have shown limitations in tasks requiring complex logical reasoning and multi-step problem-solving. To address these challenges, researchers have employed carefully designed prompts and flowcharts, simulating…

Computation and Language · Computer Science 2024-12-06 Changcheng Li, Xiangyu Wang, Qiuju Chen, Xiren Zhou, Huanhuan Chen

The evolving sophistication and intricacies of Large Language Models (LLMs) yield unprecedented advancements, yet they simultaneously demand considerable computational resources and incur significant costs. To alleviate these challenges,…

Computation and Language · Computer Science 2023-10-03 Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Chia-Yuan Chang, Xia Hu

In recent years, language models (LMs) have made remarkable progress in advancing the field of natural language processing (NLP). However, the impact of data augmentation (DA) techniques on the fine-tuning (FT) performance of these LMs has…

Computation and Language · Computer Science 2023-06-14 Zhengxiang Shi, Aldo Lipani

Since the inception of Large Language Models (LLMs), the quest to efficiently train them for superior reasoning capabilities has been a pivotal challenge. The dominant training paradigm for LLMs is based on next token prediction (NTP).…

Computation and Language · Computer Science 2025-02-21 Pengxiao Lin, Zhongwang Zhang, Zhi-Qin John Xu

Recent advancements in large language models (LLMs) have demonstrated that progressive refinement, rather than providing a single answer, results in more accurate and thoughtful outputs. However, existing methods often rely heavily on…

Computation and Language · Computer Science 2024-10-18 Chengyu Du, Jinyi Han, Yizhou Ying, Aili Chen, Qianyu He, Haokun Zhao, Sirui Xia, Haoran Guo, Jiaqing Liang, Zulong Chen, Liangyue Li, Yanghua Xiao

Large language models (LLMs) excel at complex tasks thanks to advances in their reasoning abilities. However, existing methods overlook the trade-off between reasoning effectiveness and efficiency, often encouraging unnecessarily long…

Machine Learning · Computer Science 2025-10-16 Jingyao Wang, Wenwen Qiang, Zeen Song, Changwen Zheng, Hui Xiong

As large language models (LLMs) become increasingly powerful, the sequential nature of autoregressive generation creates a fundamental throughput bottleneck that limits the practical deployment. While Multi-Token Prediction (MTP) has…

Machine Learning · Computer Science 2025-09-24 Yuxuan Cai, Xiaozhuan Liang, Xinghua Wang, Jin Ma, Haijin Liang, Jinwen Luo, Xinyu Zuo, Lisheng Duan, Yuyang Yin, Xi Chen

Eliciting explicit, step-by-step reasoning traces from large language models (LLMs) has emerged as a dominant paradigm for enhancing model capabilities. Although such reasoning strategies were originally designed for problems requiring…

Computation and Language · Computer Science 2026-03-23 Xinyu Guo, Yazhou Zhang, Jing Qin

Pre-training of Large Language Models is often prohibitively expensive and inefficient at scale, requiring complex and invasive modifications in order to achieve high data throughput. In this work, we present Token-Superposition Training…

Computation and Language · Computer Science 2026-05-20 Bowen Peng, Théo Gigant, Jeffrey Quesnelle

Augmenting large language models (LLMs) with external tools has emerged as a promising approach to solving complex problems. However, traditional methods, which finetune LLMs with tool demonstration data, can be both costly and restricted…

Computation and Language · Computer Science 2024-01-17 Shibo Hao, Tianyang Liu, Zhen Wang, Zhiting Hu