Related papers: Exploring Next Token Prediction For Optimizing Dat…

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Building on the foundations of language modeling in natural language processing, Next Token Prediction (NTP) has evolved into a versatile training objective for machine learning tasks across various modalities, achieving considerable…

Computation and Language · Computer Science 2024-12-31 Liang Chen, Zekun Wang, Shuhuai Ren, Lei Li, Haozhe Zhao, Yunshui Li, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong, Yichi Zhang, Ruoyu Wu, Qingxiu Dong, Ge Zhang, Jian Yang, Lingwei Meng, Shujie Hu, Yulong Chen, Junyang Lin, Shuai Bai, Andreas Vlachos, Xu Tan, Minjia Zhang, Wen Xiao, Aaron Yee, Tianyu Liu, Baobao Chang

Reasoning Bias of Next Token Prediction Training

Since the inception of Large Language Models (LLMs), the quest to efficiently train them for superior reasoning capabilities has been a pivotal challenge. The dominant training paradigm for LLMs is based on next token prediction (NTP).…

Computation and Language · Computer Science 2025-02-21 Pengxiao Lin, Zhongwang Zhang, Zhi-Qin John Xu

Alternatives To Next Token Prediction In Text Generation -- A Survey

The paradigm of Next Token Prediction (NTP) has driven the unprecedented success of Large Language Models (LLMs), but is also the source of their most persistent weaknesses such as poor long-term planning, error accumulation, and…

Computation and Language · Computer Science 2025-09-30 Charlie Wyatt, Aditya Joshi, Flora Salim

NDP: Next Distribution Prediction as a More Broad Target

Large language models (LLMs) trained on next-token prediction (NTP) paradigm have demonstrated powerful capabilities. However, the existing NTP paradigm contains several limitations, particularly related to planned task complications and…

Computation and Language · Computer Science 2024-09-02 Junhao Ruan, Abudukeyumu Abudula, Xinyu Liu, Bei Li, Yinqiao Li, Chenglong Wang, Yuchun Fan, Yuan Ge, Tong Xiao, Jingbo Zhu

Training LLMs Beyond Next Token Prediction -- Filling the Mutual Information Gap

Optimizing training performance in large language models (LLMs) remains an essential challenge, particularly in improving model performance while maintaining computational costs. This work challenges the conventional approach of training…

Computation and Language · Computer Science 2025-11-04 Chun-Hao Yang, Bo-Han Feng, Tzu-Yuan Lai, Yan Yu Chen, Yin-Kai Dean Huang, Shou-De Lin

Beyond Tokens: Concept-Level Training Objectives for LLMs

The next-token prediction (NTP) objective has been foundational in the development of modern large language models (LLMs), driving advances in fluency and generalization. However, NTP operates at the token level, treating…

Computation and Language · Computer Science 2026-01-23 Laya Iyer, Pranav Somani, Alice Guo, Dan Jurafsky, Chen Shani

Predicting the Order of Upcoming Tokens Improves Language Modeling

Multi-token prediction (MTP) has been proposed as an auxiliary objective to improve next-token prediction (NTP) in language model training but shows inconsistent improvements, underperforming in standard NLP benchmarks. We found MTP's exact…

Machine Learning · Computer Science 2026-02-17 Zayd M. K. Zuhri, Erland Hilman Fuadi, Alham Fikri Aji

L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models

Large language models (LLMs) have achieved notable progress. Despite their success, next-token prediction (NTP), the dominant method for LLM training and inference, is constrained in both contextual coverage and inference efficiency due to…

Computation and Language · Computer Science 2025-09-23 Xiaohao Liu, Xiaobo Xia, Weixiang Zhao, Manyi Zhang, Xianzhi Yu, Xiu Su, Shuo Yang, See-Kiong Ng, Tat-Seng Chua

What am I missing here?: Evaluating Large Language Models for Masked Sentence Prediction

Transformer-based models primarily rely on Next Token Prediction (NTP), which predicts the next token in a sequence based on the preceding context. However, NTP's focus on single-token prediction often limits a model's ability to plan ahead…

Computation and Language · Computer Science 2025-08-12 Charlie Wyatt, Aditya Joshi, Flora Salim

Cautious Next Token Prediction

Next token prediction paradigm has been prevailing for autoregressive models in the era of LLMs. The current default sampling choice for popular LLMs is temperature scaling together with nucleus sampling to balance diversity and coherence.…

Computation and Language · Computer Science 2025-07-24 Yizhou Wang, Lingzhi Zhang, Yue Bai, Mang Tik Chiu, Zhengmian Hu, Mingyuan Zhang, Qihua Dong, Yu Yin, Sohrab Amirghodsi, Yun Fu

Implicit Optimization Bias of Next-Token Prediction in Linear Models

We initiate an investigation into the optimization properties of next-token prediction (NTP), the dominant training paradigm for modern language models. Specifically, we study the structural properties of the solutions selected by…

Machine Learning · Computer Science 2024-11-01 Christos Thrampoulidis

Towards Better Multi-task Learning: A Framework for Optimizing Dataset Combinations in Large Language Models

To efficiently select optimal dataset combinations for enhancing multi-task learning (MTL) performance in large language models, we proposed a novel framework that leverages a neural network to predict the best dataset combinations. The…

Computation and Language · Computer Science 2025-05-06 Zaifu Zhan, Rui Zhang

Correlation and Navigation in the Vocabulary Key Representation Space of Language Models

Language model (LM) decoding is based on the next-token prediction (NTP) probability distribution. For neural LMs (e.g., Transformer-based), NTP distribution is essentially a softmax-regularized dot product between an encoded input context…

Computation and Language · Computer Science 2024-10-04 Letian Peng, Chenyang An, Jingbo Shang

NITP: Next Implicit Token Prediction for LLM Pre-training

Standard next-token prediction (NTP) supervises language models solely through discrete labels in the output logit space. We argue that this sparse one-hot supervision leaves the latent representation space under-constrained, allowing…

Computation and Language · Computer Science 2026-05-26 Xiangdong Zhang, Debing Zhang, Shaofeng Zhang, Xiaohan Qin, Yu Cheng, Junchi Yan

How Transformers Learn to Plan via Multi-Token Prediction

While next-token prediction (NTP) has been the standard objective for training language models, it often struggles to capture global structure in reasoning tasks. Multi-token prediction (MTP) has recently emerged as a promising alternative,…

Machine Learning · Computer Science 2026-04-15 Jianhao Huang, Zhanpeng Zhou, Renqiu Xia, Baharan Mirzasoleiman, Weijie Su, Wei Huang

Geometry of Semantics in Next-Token Prediction: How Optimization Implicitly Organizes Linguistic Representations

We investigate how next-token prediction (NTP) optimization leads language models to extract and organize semantic structure from text. Our analysis, based on a tractable mathematical model and controlled synthetic data, reveals that NTP…

Computation and Language · Computer Science 2025-10-09 Yize Zhao, Christos Thrampoulidis

On multi-token prediction for efficient LLM inference

We systematically investigate multi-token prediction (MTP) capabilities within LLMs pre-trained for next-token prediction (NTP). We first show that such models inherently possess MTP capabilities via numerical marginalization over…

Computation and Language · Computer Science 2025-02-14 Somesh Mehra, Javier Alonso Garcia, Lukas Mauch

Understanding and Enhancing the Planning Capability of Language Models via Multi-Token Prediction

Large Language Models (LLMs) have achieved impressive performance across diverse tasks but continue to struggle with learning transitive relations, a cornerstone for complex planning. To address this issue, we investigate the Multi-Token…

Artificial Intelligence · Computer Science 2025-09-30 Qimin Zhong, Hao Liao, Siwei Wang, Mingyang Zhou, Xiaoqun Wu, Rui Mao, Wei Chen

Turing: an Accurate and Interpretable Multi-Hypothesis Cross-Domain Natural Language Database Interface

A natural language database interface (NLDB) can democratize data-driven insights for non-technical users. However, existing Text-to-SQL semantic parsers cannot achieve high enough accuracy in the cross-database setting to allow good…

Computation and Language · Computer Science 2021-06-09 Peng Xu, Wenjie Zi, Hamidreza Shahidi, Ákos Kádár, Keyi Tang, Wei Yang, Jawad Ateeq, Harsh Barot, Meidan Alon, Yanshuai Cao

OWL: A Large Language Model for IT Operations

With the rapid development of IT operations, it has become increasingly crucial to efficiently manage and analyze large volumes of data for practical applications. The techniques of Natural Language Processing (NLP) have shown remarkable…

Computation and Language · Computer Science 2024-09-30 Hongcheng Guo, Jian Yang, Jiaheng Liu, Liqun Yang, Linzheng Chai, Jiaqi Bai, Junran Peng, Xiaorong Hu, Chao Chen, Dongfeng Zhang, Xu Shi, Tieqiao Zheng, Liangfan Zheng, Bo Zhang, Ke Xu, Zhoujun Li