Related papers: Exploring Next Token Prediction For Optimizing Dat…

200 papers

Building on the foundations of language modeling in natural language processing, Next Token Prediction (NTP) has evolved into a versatile training objective for machine learning tasks across various modalities, achieving considerable…

Since the inception of Large Language Models (LLMs), the quest to efficiently train them for superior reasoning capabilities has been a pivotal challenge. The dominant training paradigm for LLMs is based on next token prediction (NTP).…

Computation and Language · Computer Science 2025-02-21 Pengxiao Lin, Zhongwang Zhang, Zhi-Qin John Xu

The paradigm of Next Token Prediction (NTP) has driven the unprecedented success of Large Language Models (LLMs), but is also the source of their most persistent weaknesses such as poor long-term planning, error accumulation, and…

Computation and Language · Computer Science 2025-09-30 Charlie Wyatt, Aditya Joshi, Flora Salim

Large language models (LLMs) trained on the next-token prediction (NTP) paradigm have demonstrated powerful capabilities. However, the existing NTP paradigm contains several limitations, particularly related to planned task complications and…

Computation and Language · Computer Science 2024-09-02 Junhao Ruan, Abudukeyumu Abudula, Xinyu Liu, Bei Li, Yinqiao Li, Chenglong Wang, Yuchun Fan, Yuan Ge, Tong Xiao, Jingbo Zhu

Optimizing training performance in large language models (LLMs) remains an essential challenge, particularly in improving model performance while maintaining computational costs. This work challenges the conventional approach of training…

Computation and Language · Computer Science 2025-11-04 Chun-Hao Yang, Bo-Han Feng, Tzu-Yuan Lai, Yan Yu Chen, Yin-Kai Dean Huang, Shou-De Lin

The next-token prediction (NTP) objective has been foundational in the development of modern large language models (LLMs), driving advances in fluency and generalization. However, NTP operates at the token level, treating…

Computation and Language · Computer Science 2026-01-23 Laya Iyer, Pranav Somani, Alice Guo, Dan Jurafsky, Chen Shani

Multi-token prediction (MTP) has been proposed as an auxiliary objective to improve next-token prediction (NTP) in language model training but shows inconsistent improvements, underperforming in standard NLP benchmarks. We found MTP's exact…

Machine Learning · Computer Science 2026-02-17 Zayd M. K. Zuhri, Erland Hilman Fuadi, Alham Fikri Aji

Large language models (LLMs) have achieved notable progress. Despite their success, next-token prediction (NTP), the dominant method for LLM training and inference, is constrained in both contextual coverage and inference efficiency due to…

Computation and Language · Computer Science 2025-09-23 Xiaohao Liu, Xiaobo Xia, Weixiang Zhao, Manyi Zhang, Xianzhi Yu, Xiu Su, Shuo Yang, See-Kiong Ng, Tat-Seng Chua

Transformer-based models primarily rely on Next Token Prediction (NTP), which predicts the next token in a sequence based on the preceding context. However, NTP's focus on single-token prediction often limits a model's ability to plan ahead…

Computation and Language · Computer Science 2025-08-12 Charlie Wyatt, Aditya Joshi, Flora Salim

The next-token prediction paradigm has prevailed for autoregressive models in the era of LLMs. The current default sampling choice for popular LLMs is temperature scaling together with nucleus sampling to balance diversity and coherence.…

Computation and Language · Computer Science 2025-07-24 Yizhou Wang, Lingzhi Zhang, Yue Bai, Mang Tik Chiu, Zhengmian Hu, Mingyuan Zhang, Qihua Dong, Yu Yin, Sohrab Amirghodsi, Yun Fu

We initiate an investigation into the optimization properties of next-token prediction (NTP), the dominant training paradigm for modern language models. Specifically, we study the structural properties of the solutions selected by…

Machine Learning · Computer Science 2024-11-01 Christos Thrampoulidis

To efficiently select optimal dataset combinations for enhancing multi-task learning (MTL) performance in large language models, we proposed a novel framework that leverages a neural network to predict the best dataset combinations. The…

Computation and Language · Computer Science 2025-05-06 Zaifu Zhan, Rui Zhang

Language model (LM) decoding is based on the next-token prediction (NTP) probability distribution. For neural LMs (e.g., Transformer-based), NTP distribution is essentially a softmax-regularized dot product between an encoded input context…

Computation and Language · Computer Science 2024-10-04 Letian Peng, Chenyang An, Jingbo Shang

Standard next-token prediction (NTP) supervises language models solely through discrete labels in the output logit space. We argue that this sparse one-hot supervision leaves the latent representation space under-constrained, allowing…

Computation and Language · Computer Science 2026-05-26 Xiangdong Zhang, Debing Zhang, Shaofeng Zhang, Xiaohan Qin, Yu Cheng, Junchi Yan

While next-token prediction (NTP) has been the standard objective for training language models, it often struggles to capture global structure in reasoning tasks. Multi-token prediction (MTP) has recently emerged as a promising alternative,…

Machine Learning · Computer Science 2026-04-15 Jianhao Huang, Zhanpeng Zhou, Renqiu Xia, Baharan Mirzasoleiman, Weijie Su, Wei Huang

We investigate how next-token prediction (NTP) optimization leads language models to extract and organize semantic structure from text. Our analysis, based on a tractable mathematical model and controlled synthetic data, reveals that NTP…

Computation and Language · Computer Science 2025-10-09 Yize Zhao, Christos Thrampoulidis

We systematically investigate multi-token prediction (MTP) capabilities within LLMs pre-trained for next-token prediction (NTP). We first show that such models inherently possess MTP capabilities via numerical marginalization over…

Computation and Language · Computer Science 2025-02-14 Somesh Mehra, Javier Alonso Garcia, Lukas Mauch

Large Language Models (LLMs) have achieved impressive performance across diverse tasks but continue to struggle with learning transitive relations, a cornerstone for complex planning. To address this issue, we investigate the Multi-Token…

Artificial Intelligence · Computer Science 2025-09-30 Qimin Zhong, Hao Liao, Siwei Wang, Mingyang Zhou, Xiaoqun Wu, Rui Mao, Wei Chen

A natural language database interface (NLDB) can democratize data-driven insights for non-technical users. However, existing Text-to-SQL semantic parsers cannot achieve high enough accuracy in the cross-database setting to allow good…

Computation and Language · Computer Science 2021-06-09 Peng Xu, Wenjie Zi, Hamidreza Shahidi, Ákos Kádár, Keyi Tang, Wei Yang, Jawad Ateeq, Harsh Barot, Meidan Alon, Yanshuai Cao

With the rapid development of IT operations, it has become increasingly crucial to efficiently manage and analyze large volumes of data for practical applications. The techniques of Natural Language Processing (NLP) have shown remarkable…
