Related papers: Efficient Sequence Packing without Cross-contamina…

Fast-dLLM v2: Efficient Block-Diffusion LLM

Autoregressive (AR) large language models (LLMs) have achieved remarkable performance across a wide range of natural language tasks, yet their inherent sequential decoding limits inference efficiency. In this work, we propose Fast-dLLM v2,…

Computation and Language · Computer Science 2025-10-01 Chengyue Wu, Hao Zhang, Shuchen Xue, Shizhe Diao, Yonggan Fu, Zhijian Liu, Pavlo Molchanov, Ping Luo, Song Han, Enze Xie

Scaling LLM Pre-training with Vocabulary Curriculum

Modern language models rely on static vocabularies, fixed before pretraining, in contrast to the adaptive vocabulary acquisition observed in human language learning. To bridge this gap, we introduce vocabulary curriculum learning, an…

Computation and Language · Computer Science 2025-02-26 Fangyuan Yu

Token Weighting for Long-Range Language Modeling

Many applications of large language models (LLMs) require long-context understanding, but models continue to struggle with such tasks. We hypothesize that conventional next-token prediction training could contribute to this, because each…

Computation and Language · Computer Science 2025-03-13 Falko Helm, Nico Daheim, Iryna Gurevych

Do We Need Distinct Representations for Every Speech Token? Unveiling and Exploiting Redundancy in Large Speech Language Models

Large Speech Language Models (LSLMs) typically operate at high token rates (tokens/s) to ensure acoustic fidelity, yet this results in sequence lengths that far exceed the underlying semantic content, incurring prohibitive inference costs.…

Computation and Language · Computer Science 2026-04-09 Bajian Xiang, Tingwei Guo, Xuan Chen, Yang Han

LBPE: Long-token-first Tokenization to Improve Large Language Models

The prevalent use of Byte Pair Encoding (BPE) in Large Language Models (LLMs) facilitates robust handling of subword units and avoids issues of out-of-vocabulary words. Despite its success, a critical challenge persists: long tokens, rich…

Computation and Language · Computer Science 2024-11-11 Haoran Lian, Yizhe Xiong, Zijia Lin, Jianwei Niu, Shasha Mo, Hui Chen, Peng Liu, Guiguang Ding

Unlocking Full Efficiency of Token Filtering in Large Language Model Training

Token filtering has been proposed to enhance the utility of large language models (LLMs) by eliminating inconsequential tokens during training. While using fewer tokens is expected to reduce computational workloads, existing methods have not…

Machine Learning · Computer Science 2026-03-20 Di Chai, Pengbo Li, Feiyuan Zhang, Yilun Jin, Han Tian, Kaiqiang Xu, Binhang Yuan, Dian Shen, Junxue Zhang, Kai Chen

LoPT: Lossless Parallel Tokenization Acceleration for Long Context Inference of Large Language Model

Long context inference scenarios have become increasingly important for large language models, yet they introduce significant computational latency. While prior research has optimized long-sequence inference through operators, model…

Computation and Language · Computer Science 2025-11-10 Wei Shao, Lingchao Zheng, Pengyu Wang, Peizhen Zheng, Jun Li, Yuwei Fan

Token-wise Curriculum Learning for Neural Machine Translation

Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage. This is not always achievable for low-resource languages where…

Computation and Language · Computer Science 2021-03-23 Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao

How Focused Are LLMs? A Quantitative Study via Repetitive Deterministic Prediction Tasks

We investigate the performance of large language models on repetitive deterministic prediction tasks and study how the sequence accuracy rate scales with output length. Each such task involves repeating the same operation n times. Examples…

Artificial Intelligence · Computer Science 2025-11-25 Wanda Hou, Leon Zhou, Hong-Ye Hu, Yubei Chen, Yi-Zhuang You, Xiao-Liang Qi

Efficient Long Context Fine-tuning with Chunk Flow

Long context fine-tuning of large language models (LLMs) involves training on datasets that are predominantly composed of short sequences and a small proportion of longer sequences. However, existing approaches overlook this long-tail…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-14 Xiulong Yuan, Hongtao Xu, Wenting Shen, Ang Wang, Xiafei Qiu, Jie Zhang, Yuqiong Liu, Bowen Yu, Junyang Lin, Mingzhen Li, Weile Jia, Yong Li, Wei Lin

Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM

Training Long-Context Large Language Models (LLMs) is challenging, as hybrid training with long-context and short-context data often leads to workload imbalances. Existing works mainly use data packing to alleviate this issue, but fail to…

Machine Learning · Computer Science 2025-10-14 Yongqiang Yao, Jingru Tan, Kaihuan Liang, Feizhao Zhang, Jiahao Hu, Shuo Wu, Yazhe Niu, Ruihao Gong, Dahua Lin, Ningyi Xu

Efficient Code LLM Training via Distribution-Consistent and Diversity-Aware Data Selection

Recent advancements in large language models (LLMs) have significantly improved code generation and program comprehension, accelerating the evolution of software engineering. Current methods primarily enhance model performance by leveraging…

Computation and Language · Computer Science 2025-07-04 Weijie Lyu, Sheng-Jun Huang, Xuan Xia

Token-Efficient Leverage Learning in Large Language Models

Large Language Models (LLMs) have excelled in various tasks but perform better in high-resource scenarios, which presents challenges in low-resource scenarios. Data scarcity and the inherent difficulty of adapting LLMs to specific tasks…

Computation and Language · Computer Science 2024-04-02 Yuanhao Zeng, Min Wang, Yihang Wang, Yingxia Shao

Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models

Large language models (LLMs) achieve state-of-the-art accuracy on complex reasoning tasks by generating multiple chain-of-thought (CoT) traces, but using a fixed token budget per query leads to over-computation on easy inputs and…

Artificial Intelligence · Computer Science 2026-02-03 Katrina Brown, Aneesh Muppidi, Rana Shahout

Large Language Models Are Overparameterized Text Encoders

Large language models (LLMs) demonstrate strong performance as text embedding models when finetuned with supervised contrastive training. However, their large size balloons inference time and memory requirements. In this paper, we show that…

Computation and Language · Computer Science 2024-10-21 Thennal D K, Tim Fischer, Chris Biemann

Test-Time Training Done Right

Test-Time Training (TTT) models context dependencies by adapting part of the model's weights (referred to as fast weights) during inference. This fast weight, akin to recurrent states in RNNs, stores temporary memories of past tokens in the…

Machine Learning · Computer Science 2025-06-02 Tianyuan Zhang, Sai Bi, Yicong Hong, Kai Zhang, Fujun Luan, Songlin Yang, Kalyan Sunkavalli, William T. Freeman, Hao Tan

Extending Token Computation for LLM Reasoning

Large Language Models (LLMs) are pivotal in advancing natural language processing but often struggle with complex reasoning tasks due to inefficient attention distributions. In this paper, we explore the effect of increased computed tokens…

Computation and Language · Computer Science 2024-06-25 Bingli Liao, Danilo Vasconcellos Vargas

Sequential Modeling Enables Scalable Learning for Large Vision Models

We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. To do this, we define a common format, "visual sentences", in which we can represent raw images…

Computer Vision and Pattern Recognition · Computer Science 2023-12-04 Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A Efros

Zeppelin: Balancing Variable-length Workloads in Data Parallel Large Model Training

Training large language models (LLMs) with increasingly long and varying sequence lengths introduces severe load imbalance challenges in large-scale data-parallel training. Recent frameworks attempt to mitigate these issues through data…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-30 Chang Chen, Tiancheng Chen, Jiangfei Duan, Qianchao Zhu, Zerui Wang, Qinghao Hu, Peng Sun, Xiuhong Li, Chao Yang, Torsten Hoefler

L-TUNING: Synchronized Label Tuning for Prompt and Prefix in LLMs

Efficiently fine-tuning Large Language Models (LLMs) for specific tasks presents a considerable challenge in natural language processing. Traditional methods, like prompt or prefix tuning, typically rely on arbitrary tokens for training,…

Computation and Language · Computer Science 2024-04-16 Md. Kowsher, Md. Shohanur Islam Sobuj, Asif Mahmud, Nusrat Jahan Prottasha, Prakash Bhat