Related papers: Efficient Sequence Packing without Cross-contamina…

Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation

Large Language Models (LLMs) struggle with long-context reasoning, not only due to the quadratic scaling of computational complexity with sequence length but also because of the scarcity and expense of annotating long-context data. There…

Computation and Language · Computer Science 2025-04-18 Linda He, Jue Wang, Maurice Weber, Shang Zhu, Ben Athiwaratkun, Ce Zhang

Token Prepending: A Training-Free Approach for Eliciting Better Sentence Embeddings from LLMs

Extracting sentence embeddings from large language models (LLMs) is a promising direction, as LLMs have demonstrated stronger semantic understanding capabilities. Previous studies typically focus on prompt engineering to elicit sentence…

Computation and Language · Computer Science 2025-07-04 Yuchen Fu, Zifeng Cheng, Zhiwei Jiang, Zhonghui Wang, Yafeng Yin, Zhengliang Li, Qing Gu

Towards Modeling Learner Performance with Large Language Models

Recent work exploring the capabilities of pre-trained large language models (LLMs) has demonstrated their ability to act as general pattern machines by completing complex token sequences representing a wide array of tasks, including…

Computers and Society · Computer Science 2024-03-25 Seyed Parsa Neshaei, Richard Lee Davis, Adam Hazimeh, Bojan Lazarevski, Pierre Dillenbourg, Tanja Käser

Does the Order of Training Samples Matter? Improving Neural Data-to-Text Generation with Curriculum Learning

Recent advancements in data-to-text generation largely take on the form of neural end-to-end systems. Efforts have been dedicated to improving text generation systems by changing the order of training samples in a process known as…

Computation and Language · Computer Science 2021-02-09 Ernie Chang, Hui-Syuan Yeh, Vera Demberg

Reducing Peak Memory Usage for Modern Multimodal Large Language Model Pipelines

Multimodal large language models (MLLMs) have recently demonstrated strong capabilities in understanding and generating responses from diverse visual inputs, including high-resolution images and long video sequences. As these models scale…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Junwan Kim, Hyunkyung Bae

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models

Recently, large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses. However, this is achieved at the expense of stringent computational and…

Computation and Language · Computer Science 2024-05-30 Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, Armaghan Eshaghi

MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression

Large language models have drastically changed the prospects of AI by introducing technologies for more complex natural language processing. However, current methodologies to train such LLMs require extensive resources including but not…

Computation and Language · Computer Science 2026-04-27 Noel Elias, Homa Esfahanizadeh, Kaan Kale, Sriram Vishwanath, Muriel Medard

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

Large Language Models (LLMs) have seen great advances in both academia and industry, and their popularity has resulted in numerous open-source frameworks and techniques for accelerating LLM pre-training, fine-tuning, and inference. Training and…

Performance · Computer Science 2023-12-04 Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu

Efficient Pre-Training with Token Superposition

Pre-training of Large Language Models is often prohibitively expensive and inefficient at scale, requiring complex and invasive modifications in order to achieve high data throughput. In this work, we present Token-Superposition Training…

Computation and Language · Computer Science 2026-05-20 Bowen Peng, Théo Gigant, Jeffrey Quesnelle

Wavelet GPT: Wavelet Inspired Large Language Models

Large Language Models (LLMs) have ushered in a new wave of artificial intelligence advancements impacting every scientific field and discipline. We live in a world where most of the data around us, e.g., text, audio, and music, has a…

Signal Processing · Electrical Eng. & Systems 2025-02-11 Prateek Verma

Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models

Large Language Models (LLMs) typically generate outputs token by token using a fixed compute budget, leading to inefficient resource utilization. To address this shortcoming, recent advancements in mixture-of-experts (MoE) models,…

Machine Learning · Computer Science 2024-10-16 Keivan Alizadeh, Iman Mirzadeh, Hooman Shahrokhi, Dmitry Belenko, Frank Sun, Minsik Cho, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar

Comparison Study Between Token Classification and Sequence Classification In Text Classification

Unsupervised machine learning techniques have been applied to natural language processing tasks and have surpassed benchmarks such as GLUE with great success. The approach of building language models achieves good results in one language and it can…

Computation and Language · Computer Science 2022-11-28 Amir Jafari

Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning

Large Language Models (LLMs) excel at reasoning and planning when trained on chain-of-thought (CoT) data, where the step-by-step thought process is explicitly outlined by text tokens. However, this results in lengthy inputs where many words…

Computation and Language · Computer Science 2025-09-03 DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng

Distribution-Aware Companding Quantization of Large Language Models

Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. More…

Computation and Language · Computer Science 2026-03-03 Athul Radhakrishnan, Siddhant Mohan, Mahima Sachdeva

Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models

Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks. However, the truthfulness of their outputs is not guaranteed, and their tendency toward overconfidence further limits reliability. Uncertainty…

Computation and Language · Computer Science 2026-03-23 Qi Cao, Andrew Gambardella, Takeshi Kojima, Yutaka Matsuo, Yusuke Iwasawa

Grammatical Sequence Prediction for Real-Time Neural Semantic Parsing

While sequence-to-sequence (seq2seq) models achieve state-of-the-art performance in many natural language processing tasks, they can be too slow for real-time applications. One performance bottleneck is predicting the most likely next token…

Computation and Language · Computer Science 2019-07-26 Chunyang Xiao, Christoph Teichmann, Konstantine Arkoudas

M-Ped: Multi-Prompt Ensemble Decoding for Large Language Models

With the widespread application of Large Language Models (LLMs) in the field of Natural Language Processing (NLP), enhancing their performance has become a research hotspot. This paper presents a novel multi-prompt ensemble decoding…

Computation and Language · Computer Science 2024-12-25 Jiaxin Guo, Daimeng Wei, Yuanchang Luo, Shimin Tao, Hengchao Shang, Zongyao Li, Shaojun Li, Jinlong Yang, Zhanglin Wu, Zhiqiang Rao, Hao Yang

AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference

Large language models (LLMs) have achieved remarkable performance across a wide range of tasks, but their increasing parameter sizes significantly slow down inference. Speculative decoding mitigates this issue by leveraging a smaller draft…

Computation and Language · Computer Science 2026-05-27 Kuan-Wei Lu, Ding-Yong Hong, Pangfeng Liu, Jan-Jan Wu

Packing Analysis: Packing Is More Appropriate for Large Models or Datasets in Supervised Fine-tuning

Packing, initially utilized in the pre-training phase, is an optimization technique designed to maximize hardware resource efficiency by combining different training sequences to fit the model's maximum input length. Although it has…

Machine Learning · Computer Science 2024-11-07 Shuhe Wang, Guoyin Wang, Yizhong Wang, Jiwei Li, Eduard Hovy, Chen Guo

Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens

Large language models (LLMs) have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based LLMs suffer a performance degradation when modeling long-term contexts…

Computation and Language · Computer Science 2026-03-23 Weiyao Luo, Suncong Zheng, Heming Xia, Weikang Wang, Yan Lei, Tianyu Liu, Shuang Chen, Zhifang Sui