Related papers: Efficient Sequence Packing without Cross-contamina…

Learning Adaptive LLM Decoding

Decoding from large language models (LLMs) typically relies on fixed sampling hyperparameters (e.g., temperature, top-p), despite substantial variation in task difficulty and uncertainty across prompts and individual decoding steps. We…

Machine Learning · Computer Science · 2026-03-17 · Chloe H. Su, Zhe Ye, Samuel Tenka, Aidan Yang, Soonho Kong, Udaya Ghai

Better & Faster Large Language Models via Multi-token Prediction

Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. More…

Computation and Language · Computer Science · 2024-05-01 · Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve

Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models

Large Language Models (LLMs) incur significant computational and memory costs when processing long prompts, as full self-attention scales quadratically with input length. Token compression aims to address this challenge by reducing the…

Computation and Language · Computer Science · 2026-04-23 · Zihao Xu, John Harvill, Ziwei Fan, Yizhou Sun, Hao Ding, Hao Wang

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping

Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing. This process, however, is often brittle: even with the same hyperparameter values, distinct random…

Computation and Language · Computer Science · 2020-02-19 · Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah Smith

How to Train Data-Efficient LLMs

The training of large language models (LLMs) is expensive. In this paper, we study data-efficient approaches for pre-training LLMs, i.e., techniques that aim to optimize the Pareto frontier of model quality and training resource/data…

Machine Learning · Computer Science · 2024-02-16 · Noveen Sachdeva, Benjamin Coleman, Wang-Cheng Kang, Jianmo Ni, Lichan Hong, Ed H. Chi, James Caverlee, Julian McAuley, Derek Zhiyuan Cheng

Reducing Sequence Length by Predicting Edit Operations with Large Language Models

Large Language Models (LLMs) have demonstrated remarkable performance in various tasks and gained significant attention. LLMs are also used for local sequence transduction tasks, including grammatical error correction (GEC) and formality…

Computation and Language · Computer Science · 2023-10-24 · Masahiro Kaneko, Naoaki Okazaki

CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory

Large Language Models (LLMs) struggle to handle long input sequences due to high memory and runtime costs. Memory-augmented models have emerged as a promising solution to this problem, but current methods are hindered by limited memory…

Computation and Language · Computer Science · 2024-02-22 · Zexue He, Leonid Karlinsky, Donghyun Kim, Julian McAuley, Dmitry Krotov, Rogerio Feris

Mitigating Label Length Bias in Large Language Models

Large language models (LLMs) are powerful zero- and few-shot learners. However, when predicting over a set of candidate options, LLMs suffer from label biases, and existing calibration methods overlook biases arising from multi-token class…

Computation and Language · Computer Science · 2025-11-19 · Mario Sanz-Guerrero, Katharina von der Wense

Lossless Token Sequence Compression via Meta-Tokens

Existing work on prompt compression for Large Language Models (LLM) focuses on lossy methods that try to maximize the retention of semantic information that is relevant to downstream tasks while significantly reducing the sequence length.…

Computation and Language · Computer Science · 2025-08-22 · John Harvill, Ziwei Fan, Hao Wang, Luke Huan, Anoop Deoras, Yizhou Sun, Hao Ding

Knowing When to Stop: Efficient Context Processing via Latent Sufficiency Signals

Large language models (LLMs) process entire input contexts indiscriminately, which is inefficient when the information required to answer a query is localized within the context. We present dynamic context cutoff, a novel method enabling…

Computation and Language · Computer Science · 2026-02-10 · Roy Xie, Junlin Wang, Paul Rosu, Chunyuan Deng, Bolun Sun, Zihao Lin, Bhuwan Dhingra

Language Model Cascades: Token-level uncertainty and beyond

Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks, but at the expense of increased inference costs. Cascading offers a simple strategy to achieve more favorable cost-quality…

Computation and Language · Computer Science · 2024-04-17 · Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models

Large language models (LLM) trained using the next-token-prediction objective, such as GPT3 and PaLM, have revolutionized natural language processing in recent years by showing impressive zero-shot and few-shot capabilities across a wide…

Computation and Language · Computer Science · 2023-02-01 · Hao Liu, Xinyang Geng, Lisa Lee, Igor Mordatch, Sergey Levine, Sharan Narang, Pieter Abbeel

Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse

Agentic large language model (LLM) training often involves multi-turn interaction trajectories that branch into multiple execution paths due to concurrent tool use, think-mode, sub-agent, context management and other runtime designs. As a…

Machine Learning · Computer Science · 2026-04-24 · Jinghui Wang, Shaojie Wang, Yinghan Cui, Xuxing Chen, Chao Wang, Liang Huang, Can Tang, Xiaojiang Zhang, Junyi Peng, Li Wan, Haotian Zhang, Bin Chen

A Theoretical Framework for LLM Fine-tuning Using Early Stopping for Non-random Initialization

In the era of large language models (LLMs), fine-tuning pretrained models has become ubiquitous. Yet the theoretical underpinning remains an open question. A central question is why only a few epochs of fine-tuning are typically sufficient…

Machine Learning · Statistics · 2026-02-17 · Zexuan Sun, Garvesh Raskutti

Inference Acceleration for Large Language Models on CPUs

In recent years, large language models have demonstrated remarkable performance across various natural language processing (NLP) tasks. However, deploying these models for real-world applications often requires efficient inference solutions…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-06-13 · Ditto PS, Jithin VG, Adarsh MS

Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches

We explore efficient strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource constraints. Two approaches are investigated: (1) attaching a classification head to a pretrained…

Computation and Language · Computer Science · 2026-05-26 · Amirhossein Yousefiramandi, Ciaran Cooney

Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly

Large Language Models (LLMs) have demonstrated remarkable capabilities in comprehending and analyzing lengthy sequential inputs, owing to their extensive context windows that allow processing millions of tokens in a single forward pass.…

Computation and Language · Computer Science · 2024-12-23 · Peyman Hosseini, Ignacio Castro, Iacopo Ghinassi, Matthew Purver

ST-LLM: Large Language Models Are Effective Temporal Learners

Large Language Models (LLMs) have showcased impressive capabilities in text comprehension and generation, prompting research efforts towards video LLMs to facilitate human-AI interaction at the video level. However, how to effectively…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-02 · Ruyang Liu, Chen Li, Haoran Tang, Yixiao Ge, Ying Shan, Ge Li

Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles

Tokenization is associated with many poorly understood shortcomings in language models (LMs), yet remains an important component for long sequence scaling purposes. This work studies how tokenization impacts model performance by analyzing…

Computation and Language · Computer Science · 2025-04-15 · Buu Phan, Brandon Amos, Itai Gat, Marton Havasi, Matthew Muckley, Karen Ullrich

Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling

Evaluating LLMs and text-to-image models is a computationally intensive task often overlooked. Efficient evaluation is crucial for understanding the diverse capabilities of these models and enabling comparisons across a growing number of…

Machine Learning · Computer Science · 2024-06-25 · Cong Xu, Gayathri Saranathan, Mahammad Parwez Alam, Arpit Shah, James Lim, Soon Yee Wong, Foltin Martin, Suparna Bhattacharya