Related papers: Efficient Sequence Packing without Cross-contamina…

200 papers

Decoding from large language models (LLMs) typically relies on fixed sampling hyperparameters (e.g., temperature, top-p), despite substantial variation in task difficulty and uncertainty across prompts and individual decoding steps. We…

Machine Learning · Computer Science 2026-03-17 Chloe H. Su, Zhe Ye, Samuel Tenka, Aidan Yang, Soonho Kong, Udaya Ghai

Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. More…

Computation and Language · Computer Science 2024-05-01 Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve

Large Language Models (LLMs) incur significant computational and memory costs when processing long prompts, as full self-attention scales quadratically with input length. Token compression aims to address this challenge by reducing the…

Computation and Language · Computer Science 2026-04-23 Zihao Xu, John Harvill, Ziwei Fan, Yizhou Sun, Hao Ding, Hao Wang

Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing. This process, however, is often brittle: even with the same hyperparameter values, distinct random…

Computation and Language · Computer Science 2020-02-19 Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah Smith

The training of large language models (LLMs) is expensive. In this paper, we study data-efficient approaches for pre-training LLMs, i.e., techniques that aim to optimize the Pareto frontier of model quality and training resource/data…

Large Language Models (LLMs) have demonstrated remarkable performance in various tasks and gained significant attention. LLMs are also used for local sequence transduction tasks, including grammatical error correction (GEC) and formality…

Computation and Language · Computer Science 2023-10-24 Masahiro Kaneko, Naoaki Okazaki

Large Language Models (LLMs) struggle to handle long input sequences due to high memory and runtime costs. Memory-augmented models have emerged as a promising solution to this problem, but current methods are hindered by limited memory…

Computation and Language · Computer Science 2024-02-22 Zexue He, Leonid Karlinsky, Donghyun Kim, Julian McAuley, Dmitry Krotov, Rogerio Feris

Large language models (LLMs) are powerful zero- and few-shot learners. However, when predicting over a set of candidate options, LLMs suffer from label biases, and existing calibration methods overlook biases arising from multi-token class…

Computation and Language · Computer Science 2025-11-19 Mario Sanz-Guerrero, Katharina von der Wense

Existing work on prompt compression for Large Language Models (LLM) focuses on lossy methods that try to maximize the retention of semantic information that is relevant to downstream tasks while significantly reducing the sequence length.…

Computation and Language · Computer Science 2025-08-22 John Harvill, Ziwei Fan, Hao Wang, Luke Huan, Anoop Deoras, Yizhou Sun, Hao Ding

Large language models (LLMs) process entire input contexts indiscriminately, which is inefficient when the information required to answer a query is localized within the context. We present dynamic context cutoff, a novel method enabling…

Computation and Language · Computer Science 2026-02-10 Roy Xie, Junlin Wang, Paul Rosu, Chunyuan Deng, Bolun Sun, Zihao Lin, Bhuwan Dhingra

Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks, but at the expense of increased inference costs. Cascading offers a simple strategy to achieve more favorable cost-quality…

Computation and Language · Computer Science 2024-04-17 Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

Large language models (LLM) trained using the next-token-prediction objective, such as GPT3 and PaLM, have revolutionized natural language processing in recent years by showing impressive zero-shot and few-shot capabilities across a wide…

Computation and Language · Computer Science 2023-02-01 Hao Liu, Xinyang Geng, Lisa Lee, Igor Mordatch, Sergey Levine, Sharan Narang, Pieter Abbeel

Agentic large language model (LLM) training often involves multi-turn interaction trajectories that branch into multiple execution paths due to concurrent tool use, think-mode, sub-agent, context management and other runtime designs. As a…

In the era of large language models (LLMs), fine-tuning pretrained models has become ubiquitous. Yet the theoretical underpinning remains an open question. A central question is why only a few epochs of fine-tuning are typically sufficient…

Machine Learning · Statistics 2026-02-17 Zexuan Sun, Garvesh Raskutti

In recent years, large language models have demonstrated remarkable performance across various natural language processing (NLP) tasks. However, deploying these models for real-world applications often requires efficient inference solutions…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-13 Ditto PS, Jithin VG, Adarsh MS

We explore efficient strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource constraints. Two approaches are investigated: (1) attaching a classification head to a pretrained…

Computation and Language · Computer Science 2026-05-26 Amirhossein Yousefiramandi, Ciaran Cooney

Large Language Models (LLMs) have demonstrated remarkable capabilities in comprehending and analyzing lengthy sequential inputs, owing to their extensive context windows that allow processing millions of tokens in a single forward pass.…

Computation and Language · Computer Science 2024-12-23 Peyman Hosseini, Ignacio Castro, Iacopo Ghinassi, Matthew Purver

Large Language Models (LLMs) have showcased impressive capabilities in text comprehension and generation, prompting research efforts towards video LLMs to facilitate human-AI interaction at the video level. However, how to effectively…

Computer Vision and Pattern Recognition · Computer Science 2024-04-02 Ruyang Liu, Chen Li, Haoran Tang, Yixiao Ge, Ying Shan, Ge Li

Tokenization is associated with many poorly understood shortcomings in language models (LMs), yet remains an important component for long sequence scaling purposes. This work studies how tokenization impacts model performance by analyzing…

Computation and Language · Computer Science 2025-04-15 Buu Phan, Brandon Amos, Itai Gat, Marton Havasi, Matthew Muckley, Karen Ullrich

Evaluating LLMs and text-to-image models is a computationally intensive task often overlooked. Efficient evaluation is crucial for understanding the diverse capabilities of these models and enabling comparisons across a growing number of…