Related papers: Efficient Sequence Packing without Cross-contamina…

Towards Efficient and Effective Alignment of Large Language Models

Large language models (LLMs) exhibit remarkable capabilities across diverse tasks, yet aligning them efficiently and effectively with human expectations remains a critical challenge. This thesis advances LLM alignment by introducing novel…

Computation and Language · Computer Science 2025-06-12 Yuxin Jiang

FTP: A Fine-grained Token-wise Pruner for Large Language Models via Token Routing

Recently, large language models (LLMs) have demonstrated superior performance across various tasks by adhering to scaling laws, which significantly increase model size. However, the huge computation overhead during inference hinders the…

Computation and Language · Computer Science 2024-12-17 Zekai Li , Jintu Zheng , Ji Liu , Han Liu , Haowei Zhu , Zeping Li , Fuwei Yang , Haiduo Huang , Jinzhang Peng , Dong Li , Lu Tian , Emad Barsoum

Order Independence With Finetuning

Large language models (LLMs) demonstrate remarkable performance on many NLP tasks, yet often exhibit order dependence: simply reordering semantically identical tokens (e.g., answer choices in multiple-choice questions) can lead to…

Computation and Language · Computer Science 2025-04-01 Katrina Brown , Reid McIlroy

Learning Dynamic Feature Selection for Fast Sequential Prediction

We present paired learning and inference algorithms for significantly reducing computation and increasing speed of the vector dot products in the classifiers that are at the heart of many NLP components. This is accomplished by partitioning…

Computation and Language · Computer Science 2015-05-25 Emma Strubell , Luke Vilnis , Kate Silverstein , Andrew McCallum

Domain Adaptation of LLMs for Process Data

In recent years, Large Language Models (LLMs) have emerged as a prominent area of interest across various research domains, including Process Mining (PM). Current applications in PM have predominantly centered on prompt engineering…

Computation and Language · Computer Science 2025-09-04 Rafael Seidi Oyamada , Jari Peeperkorn , Jochen De Weerdt , Johannes De Smedt

Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning

Large Language Models (LLMs) have gained significant popularity in recent years for specialized tasks using prompts due to their low computational cost. Standard methods like prefix tuning utilize special, modifiable tokens that lack…

Computation and Language · Computer Science 2024-10-14 Nusrat Jahan Prottasha , Asif Mahmud , Md. Shohanur Islam Sobuj , Prakash Bhat , Md Kowsher , Niloofar Yousefi , Ozlem Ozmen Garibay

Accelerating Production LLMs with Combined Token/Embedding Speculators

This technical report describes the design and training of novel speculative decoding draft models, for accelerating the inference speeds of large language models in a production environment. By conditioning draft predictions on both…

Computation and Language · Computer Science 2024-06-10 Davis Wertheimer , Joshua Rosenkranz , Thomas Parnell , Sahil Suneja , Pavithra Ranganathan , Raghu Ganti , Mudhakar Srivatsa

BurstEngine: an Efficient Distributed Framework for Training Transformers on Extremely Long Sequences of over 1M Tokens

Existing methods for training LLMs on long-sequence data, such as Tensor Parallelism and Context Parallelism, exhibit low Model FLOPs Utilization as sequence lengths and number of GPUs increase, especially when sequence lengths exceed 1M…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-25 Ao Sun , Weilin Zhao , Xu Han , Cheng Yang , Zhiyuan Liu , Chuan Shi , Maosong Sun

Long Sequence Modeling with Attention Tensorization: From Sequence to Tensor Learning

As the demand for processing extended textual data grows, the ability to handle long-range dependencies and maintain computational efficiency is more critical than ever. One of the key issues for long-sequence modeling using attention-based…

Computation and Language · Computer Science 2025-05-26 Aosong Feng , Rex Ying , Leandros Tassiulas

Multilingual Contextualization of Large Language Models for Document-Level Machine Translation

Large language models (LLMs) have demonstrated strong performance in sentence-level machine translation, but scaling to document-level translation remains challenging, particularly in modeling long-range dependencies and discourse phenomena…

Computation and Language · Computer Science 2025-08-29 Miguel Moura Ramos , Patrick Fernandes , Sweta Agrawal , André F. T. Martins

Training Compute-Optimal Protein Language Models

We explore optimally training protein language models, an area of significant interest in biological research where guidance on best practices is limited. Most models are trained with extensive compute resources until performance gains…

Machine Learning · Computer Science 2024-11-05 Xingyi Cheng , Bo Chen , Pan Li , Jing Gong , Jie Tang , Le Song

Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling

Many efforts have been made to facilitate natural language processing tasks with pre-trained language models (LMs), and these have brought significant improvements to various applications. To fully leverage the nearly unlimited corpora and capture…

Computation and Language · Computer Science 2018-09-11 Liyuan Liu , Xiang Ren , Jingbo Shang , Jian Peng , Jiawei Han

FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance

Recently, Multi-modal Large Language Models (MLLMs) have shown remarkable effectiveness for multi-modal tasks due to their abilities to generate and understand cross-modal data. However, processing long sequences of visual tokens extracted…

Computer Vision and Pattern Recognition · Computer Science 2025-04-11 Haicheng Wang , Zhemeng Yu , Gabriele Spadaro , Chen Ju , Victor Quétu , Shuai Xiao , Enzo Tartaglione

PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference

Large Language Model (LLM) inference generates one token at a time in an autoregressive manner, which exhibits notably lower operational intensity compared to earlier Machine Learning (ML) models such as encoder-only transformers and…

Hardware Architecture · Computer Science 2025-05-06 Yufeng Gu , Alireza Khadem , Sumanth Umesh , Ning Liang , Xavier Servot , Onur Mutlu , Ravi Iyer , Reetuparna Das

It's All in The [MASK]: Simple Instruction-Tuning Enables BERT-like Masked Language Models As Generative Classifiers

While encoder-only models such as BERT and ModernBERT are ubiquitous in real-world NLP applications, their conventional reliance on task-specific classification heads can limit their applicability compared to decoder-based large language…

Computation and Language · Computer Science 2025-02-11 Benjamin Clavié , Nathan Cooper , Benjamin Warner

Attention over pre-trained Sentence Embeddings for Long Document Classification

Despite being the current de facto models in most NLP tasks, transformers are often limited to short sequences because their attention complexity is quadratic in the number of tokens. Several attempts to address this issue have been studied, either…

Computation and Language · Computer Science 2023-07-19 Amine Abdaoui , Sourav Dutta

Parameter-Efficient Transfer Learning for NLP

Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we…

Machine Learning · Computer Science 2019-06-14 Neil Houlsby , Andrei Giurgiu , Stanislaw Jastrzebski , Bruna Morrone , Quentin de Laroussilhe , Andrea Gesmundo , Mona Attariyan , Sylvain Gelly

Analysing The Impact of Sequence Composition on Language Model Pre-Training

Most language model pre-training frameworks concatenate multiple documents into fixed-length sequences and use causal masking to compute the likelihood of each token given its context; this strategy is widely adopted due to its simplicity…

Computation and Language · Computer Science 2025-02-14 Yu Zhao , Yuanbin Qu , Konrad Staniszewski , Szymon Tworkowski , Wei Liu , Piotr Miłoś , Yuxiang Wu , Pasquale Minervini
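The concatenate-and-chunk strategy summarized in the abstract above can be illustrated in a few lines. This is a minimal Python sketch, not the paper's implementation; the helper names `pack_documents` and `causal_mask`, the use of an EOS token as a document separator, and dropping the incomplete final chunk are all assumptions made for the example.

```python
from typing import List


def pack_documents(docs: List[List[int]], seq_len: int, eos_id: int) -> List[List[int]]:
    """Concatenate tokenized documents into one stream and cut it into fixed-length chunks.

    Hypothetical helper: each document is followed by eos_id as a separator,
    and trailing tokens that do not fill a whole chunk are dropped.
    """
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)
    n_full = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


def causal_mask(seq_len: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j, i.e. when j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    chunks = pack_documents([[1, 2, 3], [4, 5], [6, 7, 8, 9]], seq_len=4, eos_id=0)
    print(chunks)          # three chunks of length 4
    print(causal_mask(4))  # lower-triangular boolean mask
```

With a plain causal mask, a chunk that spans a document boundary lets later tokens attend to an unrelated earlier document: in the second chunk `[4, 5, 0, 6]`, token `6` can see tokens `4` and `5`. This is the kind of cross-document contamination that document-aware masking in packed sequences is meant to prevent.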

Efficient Prompt Caching via Embedding Similarity

Large language models (LLMs) have achieved huge success in numerous natural language processing (NLP) tasks. However, they face the challenge of significant resource consumption during inference. In this paper, we aim to improve the inference…

Computation and Language · Computer Science 2024-02-05 Hanlin Zhu , Banghua Zhu , Jiantao Jiao

Learning to Skip for Language Modeling

Overparameterized large-scale language models show impressive generalization performance in in-context few-shot learning. However, most language models allocate the same amount of parameters or computation to each token, disregarding the…

Computation and Language · Computer Science 2023-11-28 Dewen Zeng , Nan Du , Tao Wang , Yuanzhong Xu , Tao Lei , Zhifeng Chen , Claire Cui