Related papers: Efficient Sequence Packing without Cross-contamina…

Improving the Robustness of Large Language Models via Consistency Alignment

Large language models (LLMs) have shown tremendous success in following user instructions and generating helpful responses. Nevertheless, their robustness is still far from optimal, as they may generate significantly inconsistent responses…

Computation and Language · Computer Science 2024-03-25 Yukun Zhao, Lingyong Yan, Weiwei Sun, Guoliang Xing, Shuaiqiang Wang, Chong Meng, Zhicong Cheng, Zhaochun Ren, Dawei Yin

Multi-Grained Patch Training for Efficient LLM-based Recommendation

Large Language Models (LLMs) have emerged as a new paradigm for recommendation by converting interacted item history into language modeling. However, constrained by the limited context length of LLMs, existing approaches have to truncate…

Information Retrieval · Computer Science 2025-05-20 Jiayi Liao, Ruobing Xie, Sihang Li, Xiang Wang, Xingwu Sun, Zhanhui Kang, Xiangnan He

LNE-Blocking: An Efficient Framework for Contamination Mitigation Evaluation on Large Language Models

The problem of data contamination is now almost inevitable during the development of large language models (LLMs), with the training data commonly integrating those evaluation benchmarks even unintentionally. This problem subsequently makes…

Computation and Language · Computer Science 2025-09-19 Ruijie Hou, Yueyang Jiao, Hanxu Hu, Yingming Li, Wai Lam, Huajian Zhang, Hongyuan Lu

ShaRP: SHAllow-LayeR Pruning for Efficient Video Large Language Models

Video Large Language Models (VLLMs) incur substantial prefilling cost due to the large number of visual tokens. While attention-based token pruning offers a promising acceleration strategy, applying it at shallow decoder layers often causes…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Yingjie Xia, Tao Liu, Jinglei Shi, Qingsong Xie, Heng Guo, Jian Yang, Xi Wang

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant…

Machine Learning · Computer Science 2024-09-05 Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

New Solutions on LLM Acceleration, Optimization, and Application

Large Language Models (LLMs) have become extremely potent instruments with exceptional capacities for comprehending and producing human-like text in a wide range of applications. However, the increasing size and complexity of LLMs present…

Machine Learning · Computer Science 2024-06-18 Yingbing Huang, Lily Jiaxin Wan, Hanchen Ye, Manvi Jha, Jinghua Wang, Yuhong Li, Xiaofan Zhang, Deming Chen

Token Dropping for Efficient BERT Pretraining

Transformer-based models generally allocate the same amount of computation for each token in a given sequence. We develop a simple but effective "token dropping" method to accelerate the pretraining of transformer models, such as BERT,…

Computation and Language · Computer Science 2022-03-25 Le Hou, Richard Yuanzhe Pang, Tianyi Zhou, Yuexin Wu, Xinying Song, Xiaodan Song, Denny Zhou

AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism

Large language models (LLMs) are increasingly used for long-content generation (e.g., long Chain-of-Thought reasoning) where decoding efficiency becomes a critical bottleneck: Autoregressive decoding is inherently limited by its sequential…

Computation and Language · Computer Science 2025-06-05 Zhepei Wei, Wei-Lin Chen, Xinyu Zhu, Yu Meng

Pretraining with Token-Level Adaptive Latent Chain-of-Thought

Scaling large language models by increasing parameters and training data is increasingly constrained by limited high-quality corpora and rising communication costs. This work explores an alternative axis: increasing per-token computation…

Computation and Language · Computer Science 2026-03-11 Boyi Zeng, Yiqin Hao, He Li, Shixiang Song, Feichen Song, Zitong Wang, Siyuan Huang, Yi Xu, ZiWei He, Xinbing Wang, Zhouhan Lin

Convolutional Sequence to Sequence Learning

The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to…

Computation and Language · Computer Science 2017-07-26 Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

Sparser, Faster, Lighter Transformer Language Models

Scaling autoregressive large language models (LLMs) has driven unprecedented progress but comes with vast computational costs. In this work, we tackle these costs by leveraging unstructured sparsity within an LLM's feedforward layers, the…

Machine Learning · Computer Science 2026-05-11 Edoardo Cetin, Stefano Peluchetti, Emilio Castillo, Akira Naruse, Mana Murakami, Llion Jones

ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining

Large language model pretraining is compute-intensive, yet many tokens contribute marginally to learning, resulting in inefficiency. We introduce Efficient Selective Language Modeling (ESLM), a risk-aware algorithm that improves training…

Machine Learning · Computer Science 2025-05-27 Melis Ilayda Bal, Volkan Cevher, Michael Muehlebach

Thinking Augmented Pre-training

This paper introduces a simple and scalable approach to improve the data efficiency of large language model (LLM) training by augmenting existing text data with thinking trajectories. The compute for pre-training LLMs has been growing at an…

Computation and Language · Computer Science 2025-10-20 Liang Wang, Nan Yang, Shaohan Huang, Li Dong, Furu Wei

Pre-training Protein Language Models with Label-Agnostic Binding Pairs Enhances Performance in Downstream Tasks

Less than 1% of protein sequences are structurally and functionally annotated. The Natural Language Processing (NLP) community has recently embraced self-supervised learning as a powerful approach to learn representations from unlabeled text,…

Biomolecules · Quantitative Biology 2020-12-08 Modestas Filipavicius, Matteo Manica, Joris Cadow, Maria Rodriguez Martinez

SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs

Large language models have high compute, latency, and memory requirements. While specialized accelerators such as GPUs and TPUs typically run these workloads, CPUs are more widely available and consume less energy. Accelerating LLMs with…

Machine Learning · Computer Science 2025-02-19 Ahmed F. AbouElhamayed, Jordan Dotzel, Yash Akhauri, Chi-Chih Chang, Sameh Gobriel, J. Pablo Muñoz, Vui Seng Chua, Nilesh Jain, Mohamed S. Abdelfattah

freePruner: A Training-free Approach for Large Multimodal Model Acceleration

Large Multimodal Models (LMMs) have demonstrated impressive capabilities in visual-language tasks but face significant deployment challenges due to their high computational demands. While recent token reduction methods show promise for…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Bingxin Xu, Yuzhang Shang, Yunhao Ge, Qian Lou, Yan Yan

Crafting Efficient Fine-Tuning Strategies for Large Language Models

This paper addresses the challenges of efficiently fine-tuning large language models (LLMs) by exploring data efficiency and hyperparameter optimization. We investigate the minimum data required for effective fine-tuning and propose a novel…

Computation and Language · Computer Science 2024-07-22 Michael Oliver, Guan Wang

An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models

The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code…

Software Engineering · Computer Science 2025-06-04 Zixiang Xian, Chenhui Cui, Rubing Huang, Chunrong Fang, Zhenyu Chen

Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer

Large Language Models (LLMs) with long context capabilities are integral to complex tasks in natural language processing and computational biology, such as text generation and protein sequence analysis. However, training LLMs directly on…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-14 Jinghan Yao, Sam Ade Jacobs, Masahiro Tanaka, Olatunji Ruwase, Hari Subramoni, Dhabaleswar K. Panda

An Effective Incorporating Heterogeneous Knowledge Curriculum Learning for Sequence Labeling

Sequence labeling models often benefit from incorporating external knowledge. However, this practice introduces data heterogeneity and complicates the model with additional modules, leading to increased expenses for training a…

Computation and Language · Computer Science 2025-06-19 Xuemei Tang, Jun Wang, Qi Su, Chu-ren Huang, Jinghang Gu