Related papers: LOGO -- Long cOntext aliGnment via efficient prefe…

LLoCO: Learning Long Contexts Offline

Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose LLoCO, a…

Computation and Language · Computer Science 2024-10-18 Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

Large Language Models (LLMs) have demonstrated remarkable capabilities through pretraining and alignment. However, superior short-context LLMs may underperform in long-context scenarios due to insufficient long-context alignment. This…

Computation and Language · Computer Science 2025-03-04 Guanzheng Chen, Xin Li, Michael Qizhe Shieh, Lidong Bing

SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization

Despite advances in pretraining with extended context lengths, large language models (LLMs) still face challenges in effectively utilizing real-world long-context information, primarily due to insufficient long-context alignment caused by…

Computation and Language · Computer Science 2025-10-14 Huashan Sun, Shengyi Liao, Yansen Han, Yu Bai, Yang Gao, Cheng Fu, Weizhou Shen, Fanqi Wan, Ming Yan, Ji Zhang, Fei Huang

MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training

Nowadays, Large Language Models (LLMs) have been trained using extended context lengths to foster more creative applications. However, long context training poses great challenges considering the constraint of GPU memory. It not only leads…

Machine Learning · Computer Science 2025-01-16 Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, Bin Cui

A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts

Training and serving long-context large language models (LLMs) incurs substantial overhead. To address this, two critical steps are often required: a pretrained LLM typically undergoes a separate stage for context length extension by…

Computation and Language · Computer Science 2024-12-06 Suyu Ge, Xihui Lin, Yunan Zhang, Jiawei Han, Hao Peng

Training Long-Context LLMs Efficiently via Chunk-wise Optimization

While long-context large language models (LLMs) exhibit remarkable document processing capabilities, their prohibitively high training costs often hinder customized applications. To mitigate this issue, we propose Sequential…

Machine Learning · Computer Science 2025-05-23 Wenhao Li, Yuxin Zhang, Gen Luo, Daohai Yu, Rongrong Ji

Full Parameter Fine-tuning for Large Language Models with Limited Resources

Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training. Lowering the threshold for LLMs training would encourage greater participation from researchers, benefiting…

Computation and Language · Computer Science 2024-06-07 Kai Lv, Yuqing Yang, Tengxiao Liu, Qinghui Gao, Qipeng Guo, Xipeng Qiu

Visual Context Window Extension: A New Perspective for Long Video Understanding

Large Multimodal Models (LMMs) have demonstrated impressive performance in short video understanding tasks but face great challenges when applied to long video understanding. In contrast, Large Language Models (LLMs) exhibit outstanding…

Computer Vision and Pattern Recognition · Computer Science 2024-10-03 Hongchen Wei, Zhenzhong Chen

Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly

Large Language Models (LLMs) have demonstrated remarkable capabilities in comprehending and analyzing lengthy sequential inputs, owing to their extensive context windows that allow processing millions of tokens in a single forward pass.…

Computation and Language · Computer Science 2024-12-23 Peyman Hosseini, Ignacio Castro, Iacopo Ghinassi, Matthew Purver

LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

In long context scenarios, large language models (LLMs) face three main challenges: higher computational cost, performance reduction, and position bias. Research indicates that LLM performance hinges on the density and position of key…

Computation and Language · Computer Science 2024-08-13 Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu

Out of the Memory Barrier: A Highly Memory Efficient Training System for LLMs with Million-Token Contexts

Training Large Language Models (LLMs) on long contexts is severely constrained by prohibitive GPU memory overhead, not training time. The primary culprits are the activations, whose memory footprints scale linearly with sequence length. We…

Computation and Language · Computer Science 2026-03-03 Wenhao Li, Daohai Yu, Gen Luo, Yuxin Zhang, Fei Chao, Rongrong Ji, Yifan Wu, Jiaxin Liu, Ziyang Gong, Zimu Liao

MemLong: Memory-Augmented Retrieval for Long Text Modeling

Recent advancements in Large Language Models (LLMs) have yielded remarkable success across diverse fields. However, handling long contexts remains a significant challenge for LLMs due to the quadratic time and space complexity of attention…

Computation and Language · Computer Science 2024-09-02 Weijie Liu, Zecheng Tang, Juntao Li, Kehai Chen, Min Zhang

From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models

Long-context capabilities are essential for a wide range of applications, including document and video understanding, in-context learning, and inference-time scaling, all of which require models to process and reason over long sequences of…

Computation and Language · Computer Science 2025-04-09 Chejian Xu, Wei Ping, Peng Xu, Zihan Liu, Boxin Wang, Mohammad Shoeybi, Bo Li, Bryan Catanzaro

PRISM: Efficient Long-Range Reasoning With Short-Context LLMs

Long-range tasks demand reasoning over long inputs. However, existing solutions are limited, e.g., long-context models require large compute budgets, parameter-efficient fine-tuning (PEFT) needs training data, and retrieval-augmented…

Artificial Intelligence · Computer Science 2025-08-26 Dulhan Jayalath, James Bradley Wendt, Nicholas Monath, Sandeep Tata, Beliz Gunel

LLaMoCo: Instruction Tuning of Large Language Models for Optimization Code Generation

Recent research explores optimization using large language models (LLMs) by either iteratively seeking next-step solutions from LLMs or directly prompting LLMs for an optimizer. However, these approaches exhibit inherent limitations,…

Optimization and Control · Mathematics 2024-03-06 Zeyuan Ma, Hongshu Guo, Jiacheng Chen, Guojun Peng, Zhiguang Cao, Yining Ma, Yue-Jiao Gong

Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes

Large language models (LLMs) have achieved remarkable success, yet aligning their generations with human preferences remains a critical challenge. Existing approaches to preference modeling often rely on an explicit or implicit reward…

Computation and Language · Computer Science 2025-05-09 Zhuocheng Gong, Jian Guan, Wei Wu, Huishuai Zhang, Dongyan Zhao

LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models

We present LongQLoRA, an efficient and effective method to extend context length of large language models with less training resources. LongQLoRA combines the advantages of Position Interpolation, QLoRA and Shift Short Attention of…

Computation and Language · Computer Science 2023-11-10 Jianxin Yang

Effective Long-Context Scaling of Foundation Models

We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series are built through continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts…

Computation and Language · Computer Science 2023-11-15 Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma

Extending Llama-3's Context Ten-Fold Overnight

We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is super efficient, which takes 8 hours on one 8xA800 (80G) GPU machine. The resulted model exhibits superior performances…

Computation and Language · Computer Science 2024-05-01 Peitian Zhang, Ninglu Shao, Zheng Liu, Shitao Xiao, Hongjin Qian, Qiwei Ye, Zhicheng Dou

Augmenting Language Models with Long-Term Memory

Existing large language models (LLMs) can only afford fix-sized inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models…

Computation and Language · Computer Science 2023-06-13 Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei