Related papers: LOGO -- Long cOntext aliGnment via efficient prefe…

200 papers

Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose LLoCO, a…

Computation and Language · Computer Science 2024-10-18 Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa

Large Language Models (LLMs) have demonstrated remarkable capabilities through pretraining and alignment. However, superior short-context LLMs may underperform in long-context scenarios due to insufficient long-context alignment. This…

Computation and Language · Computer Science 2025-03-04 Guanzheng Chen, Xin Li, Michael Qizhe Shieh, Lidong Bing

Despite advances in pretraining with extended context lengths, large language models (LLMs) still face challenges in effectively utilizing real-world long-context information, primarily due to insufficient long-context alignment caused by…

Computation and Language · Computer Science 2025-10-14 Huashan Sun, Shengyi Liao, Yansen Han, Yu Bai, Yang Gao, Cheng Fu, Weizhou Shen, Fanqi Wan, Ming Yan, Ji Zhang, Fei Huang

Nowadays, Large Language Models (LLMs) have been trained using extended context lengths to foster more creative applications. However, long context training poses great challenges considering the constraint of GPU memory. It not only leads…

Machine Learning · Computer Science 2025-01-16 Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, Bin Cui

Training and serving long-context large language models (LLMs) incurs substantial overhead. To address this, two critical steps are often required: a pretrained LLM typically undergoes a separate stage for context length extension by…

Computation and Language · Computer Science 2024-12-06 Suyu Ge, Xihui Lin, Yunan Zhang, Jiawei Han, Hao Peng

While long-context large language models (LLMs) exhibit remarkable document processing capabilities, their prohibitively high training costs often hinder customized applications. To mitigate this issue, we propose Sequential…

Machine Learning · Computer Science 2025-05-23 Wenhao Li, Yuxin Zhang, Gen Luo, Daohai Yu, Rongrong Ji

Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training. Lowering the threshold for LLMs training would encourage greater participation from researchers, benefiting…

Computation and Language · Computer Science 2024-06-07 Kai Lv, Yuqing Yang, Tengxiao Liu, Qinghui Gao, Qipeng Guo, Xipeng Qiu

Large Multimodal Models (LMMs) have demonstrated impressive performance in short video understanding tasks but face great challenges when applied to long video understanding. In contrast, Large Language Models (LLMs) exhibit outstanding…

Computer Vision and Pattern Recognition · Computer Science 2024-10-03 Hongchen Wei, Zhenzhong Chen

Large Language Models (LLMs) have demonstrated remarkable capabilities in comprehending and analyzing lengthy sequential inputs, owing to their extensive context windows that allow processing millions of tokens in a single forward pass.…

Computation and Language · Computer Science 2024-12-23 Peyman Hosseini, Ignacio Castro, Iacopo Ghinassi, Matthew Purver

In long context scenarios, large language models (LLMs) face three main challenges: higher computational cost, performance reduction, and position bias. Research indicates that LLM performance hinges on the density and position of key…

Computation and Language · Computer Science 2024-08-13 Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu

Training Large Language Models (LLMs) on long contexts is severely constrained by prohibitive GPU memory overhead, not training time. The primary culprits are the activations, whose memory footprints scale linearly with sequence length. We…

Computation and Language · Computer Science 2026-03-03 Wenhao Li, Daohai Yu, Gen Luo, Yuxin Zhang, Fei Chao, Rongrong Ji, Yifan Wu, Jiaxin Liu, Ziyang Gong, Zimu Liao

Recent advancements in Large Language Models (LLMs) have yielded remarkable success across diverse fields. However, handling long contexts remains a significant challenge for LLMs due to the quadratic time and space complexity of attention…

Computation and Language · Computer Science 2024-09-02 Weijie Liu, Zecheng Tang, Juntao Li, Kehai Chen, Min Zhang

Long-context capabilities are essential for a wide range of applications, including document and video understanding, in-context learning, and inference-time scaling, all of which require models to process and reason over long sequences of…

Computation and Language · Computer Science 2025-04-09 Chejian Xu, Wei Ping, Peng Xu, Zihan Liu, Boxin Wang, Mohammad Shoeybi, Bo Li, Bryan Catanzaro

Long-range tasks demand reasoning over long inputs. However, existing solutions are limited, e.g., long-context models require large compute budgets, parameter-efficient fine-tuning (PEFT) needs training data, and retrieval-augmented…

Artificial Intelligence · Computer Science 2025-08-26 Dulhan Jayalath, James Bradley Wendt, Nicholas Monath, Sandeep Tata, Beliz Gunel

Recent research explores optimization using large language models (LLMs) by either iteratively seeking next-step solutions from LLMs or directly prompting LLMs for an optimizer. However, these approaches exhibit inherent limitations,…

Optimization and Control · Mathematics 2024-03-06 Zeyuan Ma, Hongshu Guo, Jiacheng Chen, Guojun Peng, Zhiguang Cao, Yining Ma, Yue-Jiao Gong

Large language models (LLMs) have achieved remarkable success, yet aligning their generations with human preferences remains a critical challenge. Existing approaches to preference modeling often rely on an explicit or implicit reward…

Computation and Language · Computer Science 2025-05-09 Zhuocheng Gong, Jian Guan, Wei Wu, Huishuai Zhang, Dongyan Zhao

We present LongQLoRA, an efficient and effective method to extend context length of large language models with less training resources. LongQLoRA combines the advantages of Position Interpolation, QLoRA and Shift Short Attention of…

Computation and Language · Computer Science 2023-11-10 Jianxin Yang

We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series are built through continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts…

We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is super efficient, which takes 8 hours on one 8xA800 (80G) GPU machine. The resulted model exhibits superior performances…

Computation and Language · Computer Science 2024-05-01 Peitian Zhang, Ninglu Shao, Zheng Liu, Shitao Xiao, Hongjin Qian, Qiwei Ye, Zhicheng Dou

Existing large language models (LLMs) can only afford fix-sized inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models…

Computation and Language · Computer Science 2023-06-13 Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei