Related papers: Long-Context Language Modeling with Parallel Conte…

200 papers

Embedding models play a pivotal role in modern NLP applications such as IR and RAG. While the context limit of LLMs has been pushed beyond 1 million tokens, embedding models are still confined to a narrow context window not exceeding 8k…

Computation and Language · Computer Science 2024-11-08 Dawei Zhu, Liang Wang, Nan Yang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li

Transformer-based Large Language Models (LLMs) are pioneering advances in many natural language processing tasks; however, their exceptional capabilities are restricted to the preset context window of the Transformer. Position Embedding…

Computation and Language · Computer Science 2024-03-26 Guanzheng Chen, Xin Li, Zaiqiao Meng, Shangsong Liang, Lidong Bing

Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window, constraining their application in downstream tasks with lengthy inputs. Recent studies have sought to extend LLMs' context…

Computation and Language · Computer Science 2024-01-17 Yikai Zhang, Junlong Li, Pengfei Liu

When applied to processing long text, Large Language Models (LLMs) are limited by their context window. Existing efforts to address this limitation involve training specialized architectures, and cannot be easily applied to off-the-shelf…

Computation and Language · Computer Science 2023-08-02 Nir Ratner, Yoav Levine, Yonatan Belinkov, Ori Ram, Inbal Magar, Omri Abend, Ehud Karpas, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham

Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. Existing long-context extension methods usually need additional training procedures to support…

Computation and Language · Computer Science 2024-02-23 Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng

LongRoPE2 is a novel approach that extends the effective context window of pre-trained large language models (LLMs) to the target length, while preserving the performance on the original shorter context window. This is achieved by three…

Computation and Language · Computer Science 2025-02-28 Ning Shang, Li Lyna Zhang, Siyuan Wang, Gaokai Zhang, Gilsinia Lopez, Fan Yang, Weizhu Chen, Mao Yang

A large context window is a desirable feature in large language models (LLMs). However, due to high fine-tuning costs, scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are…

Computation and Language · Computer Science 2024-02-22 Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, Mao Yang

Large Language Models (LLMs) are trained with a pre-defined context length, restricting their use in scenarios requiring long inputs. Previous efforts to adapt LLMs to a longer length usually require fine-tuning with this target length…

Computation and Language · Computer Science 2024-02-22 Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li

Large Language Models (LLMs) often struggle to process and generate coherent context when the number of input tokens exceeds the pre-trained length. Recent advancements in long-context extension have significantly expanded the context…

Computation and Language · Computer Science 2025-04-29 Yi Lu, Wanxu Zhao, Xin Zhou, Chenxin An, Chenglong Wang, Shuo Li, Yuming Yang, Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

Large language models (LLMs) experience significant performance degradation when the input exceeds the pretraining context window, primarily due to the out-of-distribution (OOD) behavior of Rotary Position Embedding (RoPE). Recent studies…

Computation and Language · Computer Science 2025-08-06 Sikui Zhang, Guangze Gao, Ziyun Gan, Chunfeng Yuan, Zefeng Lin, Houwen Peng, Bing Li, Weiming Hu

Large language models (LLMs) call for extension of context to handle many critical applications. However, the existing approaches are prone to expensive costs and inferior quality of context extension. In this work, we propose Extensible…

Computation and Language · Computer Science 2024-02-20 Ninglu Shao, Shitao Xiao, Zheng Liu, Peitian Zhang

Scaling the rotary position embedding (RoPE) has become a common method for extending the context window of RoPE-based large language models (LLMs). However, existing scaling methods often rely on empirical approaches and lack a profound…

Computation and Language · Computer Science 2024-10-04 Yingsheng Wu, Yuxuan Gu, Xiaocheng Feng, Weihong Zhong, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin

Large language models (LLMs) face significant challenges in handling long-context tasks because of their limited effective context window size during pretraining, which restricts their ability to generalize over extended sequences.…

Computation and Language · Computer Science 2024-09-05 Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, Yan Wang, Wei Shen, Qing Gu, Anh Tuan Luu, See-Kiong Ng, Zhiwei Jiang, Bryan Hooi

Recently, large language models (LLMs) have revolutionized Natural Language Processing (NLP). Because of their limited training context size, pretrained LLMs struggle to handle long token sequences, limiting their performance on various…

Computation and Language · Computer Science 2024-12-11 Haoran Lian, Junmin Chen, Wei Huang, Yizhe Xiong, Wenping Hu, Guiguang Ding, Hui Chen, Jianwei Niu, Zijia Lin, Fuzheng Zhang, Di Zhang

The context window of large language models (LLMs) is rapidly increasing, leading to a huge variance in resource usage between different requests as well as between different phases of the same request. Restricted by static parallelism…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-30 Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin

Rotary Position Embedding (RoPE) is an efficient position encoding approach and is widely utilized in numerous large language models (LLMs). Recently, a lot of methods have been put forward to further expand the context window based on…

Computation and Language · Computer Science 2025-05-20 Wenqiao Zhu, Chao Xu, Lulu Wang, Jun Wu

Extrapolating ultra-long contexts (text length >128K) remains a major challenge for large language models (LLMs), as most training-free extrapolation methods are not only severely limited by memory bottlenecks, but also suffer from the…

Computation and Language · Computer Science 2025-06-10 Jing Xiong, Jianghan Shen, Chuanyang Zheng, Zhongwei Wan, Chenyang Zhao, Chiwun Yang, Fanghua Ye, Hongxia Yang, Lingpeng Kong, Ngai Wong

Processing long contexts is increasingly important for Large Language Models (LLMs) in tasks like multi-turn dialogues, code generation, and document summarization. This paper addresses the challenges of achieving high long-context…

Computation and Language · Computer Science 2026-04-15 Zihan Liao, Jun Wang, Hang Yu, Lingxiao Wei, Jianguo Li, Jun Wang, Wei Zhang

Large language models (LLMs) call for extension of context to handle many critical applications. However, the existing approaches are prone to expensive costs and inferior quality of context extension. In this work, we propose Extensible…

Computation and Language · Computer Science 2024-02-20 Kun Luo, Zheng Liu, Shitao Xiao, Kang Liu

Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models…

Computation and Language · Computer Science 2023-06-13 Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei