Related papers: Evaluating Long Range Dependency Handling in Code …

200 papers

As the context limits of Large Language Models (LLMs) increase, the range of possible applications and downstream functions broadens. In many real-world tasks, decisions depend on details scattered across collections of often disparate…

Computation and Language · Computer Science · 2025-04-24 · Jonathan Roberts, Kai Han, Samuel Albanie

Large language models (LLMs) are equipped with increasingly extended context windows recently, yet their long context understanding capabilities over long dependency tasks remain fundamentally limited and underexplored. This gap is…

Computation and Language · Computer Science · 2025-10-28 · Ziyuan He, Yuxuan Wang, Jiaqi Li, Kexin Liang, Muhan Zhang

The proliferation of Large Language Models (LLMs) highlights the critical importance of conducting thorough evaluations to discern their comparative advantages, limitations, and optimal use cases. Particularly important is assessing their…

Computation and Language · Computer Science · 2024-04-16 · Daniel Machlab, Rick Battle

Multiple recent studies have documented large language models' (LLMs) performance on calling external tools/functions. Others focused on LLMs' abilities to handle longer context lengths. At the intersection of these areas lies another…

While recent large language models (LLMs) demonstrate remarkable abilities in responding to queries in diverse languages, their ability to handle long multilingual contexts is unexplored. As such, a systematic evaluation of the long-context…

Computation and Language · Computer Science · 2024-08-20 · Amey Hengle, Prasoon Bajpai, Soham Dan, Tanmoy Chakraborty

Solving complex or long-horizon problems often requires large language models (LLMs) to use external tools and operate over a significantly longer context window. New LLMs enable longer context windows and support tool calling capabilities.…

Machine Learning · Computer Science · 2025-12-03 · Tsimur Hadeliya, Mohammad Ali Jauhar, Nidhi Sakpal, Diogo Cruz

Recent advances have been improving the context windows of Large Language Models (LLMs). To quantify the real long-context capabilities of LLMs, evaluators such as the popular Needle in a Haystack have been developed to test LLMs over a…

Software Engineering · Computer Science · 2024-06-11 · Jiawei Liu, Jia Le Tian, Vijay Daita, Yuxiang Wei, Yifeng Ding, Yuhan Katherine Wang, Jun Yang, Lingming Zhang

Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks but are constrained by their small context window sizes. Various efforts have been proposed to expand the context window to accommodate even up to…

Computation and Language · Computer Science · 2024-04-09 · Xuanfan Ni, Hengyi Cai, Xiaochi Wei, Shuaiqiang Wang, Dawei Yin, Piji Li

Large Language Models (LLMs) have demonstrated remarkable capabilities in comprehending and analyzing lengthy sequential inputs, owing to their extensive context windows that allow processing millions of tokens in a single forward pass.…

Computation and Language · Computer Science · 2024-12-23 · Peyman Hosseini, Ignacio Castro, Iacopo Ghinassi, Matthew Purver

Large language models (LLMs), despite their impressive performance in various language tasks, are typically limited to processing texts within context-window size. This limitation has spurred significant research efforts to enhance LLMs'…

Computation and Language · Computer Science · 2024-09-09 · Jiaqi Li, Mengmeng Wang, Zilong Zheng, Muhan Zhang

Existing benchmarks for evaluating long-context language models (LCLMs) primarily focus on long-context recall, requiring models to produce short responses based on a few critical snippets while processing thousands of irrelevant tokens. We…

Computation and Language · Computer Science · 2025-09-30 · Xi Ye, Fangcong Yin, Yinghui He, Joie Zhang, Howard Yen, Tianyu Gao, Greg Durrett, Danqi Chen

Long-context modeling capabilities are important for large language models (LLMs) in various applications. However, directly training LLMs with long context windows is insufficient to enhance this capability since some training samples do…

Computation and Language · Computer Science · 2024-05-29 · Longze Chen, Ziqiang Liu, Wanwei He, Yunshui Li, Run Luo, Min Yang

Existing multilingual long-context benchmarks, often based on the popular needle-in-a-haystack test, primarily evaluate a model's ability to locate specific information buried within irrelevant texts. However, such a retrieval-centric…

Computation and Language · Computer Science · 2025-04-18 · Amey Hengle, Prasoon Bajpai, Soham Dan, Tanmoy Chakraborty

Long-context large language models (LC LLMs) promise to increase reliability of LLMs in real-world tasks requiring processing and understanding of long input documents. However, this ability of LC LLMs to reliably utilize their growing…

Computation and Language · Computer Science · 2024-12-23 · Lavanya Gupta, Saket Sharma, Yiyun Zhao

Retrieval Augmented Generation (RAG) has emerged as a crucial technique for enhancing the accuracy of Large Language Models (LLMs) by incorporating external information. With the advent of LLMs that support increasingly longer context…

Machine Learning · Computer Science · 2024-11-07 · Quinn Leng, Jacob Portes, Sam Havens, Matei Zaharia, Michael Carbin

Current long-context benchmarks primarily focus on retrieval-based tests, requiring Large Language Models (LLMs) to locate specific information within extensive input contexts, such as the needle-in-a-haystack (NIAH) benchmark. Long-context…

Computation and Language · Computer Science · 2024-10-25 · Xiang Liu, Peijie Dong, Xuming Hu, Xiaowen Chu

A common practice in large language model (LLM) usage for complex analytical tasks such as code generation, is to sample a solution for the entire task within the model's context window. Previous works have shown that subtask decomposition…

Artificial Intelligence · Computer Science · 2025-02-03 · Yotam Wolf, Binyamin Rothberg, Dorin Shteyman, Amnon Shashua

Large Language Models (LLMs) have demonstrated remarkable capabilities in handling long texts and have almost perfect performance in traditional retrieval tasks. However, their performance significantly degrades when it comes to numerical…

Computation and Language · Computer Science · 2024-12-05 · Yijiong Yu

Large language models (LLMs) increasingly assist software engineering tasks that require reasoning over long code contexts, yet their robustness under varying input conditions remains unclear. We conduct a systematic study of long-context…

Software Engineering · Computer Science · 2026-02-20 · Kishan Maharaj, Nandakishore Menon, Ashita Saxena, Srikanth Tamilselvam

This study investigates the reasoning robustness of large language models (LLMs) on mathematical problem-solving tasks under systematically introduced input perturbations. Using the GSM8K dataset as a controlled testbed, we evaluate how…

Artificial Intelligence · Computer Science · 2025-04-04 · Giannis Chatziveroglou, Richard Yun, Maura Kelleher