Related papers: CL4SE: Benchmarking Context Learning on Software E…

Benchmarking LLMs for Fine-Grained Code Review with Enriched Context in Practice

Code review is a cornerstone of software quality assurance, and recent advances in Large Language Models (LLMs) have shown promise in its automation. However, existing benchmarks for LLM-based code review face three major limitations. Lack…

Software Engineering · Computer Science 2026-01-01 Ruida Hu, Xinchen Wang, Xin-Cheng Wen, Zhao Zhang, Bo Jiang, Pengfei Gao, Chao Peng, Cuiyun Gao

SWE Context Bench: A Benchmark for Context Learning in Coding

Large language models are increasingly used as coding agents for software engineering tasks. Current benchmarks mainly evaluate whether the agent can correctly solve the request or fix the bugs. They largely treat tasks as independent and…

Software Engineering · Computer Science 2026-05-07 Jiayuan Zhu, Junde Wu, Minhao Hu, Shengda Zhu, Jiazhen Pan, Weixiang Shen, Yijun Yang, Fenglin Liu, Jianye Hao, Yueming Jin, Qirong Ho, Min Xu

Assessing and Advancing Benchmarks for Evaluating Large Language Models in Software Engineering Tasks

Large language models (LLMs) are gaining increasing popularity in software engineering (SE) due to their unprecedented performance across various applications. These models are increasingly being utilized for a range of SE tasks, including…

Software Engineering · Computer Science 2025-11-05 Xing Hu, Feifei Niu, Junkai Chen, Xin Zhou, Junwei Zhang, Junda He, Xin Xia, David Lo

Large Language Models for Software Engineering: A Systematic Literature Review

Large Language Models (LLMs) have significantly impacted numerous domains, including Software Engineering (SE). Many recent publications have explored LLMs applied to various SE tasks. Nevertheless, a comprehensive understanding of the…

Software Engineering · Computer Science 2024-04-11 Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, Haoyu Wang

A Survey on Large Language Models for Software Engineering

Software Engineering (SE) is the systematic design, development, maintenance, and management of software applications underpinning the digital infrastructure of our modern world. Very recently, the SE community has seen a rapidly increasing…

Software Engineering · Computer Science 2024-09-10 Quanjun Zhang, Chunrong Fang, Yang Xie, Yaxin Zhang, Yun Yang, Weisong Sun, Shengcheng Yu, Zhenyu Chen

Can Large Language Models Understand Context?

Understanding context is key to understanding human language, an ability which Large Language Models (LLMs) have been increasingly seen to demonstrate to an impressive extent. However, though the evaluation of LLMs encompasses various…

Computation and Language · Computer Science 2024-02-02 Yilun Zhu, Joel Ruben Antony Moniz, Shruti Bhargava, Jiarui Lu, Dhivya Piraviperumal, Site Li, Yuan Zhang, Hong Yu, Bo-Hsiang Tseng

SELU: A Software Engineering Language Understanding Benchmark

Large Language Models (LLMs) have demonstrated remarkable capabilities in code understanding and generation. However, their effectiveness on non-code Software Engineering (SE) tasks remains underexplored. We present 'Software Engineering…

Software Engineering · Computer Science 2026-02-12 Fabian C. Peña, Steffen Herbold

Contexting as Recommendation: Evolutionary Collaborative Filtering for Context Engineering

Large Language Models (LLMs) are highly sensitive to their input contexts, motivating the development of automated context engineering. However, existing methods predominantly treat this as a global search problem, seeking a single context…

Computation and Language · Computer Science 2026-05-18 Jiachen Zhu, Zhuoying Ou, Congmin Zheng, Yuxiang Chen, Zeyu Zheng, Rong Shan, Lingyu Yang, Lionel Z. Wang, Weiwen Liu, Yong Yu, Weinan Zhang, Jianghao Lin

A Survey of Context Engineering for Large Language Models

The performance of Large Language Models (LLMs) is fundamentally determined by the contextual information provided during inference. This survey introduces Context Engineering, a formal discipline that transcends simple prompt design to…

Computation and Language · Computer Science 2025-07-22 Lingrui Mei, Jiayu Yao, Yuyao Ge, Yiwei Wang, Baolong Bi, Yujun Cai, Jiazhi Liu, Mingyu Li, Zhong-Zhi Li, Duzhen Zhang, Chenlin Zhou, Jiayi Mao, Tianze Xia, Jiafeng Guo, Shenghua Liu

ContextBench: A Benchmark for Context Retrieval in Coding Agents

LLM-based coding agents have shown strong performance on automated issue resolution benchmarks, yet existing evaluations largely focus on final task success, providing limited insight into how agents retrieve and use code context during…

Machine Learning · Computer Science 2026-02-12 Han Li, Letian Zhu, Bohan Zhang, Rili Feng, Jiaming Wang, Yue Pan, Earl T. Barr, Federica Sarro, Zhaoyang Chu, He Ye

BERT_SE: A Pre-trained Language Representation Model for Software Engineering

The application of Natural Language Processing (NLP) has achieved a high level of relevance in several areas. In the field of software engineering (SE), NLP applications are based on the classification of similar texts (e.g. software…

Software Engineering · Computer Science 2021-12-02 Eliane Maria De Bortoli Fávero, Dalcimar Casanova

CASE-Bench: Context-Aware SafEty Benchmark for Large Language Models

Aligning large language models (LLMs) with human values is essential for their safe deployment and widespread adoption. Current LLM safety benchmarks often focus solely on the refusal of individual problematic queries, which overlooks the…

Computation and Language · Computer Science 2025-02-10 Guangzhi Sun, Xiao Zhan, Shutong Feng, Philip C. Woodland, Jose Such

The Current Challenges of Software Engineering in the Era of Large Language Models

With the advent of large language models (LLMs) in the artificial intelligence (AI) area, the field of software engineering (SE) has also witnessed a paradigm shift. These models, by leveraging the power of deep learning and massive amounts…

Software Engineering · Computer Science 2024-12-30 Cuiyun Gao, Xing Hu, Shan Gao, Xin Xia, Zhi Jin

LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering

The emergence of long-context language models with context windows extending to millions of tokens has created new opportunities for sophisticated code understanding and software development evaluation. We propose LoCoBench, a comprehensive…

Software Engineering · Computer Science 2025-09-12 Jielin Qiu, Zuxin Liu, Zhiwei Liu, Rithesh Murthy, Jianguo Zhang, Haolin Chen, Shiyu Wang, Ming Zhu, Liangwei Yang, Juntao Tan, Zhepeng Cen, Cheng Qian, Shelby Heinecke, Weiran Yao, Silvio Savarese, Caiming Xiong, Huan Wang

Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings

The large language model (LLM)-as-judge paradigm has been used to meet the demand for a cheap, reliable, and fast evaluation of model outputs during AI system development and post-deployment monitoring. While judge models -- LLMs finetuned…

Computation and Language · Computer Science 2025-03-21 Austin Xu, Srijan Bansal, Yifei Ming, Semih Yavuz, Shafiq Joty

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

Although large language models (LLMs) demonstrate impressive performance for many language tasks, most of them can only handle texts a few thousand tokens long, limiting their applications on longer sequence inputs, such as books, reports,…

Computation and Language · Computer Science 2024-06-21 Yushi Bai, Xin Lv, Jiajie Zhang, Hongchang Lyu, Jiankai Tang, Zhidian Huang, Zhengxiao Du, Xiao Liu, Aohan Zeng, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li

DSBC : Data Science task Benchmarking with Context engineering

Recent advances in large language models (LLMs) have significantly impacted data science workflows, giving rise to specialized data science agents designed to automate analytical tasks. Despite rapid adoption, systematic benchmarks…

Artificial Intelligence · Computer Science 2025-08-08 Ram Mohan Rao Kadiyala, Siddhant Gupta, Jebish Purbey, Giulio Martini, Ali Shafique, Suman Debnath, Hamza Farooq

TestBench: Evaluating Class-Level Test Case Generation Capability of Large Language Models

Software testing is a crucial phase in the software life cycle, helping identify potential risks and reduce maintenance costs. With the advancement of Large Language Models (LLMs), researchers have proposed an increasing number of LLM-based…

Software Engineering · Computer Science 2024-09-27 Quanjun Zhang, Ye Shang, Chunrong Fang, Siqi Gu, Jianyi Zhou, Zhenyu Chen

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Large language model (LLM) applications such as agents and domain-specific reasoning increasingly rely on context adaptation: modifying inputs with instructions, strategies, or evidence, rather than weight updates. Prior approaches improve…

Machine Learning · Computer Science 2026-03-31 Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, Urmish Thakker, James Zou, Kunle Olukotun

Context-Enhanced Vulnerability Detection Based on Large Language Model

Vulnerability detection is a critical aspect of software security. Accurate detection is essential to prevent potential security breaches and protect software systems from malicious attacks. Recently, vulnerability detection methods…

Software Engineering · Computer Science 2025-04-24 Yixin Yang, Bowen Xu, Xiang Gao, Hailong Sun