Related papers: Diff-XYZ: A Benchmark for Evaluating Diff Understa…

Toward Interactive Optimization of Source Code Differences: An Empirical Study of Its Performance

A source code difference (diff) indicates the changes made between old and new versions of source code, and it can be used in code reviews to help developers understand the changes made to the code. Although many diff generation methods have been…

Software Engineering · Computer Science · 2024-09-27 · Tsukasa Yagi, Shinpei Hayashi

What a diff makes: automating code migration with large language models

Modern software programs are built on stacks that are often undergoing changes that introduce updates and improvements, but may also break any project that depends upon them. In this paper we explore the use of Large Language Models (LLMs)…

Software Engineering · Computer Science · 2025-11-04 · Katherine A. Rosenfeld, Cliff C. Kerr, Jessica Lundin

To Diff or Not to Diff? Structure-Aware and Adaptive Output Formats for Efficient LLM-based Code Editing

Large Language Models (LLMs) are increasingly used for code editing, yet the prevalent full-code generation paradigm suffers from severe efficiency bottlenecks, posing challenges for interactive coding assistants that demand low latency and…

Software Engineering · Computer Science · 2026-05-01 · Wei Cheng, Yongchang Cao, Chen Shen, Binhua Li, Jue Chen, Yongbin Li, Wei Hu

ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts

Charts are central to analytical reasoning, yet existing benchmarks for chart understanding focus almost exclusively on single-chart interpretation rather than comparative reasoning across multiple charts. To address this gap, we introduce…

Artificial Intelligence · Computer Science · 2026-05-12 · Rongtian Ye

LLM Code Customization with Visual Results: A Benchmark on TikZ

With the rise of AI-based code generation, customizing existing code from natural language instructions to modify visual results, such as figures or images, has become possible, promising to reduce the need for deep programming expertise.…

Software Engineering · Computer Science · 2025-06-05 · Charly Reux, Mathieu Acher, Djamel Eddine Khelladi, Olivier Barais, Clément Quinton

Using the DIFF Command for Natural Language Processing

Diff is a software program that detects differences between two data sets and is useful in natural language processing. This paper shows several examples of the application of diff. They include the detection of differences between two…

Computation and Language · Computer Science · 2007-05-23 · Masaki Murata, Hitoshi Isahara

BDiff: Block-aware and Accurate Text-based Code Differencing

Code differencing is a fundamental technique in software engineering practice and research. While researchers have proposed text-based differencing techniques capable of identifying line changes over the past decade, existing methods…

Software Engineering · Computer Science · 2025-10-27 · Yao Lu, Wanwei Liu, Tanghaoran Zhang, Kang Yang, Yang Zhang, Wenyu Xu, Longfei Sun, Xinjun Mao, Shuzheng Gao, Michael R. Lyu

A Differential Fuzzing-Based Evaluation of Functional Equivalence in LLM-Generated Code Refactorings

With the rapid adoption of large language models (LLMs) in automated code refactoring, assessing and ensuring functional equivalence between LLM-generated refactoring and the original implementation becomes critical. While prior work…

Software Engineering · Computer Science · 2026-02-18 · Simantika Bhattacharjee Dristi, Matthew B. Dwyer

RefDiff: Detecting Refactorings in Version Histories

Refactoring is a well-known technique that is widely adopted by software engineers to improve the design and enable the evolution of a system. Knowing which refactoring operations were applied in a code change is valuable information to…

Software Engineering · Computer Science · 2018-08-07 · Danilo Silva, Marco Tulio Valente

Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models

Large Language Models (LLMs) have transformed natural language processing and extended their powerful capabilities to multi-modal domains. As LLMs continue to advance, it is crucial to develop diverse and appropriate metrics for their…

Machine Learning · Computer Science · 2024-10-15 · Lai Wei, Zhiquan Tan, Chenghai Li, Jindong Wang, Weiran Huang

Illuminating Patterns of Divergence: DataDios SmartDiff for Large-Scale Data Difference Analysis

Data engineering workflows require reliable differencing across files, databases, and query outputs, yet existing tools falter under schema drift, heterogeneous types, and limited explainability. SmartDiff is a unified system that combines…

Databases · Computer Science · 2025-09-03 · Aryan Poduri, Yashwant Tailor

CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale

Large Language Models (LLMs) have exhibited exceptional performance in software engineering yet face challenges in adapting to continually evolving code knowledge, particularly regarding the frequent updates of third-party library APIs.…

Computation and Language · Computer Science · 2025-06-19 · Chenlong Wang, Zhaoyang Chu, Zhengxiang Cheng, Xuyi Yang, Kaiyue Qiu, Yao Wan, Zhou Zhao, Xuanhua Shi, Dongping Chen

Understanding Code Change with Micro-Changes

A crucial activity in software maintenance and evolution is the comprehension of the changes performed by developers, when they submit a pull request and/or perform a commit on the repository. Typically, code changes are represented in the…

Software Engineering · Computer Science · 2025-02-26 · Lei Chen, Michele Lanza, Shinpei Hayashi

Diff-ID: An Explainable Identity Difference Quantification Framework for DeepFake Detection

Despite the fact that DeepFake forgery detection algorithms have achieved impressive performance on known manipulations, they often face disastrous performance degradation when generalized to an unseen manipulation. Some recent works show…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-03 · Chuer Yu, Xuhong Zhang, Yuxuan Duan, Senbo Yan, Zonghui Wang, Yang Xiang, Shouling Ji, Wenzhi Chen

Agent-Diff: Benchmarking LLM Agents on Enterprise API Tasks via Code Execution with State-Diff-Based Evaluation

We present Agent-Diff, a novel benchmarking framework for evaluating agentic Large Language Models (LLMs) on real-world productivity software API tasks via code execution. Agentic LLM performance varies due to differences in models,…

Software Engineering · Computer Science · 2026-04-29 · Hubert M. Pysklo, Artem Zhuravel, Patrick D. Watson

CodeFuse-CommitEval: Towards Benchmarking LLM's Power on Commit Message and Code Change Inconsistency Detection

Version control relies on commit messages to convey the rationale for code changes, but these messages are often low quality and, more critically, inconsistent with their diffs, a problem known as message-code inconsistency (MCI). MCIs mislead…

Software Engineering · Computer Science · 2025-11-26 · Qingyu Zhang, Puzhuo Liu, Peng Di, Chenxiong Qian

Efficiency for Experts, Visibility for Newcomers: A Case Study of Label-Code Alignment in Kubernetes

Labels on platforms such as GitHub support triage and coordination, yet little is known about how well they align with code modifications or how such alignment affects collaboration across contributor experience levels. We present a case…

Software Engineering · Computer Science · 2026-05-22 · Matteo Vaccargiu, Sabrina Aufiero, Silvia Bartolucci, Ronnie de Souza Santos, Roberto Tonelli, Giuseppe Destefanis

CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding

The comic domain is rapidly advancing with the development of single-page analysis and synthesis models. However, evaluation metrics and datasets lag behind, often limited to small-scale or single-style test sets. We introduce a novel…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-01 · Emanuele Vivoli, Marco Bertini, Dimosthenis Karatzas

A Hierarchical and Evolvable Benchmark for Fine-Grained Code Instruction Following with Multi-Turn Feedback

Large language models (LLMs) have advanced significantly in code generation, yet their ability to follow complex programming instructions with layered and diverse constraints remains underexplored. Existing benchmarks often prioritize…

Software Engineering · Computer Science · 2025-07-02 · Guoliang Duan, Mingwei Liu, Yanlin Wang, Chong Wang, Xin Peng, Zibin Zheng

Feature Coding in the Era of Large Models: Dataset, Test Conditions, and Benchmark

Large models have achieved remarkable performance across various tasks, yet they incur significant computational costs and privacy concerns during both training and inference. Distributed deployment has emerged as a potential solution, but…

Multimedia · Computer Science · 2025-09-03 · Changsheng Gao, Yifan Ma, Qiaoxi Chen, Yenan Xu, Dong Liu, Weisi Lin