Related papers: JMigBench: A Benchmark for Evaluating LLMs on Sour…

200 papers

With the rapid advancement of powerful large language models (LLMs) in recent years, a wide range of software engineering tasks can now be addressed using LLMs, significantly enhancing productivity and scalability. Numerous benchmark…

Software Engineering · Computer Science 2026-05-29 Linbo Liu, Xinle Liu, Qiang Zhou, Lin Chen, Yihan Liu, Hoan Nguyen, Behrooz Omidvar-Tehrani, Xi Shen, Jun Huan, Omer Tripp, Anoop Deoras

Large language models (LLMs) have shown remarkable capabilities across various software engineering tasks; however, their effectiveness in code migration, adapting code to run in different environments, remains insufficiently studied. In…

Software Engineering · Computer Science 2025-06-03 Keyuan Cheng, Xudong Shen, Yihao Yang, Tengyue Wang, Yang Cao, Muhammad Asif Ali, Hanbin Wang, Lijie Hu, Di Wang

This paper applies machine learning to the difficult and important task of version control merging. (1) We constructed a dataset, Merge-Bench, of 7938 real-world merge conflict hunks from 1439 GitHub repositories. The ground truth is the…

Machine Learning · Computer Science 2026-05-26 Benedikt Schesch, Michael D. Ernst

AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such…

Software Engineering · Computer Science 2025-10-14 Victor May, Diganta Misra, Yanqi Luo, Anjali Sridhar, Justine Gehring, Silvio Soares Ribeiro Junior

In recent years, the application of large language models (LLMs) to code-related tasks has gained significant attention. However, existing evaluation benchmarks often focus on limited scenarios, such as code generation or completion, which…

Software Engineering · Computer Science 2024-09-17 Jia Feng, Jiachen Liu, Cuiyun Gao, Chun Yong Chong, Chaozheng Wang, Shan Gao, Xin Xia

Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability. We introduce CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs in code editing…

With the advancement of automated software engineering, research focus is increasingly shifting toward practical tasks reflecting the day-to-day work of software engineers. Among these tasks, software migration, a critical process of…

Software Engineering · Computer Science 2026-04-29 Ryo Fujii, Makoto Morishita, Kazuki Yano, Jun Suzuki

Library migration is the process of replacing one library with another library that provides similar functionality. Manual library migration is time consuming and error prone, as it requires developers to understand the APIs of both…

Software Engineering · Computer Science 2025-10-14 Md Mohayeminul Islam, Ajay Kumar Jha, May Mahmoud, Ildar Akhmetov, Sarah Nadi

Large language models (LLMs) have achieved state-of-the-art performance in various software engineering tasks, including error detection, clone detection, and code translation, primarily leveraging high-resource programming languages like…

Computation and Language · Computer Science 2025-06-11 Razan Baltaji, Saurabh Pujar, Louis Mandel, Martin Hirzel, Luca Buratti, Lav Varshney

Background: Leaking sensitive information - such as API keys, tokens, and credentials - in source code remains a persistent security threat. Traditional regex and entropy-based tools often generate high false positives due to limited…

Software Engineering · Computer Science 2025-07-29 Md Nafiu Rahman, Sadif Ahmed, Zahin Wahab, S M Sohan, Rifat Shahriyar

Assertion messages significantly enhance unit tests by clearly explaining the reasons behind test failures, yet they are frequently omitted by developers and automated test-generation tools. Despite recent advancements, Large Language…

Software Engineering · Computer Science 2025-09-25 Ahmed Aljohani, Anamul Haque Mollah, Hyunsook Do

Code-mixing, the practice of switching between languages within a conversation, poses unique challenges for traditional NLP. Existing benchmarks are limited by their narrow language pairs and tasks, failing to adequately assess large…

Computation and Language · Computer Science 2025-09-09 Yilun Yang, Yekun Chai

Can low-cost large language models (LLMs) take over the interpretive coding work that still anchors much of empirical content analysis? This paper introduces ContentBench, a public benchmark suite that helps answer this replacement question…

Computers and Society · Computer Science 2026-02-24 Michael Haman

While large language models (LLMs) exhibit state-of-the-art performance in various tasks, recent studies have revealed that they struggle with code translation. This is because they haven't been extensively pre-trained with parallel multilingual…

Software Engineering · Computer Science 2024-10-15 Qingxiao Tao, Tingrui Yu, Xiaodong Gu, Beijun Shen

Recent advancements in large language models (LLMs) have showcased impressive code generation capabilities, primarily evaluated through language-to-code benchmarks. However, these benchmarks may not fully capture a model's code…

Software Engineering · Computer Science 2024-09-16 Yuwei Zhao, Ziyang Luo, Yuchen Tian, Hongzhan Lin, Weixiang Yan, Annan Li, Jing Ma

In the evolving landscape of large language models (LLMs) tailored for software engineering, the need for benchmarks that accurately reflect real-world development scenarios is paramount. Current benchmarks are either too simplistic or fail…

Software Engineering · Computer Science 2024-03-29 Zhengran Zeng, Yidong Wang, Rui Xie, Wei Ye, Shikun Zhang

As large language models become increasingly capable of generating code, evaluating their performance remains a complex and evolving challenge. Existing benchmarks primarily focus on functional correctness, overlooking the diversity of…

Software Engineering · Computer Science 2025-11-03 Forough Mehralian, Ryan Shar, James R. Rae, Alireza Hashemi

Software testing is a crucial phase in the software life cycle, helping identify potential risks and reduce maintenance costs. With the advancement of Large Language Models (LLMs), researchers have proposed an increasing number of LLM-based…

Software Engineering · Computer Science 2024-09-27 Quanjun Zhang, Ye Shang, Chunrong Fang, Siqi Gu, Jianyi Zhou, Zhenyu Chen

Assembly-to-source code translation is a critical task in reverse engineering, cybersecurity, and software maintenance, yet systematic benchmarks for evaluating large language models on this problem remain scarce. In this work, we present…

Software Engineering · Computer Science 2025-12-02 Parisa Hamedi, Hamed Jelodar, Samita Bai, Mohammad Meymani, Roozbeh Razavi-Far, Ali A. Ghorbani

Implementing new features in repository-level codebases is a crucial application of code generation models. However, current benchmarks lack a dedicated evaluation framework for this capability. To fill this gap, we introduce FEA-Bench, a…

Software Engineering · Computer Science 2025-06-23 Wei Li, Xin Zhang, Zhongxin Guo, Shaoguang Mao, Wen Luo, Guangyue Peng, Yangyu Huang, Houfeng Wang, Scarlett Li