Related papers: LibEvolutionEval: A Benchmark and Study for Versio…

GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models

The rapid evolution of software libraries presents a significant challenge for code generation models, which must adapt to frequent version updates while maintaining compatibility with previous versions. Existing code completion benchmarks…

Software Engineering · Computer Science 2024-11-12 Nizar Islah, Justine Gehring, Diganta Misra, Eilif Muller, Irina Rish, Terry Yue Zhuo, Massimo Caccia

GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities

The rapid evolution of software libraries poses a considerable hurdle for code generation, necessitating continuous adaptation to frequent version updates while preserving backward compatibility. While existing code evolution benchmarks…

Software Engineering · Computer Science 2025-07-23 Diganta Misra, Nizar Islah, Victor May, Brice Rauby, Zihan Wang, Justine Gehring, Antonio Orvieto, Muawiz Chaudhary, Eilif B. Muller, Irina Rish, Samira Ebrahimi Kahou, Massimo Caccia

TransLibEval: Demystify Large Language Models' Capability in Third-party Library-targeted Code Translation

In recent years, Large Language Models (LLMs) have been widely studied in the code translation field on the method, class, and even repository levels. However, most of these benchmarks are limited in terms of Third-Party Library (TPL)…

Software Engineering · Computer Science 2026-01-21 Pengyu Xue, Kunwu Zheng, Zhen Yang, Yifei Pei, Linhao Wu, Jiahui Dong, Xiapu Luo, Yan Xiao, Fei Liu, Yuxuan Zhang, Xiran Lyu, Xianhang Li, Xuanyu Zhu, Chengyi Wang

CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement

We present CodeEvolve, an evolutionary framework for improving program performance and code quality with Large Language Models (LLMs). CodeEvolve extends OpenEvolve with runtime-guided target selection, Monte Carlo Tree Search (MCTS),…

Software Engineering · Computer Science 2026-05-07 Ajay Krishna Borra, Wenzhuo Yang, Samarth Arora, Akhilesh Deepak Gotmare, Gokulakrishnan Gopalakrishnan, Tharun Gali, Madhav Rathi, Doyen Sahoo, Manpreet Singh, Mayuresh Verma, Laksh Venka, Shuchita Singh

GitEvo: Code Evolution Analysis for Git Repositories

Analyzing the code evolution of software systems is relevant for practitioners, researchers, and educators. It can help practitioners identify design trends and maintenance challenges, provide researchers with empirical data to study…

Software Engineering · Computer Science 2026-02-03 Andre Hora

EvoTorch: Scalable Evolutionary Computation in Python

Evolutionary computation is an important component within various fields such as artificial intelligence research, reinforcement learning, robotics, industrial automation and/or optimization, engineering design, etc. Considering the…

Neural and Evolutionary Computing · Computer Science 2023-05-23 Nihat Engin Toklu, Timothy Atkinson, Vojtěch Micka, Paweł Liskowski, Rupesh Kumar Srivastava

ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code

In recent years, the application of large language models (LLMs) to code-related tasks has gained significant attention. However, existing evaluation benchmarks often focus on limited scenarios, such as code generation or completion, which…

Software Engineering · Computer Science 2024-09-17 Jia Feng, Jiachen Liu, Cuiyun Gao, Chun Yong Chong, Chaozheng Wang, Shan Gao, Xin Xia

On the Generalizability of Deep Learning-based Code Completion Across Programming Language Versions

Code completion is a key feature of Integrated Development Environments (IDEs), aimed at predicting the next tokens a developer is likely to write, helping them write code faster and with less effort. Modern code completion approaches are…

Software Engineering · Computer Science 2024-03-25 Matteo Ciniselli, Alberto Martin-Lopez, Gabriele Bavota

Correct Code, Vulnerable Dependencies: A Large Scale Measurement Study of LLM-Specified Library Versions

Large language models (LLMs) are now largely involved in software development workflows, and the code they generate routinely includes third-party library (TPL) imports annotated with specific version identifiers. These version choices can…

Software Engineering · Computer Science 2026-05-08 Chengjie Wang, Jingzheng Wu, Xiang Ling, Tianyue Luo, Chen Zhao

EvolveTool-Bench: Evaluating the Quality of LLM-Generated Tool Libraries as Software Artifacts

Modern LLM agents increasingly create their own tools at runtime -- from Python functions to API clients -- yet existing benchmarks evaluate them almost exclusively by downstream task completion. This is analogous to judging a software…

Software Engineering · Computer Science 2026-04-02 Alibek T. Kaliyev, Artem Maryanskyy

Forecasting the risk of software choices: A model to foretell security vulnerabilities from library dependencies and source code evolution

Software security mainly studies vulnerability detection: is my code vulnerable today? This hinders risk estimation, so new approaches are emerging to forecast the occurrence of future vulnerabilities. While useful, these approaches are…

Software Engineering · Computer Science 2024-11-19 Carlos E. Budde, Ranindya Paramitha, Fabio Massacci

ExecRepoBench: Multi-level Executable Code Completion Evaluation

Code completion has become an essential tool for daily software development. Existing evaluation benchmarks often employ static methods that do not fully capture the dynamic nature of real-world coding environments and face significant…

Computation and Language · Computer Science 2024-12-17 Jian Yang, Jiajun Zhang, Jiaxi Yang, Ke Jin, Lei Zhang, Qiyao Peng, Ken Deng, Yibo Miao, Tianyu Liu, Zeyu Cui, Binyuan Hui, Junyang Lin

ReadMe.LLM: A Framework to Help LLMs Understand Your Library

Large Language Models (LLMs) often struggle with code generation tasks involving niche software libraries. Existing code generation techniques with only human-oriented documentation can fail -- even when the LLM has access to web search and…

Software Engineering · Computer Science 2025-05-09 Sandya Wijaya, Jacob Bolano, Alejandro Gomez Soteres, Shriyanshu Kode, Yue Huang, Anant Sahai

An Empirical Study on Bugs Inside PyTorch: A Replication Study

Software systems are increasingly relying on deep learning components, due to their remarkable capability of identifying complex data patterns and powering intelligent behaviour. A core enabler of this change in software development is the…

Software Engineering · Computer Science 2023-08-02 Sharon Chee Yin Ho, Vahid Majdinasab, Mohayeminul Islam, Diego Elias Costa, Emad Shihab, Foutse Khomh, Sarah Nadi, Muhammad Raza

Import2vec - Learning Embeddings for Software Libraries

We consider the problem of developing suitable learning representations (embeddings) for library packages that capture semantic similarity among libraries. Such representations are known to improve the performance of downstream learning…

Software Engineering · Computer Science 2019-04-09 Bart Theeten, Frederik Vandeputte, Tom Van Cutsem

When Language Model Meets Private Library

With the rapid development of pre-training techniques, a number of language models have been pre-trained on large-scale code corpora and perform well in code generation. In this paper, we investigate how to equip pre-trained language models…

Programming Languages · Computer Science 2022-11-01 Daoguang Zan, Bei Chen, Zeqi Lin, Bei Guan, Yongji Wang, Jian-Guang Lou

Evolve: A Persistent Knowledge Lifecycle for Small Language Models

Evolve pairs a small local language model with a persistent, teacher-compiled knowledge store -- refined through sleep consolidation and usage-driven refresh -- to deliver substantial accuracy gains over the model's parametric baseline…

Machine Learning · Computer Science 2026-04-28 Dikran Hovagimian

SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation

In recent years, large language models (LLMs) have showcased significant advancements in code generation. However, most evaluation benchmarks are primarily oriented towards Python, making it difficult to evaluate other programming…

Machine Learning · Computer Science 2025-06-02 Ivan Petrukha, Yana Kurliak, Nataliia Stulova

Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM

LLMs have become the go-to choice for code generation tasks, with an exponential increase in the training, development, and usage of LLMs specifically for code generation. To evaluate the ability of LLMs on code, both academic and industry…

Software Engineering · Computer Science 2024-03-29 Chunqiu Steven Xia, Yinlin Deng, Lingming Zhang

Automatic Library Version Identification, an Exploration of Techniques

This paper is the result of a two month research internship on the topic of library version identification. In this paper, ideas and techniques from literature in the area of binary comparison and fingerprinting are outlined and applied to…

Cryptography and Security · Computer Science 2017-03-02 Thomas Rinsma