Related papers: GitEvo: Code Evolution Analysis for Git Repositori…
Achieving high performance for GPU codes requires developers to have significant knowledge in parallel programming and GPU architectures, and in-depth understanding of the application. This combination makes it challenging to find…
Currently, most machine learning models are trained by centralized teams and are rarely updated. In contrast, open-source software development involves the iterative development of a shared artifact through distributed collaboration using a…
We introduce CodeEvolve, an open-source framework that couples large language models with island-based evolutionary search for end-to-end algorithmic discovery. CodeEvolve integrates inspiration-based crossover, meta-prompting, and…
Collaborative vocabulary development in the context of data integration is the process of finding consensus between the experts of the different systems and domains. The complexity of this process is increased with the number of involved…
Recent advances in LLM-guided evolutionary computation, particularly AlphaEvolve (Novikov et al., 2025; Georgiev et al., 2025), have demonstrated remarkable success in discovering novel mathematical constructions and solving challenging…
Git and GitHub are common tools for keeping track of multiple versions of data analytic content, which allow for more than one person to simultaneously work on a project. GitHub Classroom aims to provide a way for students to work on and…
Empirical research on code review processes is increasingly central to understanding software quality and collaboration. However, collecting and analyzing review data remains a time-consuming and technically intensive task. Most researchers…
Code evolution is inevitable in modern software development. Changes to third-party APIs frequently break existing code and complicate maintenance, posing practical challenges for developers. While large language models (LLMs) have shown…
Software quality is an important problem for technology companies, since it substantially impacts the efficiency, usefulness, and maintainability of the final product; hence, code review is a must-do activity for software developers. During…
Background: Data mining and analyzing of public Git software repositories is a growing research field. The tools used for studies that investigate a single project or a group of projects have been refined, but it is not clear whether the…
Recent advancements in code completion models have primarily focused on local file contexts. However, these studies do not fully capture the complexity of real-world software development, which often requires the use of rapidly-evolving…
Traceability between published scientific breakthroughs and their implementation is essential, especially in the case of open-source scientific software which implements bleeding-edge science in its code. However, aligning the link between…
Git is used as the distributed version control system for many open-source software projects. One Git-based service, GitHub, is the most common code hosting and repository service for open-source software projects. For researchers that…
Software evolves with changes to its codebase over time. Internally, software changes in response to decisions to include some code change into the codebase and discard others. Explaining the mechanism of software evolution, this paper…
Developers use and contribute to repositories on GitHub. Documentation present in the repositories serves as an important source by helping developers to understand, maintain and contribute to the project. Currently, documentation in a…
The rapid evolution of software libraries poses a considerable hurdle for code generation, necessitating continuous adaptation to frequent version updates while preserving backward compatibility. While existing code evolution benchmarks…
Context: Software systems are in continuous evolution through source code changes to fixing bugs, adding new functionalities and improving the internal architecture. All these practices are recorded in the version history, which can be…
To evaluate the repository-level code generation capabilities of Large Language Models (LLMs) in complex real-world software development scenarios, many evaluation methods have been developed. These methods typically leverage contextual…
Version control systems are integral to software development, with GitHub emerging as a popular online platform due to its comprehensive project management tools, including issue tracking and pull requests. However, GitHub lacks a direct…
GitHub hosts hundreds of millions of code repositories written in hundreds of different programming languages. In addition to its hosting services, GitHub provides data and insights into code, such as vulnerability analysis and code…