Related papers: Gistable: Evaluating the Executability of Python C…
Platforms like Stack Overflow and GitHub's gist system promote the sharing of ideas and programming techniques via the distribution of code snippets designed to illustrate particular tasks. Python, a popular and fast-growing programming…
Online resources today contain an abundant amount of code snippets for documentation, collaboration, learning, and problem-solving purposes. Their executability in a "plug and play" manner enables us to confirm their quality and use them…
When programmers look for how to achieve certain programming tasks, Stack Overflow is a popular destination in search engine results. Over the years, Stack Overflow has accumulated an impressive knowledge base of snippets of code that are…
As coding agents are increasingly deployed in large codebases, the need to automatically design challenging, codebase-level evaluation is central. We propose Gistify, a task where a coding LLM must create a single, minimal, self-contained…
The rapid evolution of software libraries presents a significant challenge for code generation models, which must adapt to frequent version updates while maintaining compatibility with previous versions. Existing code completion benchmarks…
Studies show that software developers often either misuse exception handling features or use them inefficiently, and such a practice may lead an undergoing software project to a fragile, insecure and non-robust application system. In this…
GitHub is the most widely used social, distributed version control system. It has around 10 million registered users and hosts over 16 million public repositories. Its user base is also very active as GitHub ranks in the top 100 Alexa most…
Peer code review locates common coding rule violations and simple logical errors in the early phases of software development, and thus reduces overall cost. However, in GitHub, identifying an appropriate code reviewer for a pull request is…
Storing user-specific configuration files in a "dotfiles" repository is a common practice among software developers, with hundreds of thousands choosing to publicly host their repositories on GitHub. This practice not only provides…
The rapid evolution of software libraries poses a considerable hurdle for code generation, necessitating continuous adaptation to frequent version updates while preserving backward compatibility. While existing code evolution benchmarks…
The Web is replete with tutorial-style content on how to accomplish programming tasks. Unfortunately, even top-ranked tutorials suffer from severe security vulnerabilities, such as cross-site scripting (XSS), and SQL injection (SQLi).…
There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described…
How can instructors facilitate spreading out the work in a software engineering or computer science capstone course across time and among team members? Currently teams often compromise the quality of their learning experience by frantically…
Developers use and contribute to repositories on GitHub. Documentation present in the repositories serves as an important source by helping developers to understand, maintain and contribute to the project. Currently, documentation in a…
GitHub Copilot, an extension for the Visual Studio Code development environment powered by the large-scale language model Codex, makes automatic program synthesis available for software developers. This model has been extensively studied in…
In recent years, the research community has raised serious questions about the reproducibility of scientific work. In particular, since many studies include some kind of computing work, reproducibility is also a technological challenge, not…
Git is used as the distributed version control system for many open-source software projects. One Git-based service, GitHub, is the most common code hosting and repository service for open-source software projects. For researchers that…
GitHub is the world's largest host of source code, with more than 150M repositories. However, most of these repositories are not labeled or inadequately so, making it harder for users to find relevant projects. There have been various…
Reproducible container builds promise a simple integrity check for software supply chains: rebuild an image from its Dockerfile and compare hashes. We build a Docker measurement pipeline and apply it to a stratified sample of 2,000 GitHub…
With the rapid development of AI technologies, thousands of AI papers are being published each year. Many of these papers have released sample code to facilitate follow-up researchers. This paper presents an explorative study of over 1700…