Related papers: CommitBART: A Large Pre-trained Model for GitHub C…
A commit message is a document that summarizes source code changes in natural language. A good commit message clearly shows the source code changes, which enhances collaboration between developers. Therefore, our work is to develop a model…
A commit message is a textual description of the code changes in a commit, which is a key part of the Git version control system (VCS). It captures the essence of software updating. Therefore, it can help developers understand code…
Recent research has achieved impressive results on understanding and improving source code by building up on machine-learning techniques developed for natural languages. A significant advancement in natural-language understanding has come…
Commit messages aid developers in their understanding of a continuously evolving codebase. However, developers do not always document code changes properly. Automatically generating commit messages would relieve this burden on developers.…
Commit messages are explanations of changes made to a codebase that are stored in version control systems. They help developers understand the codebase as it evolves. However, writing commit messages can be tedious and inconsistent among…
Writing commit messages is a tedious daily task for many software developers, and often remains neglected. Automating this task has the potential to save time while ensuring that messages are informative. A high-quality dataset and an…
Deep learning methods, which have found successful applications in fields like image classification and natural language processing, have recently been applied to source code analysis too, due to the enormous amount of freely available…
We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is…
Quantum computing is rapidly advancing, but quantum software development faces significant challenges, including a steep learning curve, high hardware error rates, and a lack of mature engineering practices. This study conducts a…
Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with…
Commit messages are valuable resources for describing why code changes are committed to repositories in version control systems (e.g., Git). They effectively help developers understand code changes and better perform software maintenance…
Commit signing is a principal mechanism for verifying the origin of code in software supply chains. Security frameworks treat it as a core trust signal, assuming developers sign their commits consistently with keys they control and keep…
Social coding platforms, such as GitHub, serve as laboratories for studying collaborative problem solving in open source software development; a key feature is their ability to support issue reporting which is used by teams to discuss tasks…
Commit messages are a valuable resource for comprehending software evolution, since they provide a record of changes such as feature additions and bug repairs. Unfortunately, programmers often neglect to write good commit messages.…
Applying machine learning to tasks that operate with code changes requires their numerical representation. In this work, we propose an approach for obtaining such representations during pre-training and evaluate them on two different…
Commit messages play a crucial role in collaborative software development. They provide a clear and concise description of the changes made to the source code. However, many commit messages among students' projects lack useful information.…
With the great success of pre-trained models, the pretrain-then-finetune paradigm has been widely adopted on downstream tasks for source code understanding. However, compared to the costly training of a large-scale model from scratch, how to…
Software is constantly changing, requiring developers to perform several derived tasks in a timely manner, such as writing a description for the intention of the code change, or identifying the defect-prone code changes. Considering that…
Generative AI technologies promise to transform the product development lifecycle. This study evaluates the efficiency gains, areas for improvement, and emerging challenges of using GitHub Copilot, an AI-powered coding assistant. We…
Development bots are used on GitHub to automate repetitive activities. Such bots communicate with human actors via issue comments and pull request comments. Identifying such bot comments makes it possible to prevent bias in socio-technical studies…