Related papers: Identifying Source Code File Experts

Machine Learning Based Source Code Classification Using Syntax Oriented Features

As of today, the programming language of the vast majority of published source code is manually specified or programmatically assigned based solely on the file extension. In this paper we show that the source code programming language…

Machine Learning · Computer Science 2017-03-23 Shaul Zevin, Catherine Holzem

Use of Source Code Similarity Metrics in Software Defect Prediction

In recent years, defect prediction has received a great deal of attention in the empirical software engineering world. Predicting software defects before the maintenance phase is very important not only to decrease the maintenance costs but…

Software Engineering · Computer Science 2018-08-31 Ahmet Okutan

Learning and Suggesting Source Code Changes from Version History: A Systematic Review

Context: Software systems are in continuous evolution through source code changes that fix bugs, add new functionalities and improve the internal architecture. All these practices are recorded in the version history, which can be…

Software Engineering · Computer Science 2020-01-17 Leandro Ungari Cayres, Bruno Santos de Lima, Rogério Eduardo Garcia

Should I Bug You? Identifying Domain Experts in Software Projects Using Code Complexity Metrics

In any sufficiently complex software system there are experts, having a deeper understanding of parts of the system than others. However, it is not always clear who these experts are and which particular parts of the system they can provide…

Software Engineering · Computer Science 2018-09-21 Ralf Teusner, Christoph Matthies, Philipp Giese

A Survey on Machine Learning Techniques for Source Code Analysis

The advancements in machine learning techniques have encouraged researchers to apply these techniques to a myriad of software engineering tasks that use source code analysis, such as testing and vulnerability detection. Such a large number…

Software Engineering · Computer Science 2022-09-14 Tushar Sharma, Maria Kechagia, Stefanos Georgiou, Rohit Tiwari, Indira Vats, Hadi Moazen, Federica Sarro

Source Code Recommender Systems: The Practitioners' Perspective

The automatic generation of source code is one of the long-lasting dreams in software engineering research. Several techniques have been proposed to speed up the writing of new code. For example, code completion techniques can recommend to…

Software Engineering · Computer Science 2023-02-09 Matteo Ciniselli, Luca Pascarella, Emad Aghajani, Simone Scalabrino, Rocco Oliveto, Gabriele Bavota

SCC: Automatic Classification of Code Snippets

Determining the programming language of a source code file has been considered in the research community; it has been shown that Machine Learning (ML) and Natural Language Processing (NLP) algorithms can be effective in identifying the…

Software Engineering · Computer Science 2018-09-24 Kamel Alreshedy, Dhanush Dharmaretnam, Daniel M. German, Venkatesh Srinivasan, T. Aaron Gulliver

Using Source Code Metrics and Ensemble Methods for Fault Proneness Prediction

Software fault prediction models are employed to optimize testing resource allocation by identifying fault-prone classes before testing phases. Several researchers have validated the use of different classification techniques to develop…

Software Engineering · Computer Science 2017-04-17 Lov Kumar, Santanu Rath, Ashish Sureka

Estimating defectiveness of source code: A predictive model using GitHub content

Two key contributions presented in this paper are: i) A method for building a dataset containing source code features extracted from source files taken from Open Source Software (OSS) and associated bug reports, ii) A predictive model for…

Software Engineering · Computer Science 2018-09-13 Ritu Kapur, Balwinder Sodhi

A Practical Approach to the Automatic Classification of Security-Relevant Commits

The lack of reliable sources of detailed information on the vulnerabilities of open-source software (OSS) components is a major obstacle to maintaining a secure software supply chain and an effective vulnerability management process.…

Cryptography and Security · Computer Science 2025-03-18 Antonino Sabetta, Michele Bezzi

A Comparative Study of Different Source Code Metrics and Machine Learning Algorithms for Predicting Change Proneness of Object Oriented Systems

Change-prone classes or modules are defined as software components in the source code which are likely to change in the future. Change-proneness prediction is useful to the maintenance team as they can optimize and focus their testing…

Software Engineering · Computer Science 2017-12-22 Lov Kumar, Ashish Sureka

Navigating Expertise in Configurable Software Systems through the Maze of Variability

The understanding of source code in large-scale software systems poses a challenge for developers. The role of expertise in source code becomes critical for identifying developers accountable for substantial changes. However, in the context…

Software Engineering · Computer Science 2025-08-26 Karolina Milano, Bruno Cafeo

Source Code Metrics for Software Defects Prediction

In current research, there are contrasting results about the applicability of software source code metrics as features for defect prediction models. The goal of the paper is to evaluate the adoption of software metrics in models for…

Software Engineering · Computer Science 2023-01-20 Dominik Arne Rebro, Bruno Rossi, Stanislav Chren

Exploring the Impact of Code Style in Identifying Good Programmers

Code style is an aesthetic choice exhibited in source code that reflects programmers' individual coding habits. This study is the first to investigate whether code style can be used as an indicator to identify good programmers. Data from…

Software Engineering · Computer Science 2024-09-02 Rafed Muhammad Yasir, Ahmedul Kabir

Improving type information inferred by decompilers with supervised machine learning

In software reverse engineering, decompilation is the process of recovering source code from binary files. Decompilers are used when it is necessary to understand or analyze software for which the source code is not available. Although…

Software Engineering · Computer Science 2021-02-25 Javier Escalada, Ted Scully, Francisco Ortin

Source Code Retrieval Using Sequence Based Similarity

Duplicated code has a negative impact on the quality of software systems and should, at a minimum, be detected. In this paper, we discuss an approach that improves source code retrieval using the structural information about the programs. We…

Software Engineering · Computer Science 2013-08-19 Yoshihisa Udagawa

What really changes when developers intend to improve their source code: a commit-level study of static metric value and static analysis warning changes

Many software metrics are designed to measure aspects that are believed to be related to software quality. Static software metrics, e.g., size, complexity and coupling, are used in defect prediction research as well as software quality…

Software Engineering · Computer Science 2022-05-31 Alexander Trautsch, Johannes Erbel, Steffen Herbold, Jens Grabowski

Content-Based Textual File Type Detection at Scale

Programming language detection is a common need in the analysis of large source code bases. It is supported by a number of existing tools that rely on several features, and most notably file extensions, to determine file types. We consider…

Software Engineering · Computer Science 2021-03-02 Francesca Del Bonifro, Maurizio Gabbrielli, Stefano Zacchiroli

An Empirical Study on Automatically Detecting AI-Generated Source Code: How Far Are We?

Artificial Intelligence (AI) techniques, especially Large Language Models (LLMs), have started gaining popularity among researchers and software developers for generating source code. However, LLMs have been shown to generate code with…

Software Engineering · Computer Science 2024-11-08 Hyunjae Suh, Mahan Tafreshipour, Jiawei Li, Adithya Bhattiprolu, Iftekhar Ahmed

Logical Segmentation of Source Code

Many software analysis methods have come to rely on machine learning approaches. Code segmentation - the process of decomposing source code into meaningful blocks - can augment these methods by featurizing code, reducing noise, and limiting…

Software Engineering · Computer Science 2019-07-23 Jacob Dormuth, Ben Gelman, Jessica Moore, David Slater