Related papers: How Scale Affects Structure in Java Programs
A large software system is often composed of many inter-related programs of different sizes. Using the public Eclipse dataset, we replicate our previous study on the distribution of program sizes. Our results confirm that the program sizes…
Code metrics have been widely used to estimate software maintenance effort. Metrics have generally been used to guide developer effort to reduce or avoid future maintenance burdens. Size is the simplest and most widely deployed metric. The…
Code readability and software complexity are important software quality metrics that impact other software metrics such as maintainability, reusability, portability and reliability. This paper presents an empirical study of the…
Performance is a critical quality attribute in software development, yet the impact of method-level code changes on performance evolution remains poorly understood. While developers often make intuitive assumptions about which types of…
In the past 20 years, defect prediction studies have generally acknowledged the effect of class size on software prediction performance. To quantify the relationship between object-oriented (OO) metrics and defects, modelling has to take…
Code cloning plays a very important role in open-source software engineering. The presence of clones within a project may indicate a need for refactoring, and clones between projects are even more interesting, since code migration takes…
In this paper, we present a complex network approach to the study of software engineering. We have found universal network patterns in a large collection of object-oriented (OO) software systems written in C++ and Java. All the systems…
Many software metrics are designed to measure aspects that are believed to be related to software quality. Static software metrics, e.g., size, complexity and coupling are used in defect prediction research as well as software quality…
Recent studies have largely investigated the detection of class design anomalies. They proposed a large set of metrics that help in detecting those anomalies and in predicting the quality of class design. While those studies and the…
Open Source Software (OSS) Projects are gaining popularity these days, and they become alternatives in building software system. Despite many failures in these projects, there are some success stories with one of the identified success…
Unlike most other software quality attributes, testability cannot be evaluated solely based on the characteristics of the source code. The effectiveness of the test suite and the budget assigned to the test highly impact the testability of…
Scale is often attributed as one of the factors that cause an increase in the performance of LLMs, resulting in models with billion and trillion parameters. One of the limitations of such large models is the high computational requirements…
People demand for software quality is growing increasingly, thus different scales for the software are growing fast to handle the quality of software. The software complexity metric is one of the measurements that use some of the internal…
In current research, there are contrasting results about the applicability of software source code metrics as features for defect prediction models. The goal of the paper is to evaluate the adoption of software metrics in models for…
Improvements in language model capabilities are often attributed to increasing model size or training data, but in some cases smaller models trained on curated data or with different architectural decisions can outperform larger ones…
Decisions on which classes to refactor are fraught with difficulty. The problem of identifying candidate classes becomes acute when confronted with large systems comprising hundreds or thousands of classes. In this paper, we describe a…
This paper presents Megadiff, a dataset of source code diffs. It focuses on Java, with strict inclusion criteria based on commit message and diff size. Megadiff contains 663 029 Java diffs that can be used for research on commit…
A large computer program is typically divided into many hundreds or even thousands of smaller units, whose logical connections define a network in a natural way. This network reflects the internal structure of the program, and defines the…
In this paper, we study how object-oriented classes are used across thousands of software packages. We concentrate on "usage diversity'", defined as the different statically observable combinations of methods called on the same object. We…
In this paper, the term formula code refers to fragments of source code that implement a mathematical formula. We present empirical studies that analyze the diversity and frequency of formula code in open-source-software projects. In an…