Related papers: Feature-Gathering Dependency-Based Software Cluste…
To facilitate understanding the architecture of a software system, we developed SArF Map technique that visualizes software architecture from feature and layer viewpoints using a city metaphor. SArF Map visualizes implicit software features…
Software module clustering is an unsupervised learning method used to cluster software entities (e.g., classes, modules, or files) with similar features. The obtained clusters may be used to study, analyze, and understand the software…
Clustering algorithms are fundamental tools across many fields, with density-based methods offering particular advantages in identifying arbitrarily shaped clusters and handling noise. However, their effectiveness is often limited by the…
Maintenance of existing software requires a large amount of time for comprehending the source code. The architecture of a software, however, may not be clear to maintainers if up to date documentations are not available. Software clustering…
This paper concerns the critical decision process of extracting or selecting the features before applying a clustering algorithm. It is not obvious to evaluate the importance of the features since the most popular methods to do it are…
We present a framework for clustering with cluster-specific feature selection. The framework, CRAFT, is derived from asymptotic log posterior formulations of nonparametric MAP-based clustering models. CRAFT handles assorted data, i.e., both…
The clustering of autonomous driving scenario data can substantially benefit the autonomous driving validation and simulation systems by improving the simulation tests' completeness and fidelity. This article proposes a comprehensive data…
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high dimensional data. Many significant subspace clustering algorithms exist, each having different characteristics caused by the use of different…
Feature selection methods are widely used to address the high computational overheads and curse of dimensionality in classifying high-dimensional data. Most conventional feature selection methods focus on handling homogeneous features,…
Software remodularization through clustering is a common practice to improve internal software quality. However, the true benefit of software clustering is only realized if developers follow through with the recommended refactoring…
Graph clustering is an unsupervised machine learning method that partitions the nodes in a graph into different groups. Despite achieving significant progress in exploiting both attributed and structured data information, graph clustering…
Semi-supervised clustering seeks to augment traditional clustering methods by incorporating side information provided via human expertise in order to increase the semantic meaningfulness of the resulting clusters. However, most current…
A major limitation of clustering approaches is their lack of explainability: methods rarely provide insight into which features drive the grouping of similar observations. To address this limitation, we propose an ensemble-based clustering…
Feature selection is a crucial preprocessing step in data analytics and machine learning. Classical feature selection algorithms select features based on the correlations between predictive features and the class variable and do not attempt…
Along with the flourish of the information age, massive amounts of data are generated day by day. Due to the large-scale and high-dimensional characteristics of these data, it is often difficult to achieve better decision-making in…
Modularity clustering is an essential tool to understand complicated graphs. However, existing methods are not applicable to massive graphs due to two serious weaknesses. (1) It is difficult to fully reproduce ground-truth clusters due to…
Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning…
In the context of unsupervised learning, effective clustering plays a vital role in revealing patterns and insights from unlabeled data. However, the success of clustering algorithms often depends on the relevance and contribution of…
Feature selection (FS) is a key research area in the machine learning and data mining fields, removing irrelevant and redundant features usually helps to reduce the effort required to process a dataset while maintaining or even improving…
Software fault prediction (SFP) is a critical task in software engineering, enabling early identification of faults in modules to improve software quality and reduce maintenance costs. This research investigates the combined effects of…