Related papers: Feature-Gathering Dependency-Based Software Cluste…

SArF Map: Visualizing Software Architecture from Feature and Layer Viewpoints

To facilitate understanding the architecture of a software system, we developed SArF Map technique that visualizes software architecture from feature and layer viewpoints using a city metaphor. SArF Map visualizes implicit software features…

Software Engineering · Computer Science 2013-06-06 Kenichi Kobayashi, Manabu Kamimura, Keisuke Yano, Koki Kato, Akihiko Matsuo

Software Module Clustering: An In-Depth Literature Analysis

Software module clustering is an unsupervised learning method used to cluster software entities (e.g., classes, modules, or files) with similar features. The obtained clusters may be used to study, analyze, and understand the software…

Software Engineering · Computer Science 2020-12-03 Qusay I. Sarhan, Bestoun S. Ahmed, Miroslav Bures, Kamal Z. Zamli

SACA: Selective Attention-Based Clustering Algorithm

Clustering algorithms are fundamental tools across many fields, with density-based methods offering particular advantages in identifying arbitrarily shaped clusters and handling noise. However, their effectiveness is often limited by the…

Machine Learning · Computer Science 2025-12-01 Meysam Shirdel Bilehsavar, Razieh Ghaedi, Samira Seyed Taheri, Xinqi Fan, Christian O'Reilly

E-SC4R: Explaining Software Clustering for Remodularisation

Maintenance of existing software requires a large amount of time for comprehending the source code. The architecture of a software, however, may not be clear to maintainers if up to date documentations are not available. Software clustering…

Software Engineering · Computer Science 2021-10-05 Alvin Jian Jia Tan, Chun Yong Chong, Aldeida Aleti

Feature selection or extraction decision process for clustering using PCA and FRSD

This paper concerns the critical decision process of extracting or selecting the features before applying a clustering algorithm. It is not obvious to evaluate the importance of the features since the most popular methods to do it are…

Machine Learning · Computer Science 2021-11-23 Jean-Sebastien Dessureault, Daniel Massicotte

CRAFT: ClusteR-specific Assorted Feature selecTion

We present a framework for clustering with cluster-specific feature selection. The framework, CRAFT, is derived from asymptotic log posterior formulations of nonparametric MAP-based clustering models. CRAFT handles assorted data, i.e., both…

Machine Learning · Computer Science 2015-06-26 Vikas K. Garg, Cynthia Rudin, Tommi Jaakkola

Large Scale Autonomous Driving Scenarios Clustering with Self-supervised Feature Extraction

The clustering of autonomous driving scenario data can substantially benefit the autonomous driving validation and simulation systems by improving the simulation tests' completeness and fidelity. This article proposes a comprehensive data…

Computer Vision and Pattern Recognition · Computer Science 2021-03-31 Jinxin Zhao, Jin Fang, Zhixian Ye, Liangjun Zhang

SCAF An effective approach to Classify Subspace Clustering algorithms

Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high dimensional data. Many significant subspace clustering algorithms exist, each having different characteristics caused by the use of different…

Databases · Computer Science 2013-04-15 Sunita Jahirabadkar, Parag Kulkarni

A Supervised Feature Selection Method For Mixed-Type Data using Density-based Feature Clustering

Feature selection methods are widely used to address the high computational overheads and curse of dimensionality in classifying high-dimensional data. Most conventional feature selection methods focus on handling homogeneous features,…

Machine Learning · Computer Science 2021-11-17 Xuyang Yan, Mrinmoy Sarkar, Biniam Gebru, Shabnam Nazmi, Abdollah Homaifar

Closing the Loop for Software Remodularisation -- REARRANGE: An Effort Estimation Approach for Software Clustering-based Remodularisation

Software remodularization through clustering is a common practice to improve internal software quality. However, the true benefit of software clustering is only realized if developers follow through with the recommended refactoring…

Software Engineering · Computer Science 2023-03-14 Alvin Jian Jia Tan, Chun Yong Chong, Aldeida Aleti

Attributed Graph Clustering in Collaborative Settings

Graph clustering is an unsupervised machine learning method that partitions the nodes in a graph into different groups. Despite achieving significant progress in exploiting both attributed and structured data information, graph clustering…

Machine Learning · Computer Science 2025-01-03 Rui Zhang, Xiaoyang Hou, Zhihua Tian, Yan he, Enchao Gong, Jian Liu, Qingbiao Wu, Kui Ren

Active Clustering with Model-Based Uncertainty Reduction

Semi-supervised clustering seeks to augment traditional clustering methods by incorporating side information provided via human expertise in order to increase the semantic meaningfulness of the resulting clusters. However, most current…

Machine Learning · Computer Science 2014-02-17 Caiming Xiong, David Johnson, Jason J. Corso

Explainable cluster analysis: a bagging approach

A major limitation of clustering approaches is their lack of explainability: methods rarely provide insight into which features drive the grouping of similar observations. To address this limitation, we propose an ensemble-based clustering…

Machine Learning · Statistics 2026-03-23 Federico Maria Quetti, Elena Ballante, Silvia Figini, Paolo Giudici

Causality-based Feature Selection: Methods and Evaluations

Feature selection is a crucial preprocessing step in data analytics and machine learning. Classical feature selection algorithms select features based on the correlations between predictive features and the class variable and do not attempt…

Machine Learning · Computer Science 2019-11-19 Kui Yu, Xianjie Guo, Lin Liu, Jiuyong Li, Hao Wang, Zhaolong Ling, Xindong Wu

Compactness Score: A Fast Filter Method for Unsupervised Feature Selection

Along with the flourish of the information age, massive amounts of data are generated day by day. Due to the large-scale and high-dimensional characteristics of these data, it is often difficult to achieve better decision-making in…

Machine Learning · Computer Science 2023-04-04 Peican Zhu, Xin Hou, Keke Tang, Zhen Wang, Feiping Nie

Scaling Fine-grained Modularity Clustering for Massive Graphs

Modularity clustering is an essential tool to understand complicated graphs. However, existing methods are not applicable to massive graphs due to two serious weaknesses. (1) It is difficult to fully reproduce ground-truth clusters due to…

Social and Information Networks · Computer Science 2019-05-28 Hiroaki Shiokawa, Toshiyuki Amagasa, Hiroyuki Kitagawa

Semi-supervised clustering methods

Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning…

Methodology · Statistics 2014-07-11 Eric Bair

Refining Filter Global Feature Weighting for Fully-Unsupervised Clustering

In the context of unsupervised learning, effective clustering plays a vital role in revealing patterns and insights from unlabeled data. However, the success of clustering algorithms often depends on the relevance and contribution of…

Machine Learning · Computer Science 2025-03-18 Fabian Galis, Darian Onchis

Distributed ReliefF based Feature Selection in Spark

Feature selection (FS) is a key research area in the machine learning and data mining fields, removing irrelevant and redundant features usually helps to reduce the effort required to process a dataset while maintaining or even improving…

Machine Learning · Computer Science 2018-11-02 Raul-Jose Palma-Mendoza, Daniel Rodriguez, Luis de-Marcos

A Feature-Driven Framework for Software Fault Prediction

Software fault prediction (SFP) is a critical task in software engineering, enabling early identification of faults in modules to improve software quality and reduce maintenance costs. This research investigates the combined effects of…

Software Engineering · Computer Science 2026-05-19 Ahmad Nauman Ghazi, Nagajyothi Devarapalli, Ashir Javeed, Sadi Alawadi, Fahed Alkhabbas, Khalid AlKharabsheh