Related papers: A matching based clustering algorithm for categori…

Categorical Data Clustering via Value Order Estimated Distance Metric Learning

Clustering is a popular machine learning technique for data mining that can process and analyze datasets to automatically reveal sample distribution patterns. Since the ubiquitous categorical data naturally lack a well-defined metric space…

Machine Learning · Computer Science 2025-09-01 Yiqun Zhang , Mingjie Zhao , Hong Jia , Yang Lu , Mengke Li , Yiu-ming Cheung

A Short Survey on Data Clustering Algorithms

With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial…

Data Structures and Algorithms · Computer Science 2015-12-01 Ka-Chun Wong

Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach

Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes.…

Artificial Intelligence · Computer Science 2007-05-23 Zengyou He , Xiaofei Xu , Shengchun Deng

A Rapid Review of Clustering Algorithms

Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in today's life, such as in marketing and e-commerce, healthcare, data…

Machine Learning · Computer Science 2024-01-17 Hui Yin , Amir Aryani , Stephen Petrie , Aishwarya Nambissan , Aland Astudillo , Shengyuan Cao

A Hash-based Co-Clustering Algorithm for Categorical Data

Many real-life data are described by categorical attributes without a pre-classification. A common data mining method used to extract information from this type of data is clustering. This method groups together the samples from the data…

Machine Learning · Computer Science 2014-07-30 Fabricio Olivetti de França

Clustering categorical data via ensembling dissimilarity matrices

We present a technique for clustering categorical data by generating many dissimilarity matrices and averaging over them. We begin by demonstrating our technique on low dimensional categorical data and comparing it to several other…

Machine Learning · Statistics 2017-09-20 Saeid Amiri , Bertrand Clarke , Jennifer Clarke

Spectral Clustering of Categorical and Mixed-type Data via Extra Graph Nodes

Clustering data objects into homogeneous groups is one of the most important tasks in data mining. Spectral clustering is arguably one of the most important algorithms for clustering, as it is appealing for its theoretical soundness and is…

Machine Learning · Statistics 2024-03-12 Dylan Soemitro , Jeova Farias Sales Rocha Neto

Partitioning Clustering algorithms for handling numerical and categorical data: a review

Clustering is widely used in different fields such as biology, psychology, and economics. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes. However, datasets with…

Databases · Computer Science 2019-07-03 Trupti M. Kodinariya , Dr. Prashant R. Makwana

Algorithms and Complexity of Range Clustering

We introduce a novel criterion in clustering that seeks clusters with limited range of values associated with each cluster's elements. In clustering or classification the objective is to partition a set of objects into subsets, called…

Data Structures and Algorithms · Computer Science 2018-05-15 Dorit S. Hochbaum

CADM: Cluster-customized Adaptive Distance Metric for Categorical Data Clustering

An appropriate distance metric is crucial for categorical data clustering, as the distance between categorical data cannot be directly calculated. However, the distances between attribute values usually vary in different clusters induced by…

Machine Learning · Computer Science 2026-03-09 Taixi Chen , Yiu-ming Cheung , Yiqun Zhang

Improved Hierarchical Clustering on Massive Datasets with Broad Guarantees

Hierarchical clustering is a stronger extension of one of today's most influential unsupervised learning methods: clustering. The goal of this method is to create a hierarchy of clusters, thus constructing cluster evolutionary history and…

Data Structures and Algorithms · Computer Science 2021-01-14 MohammadTaghi Hajiaghayi , Marina Knittel

Significance-Based Categorical Data Clustering

Although numerous algorithms have been proposed to solve the categorical data clustering problem, how to assess the statistical significance of a set of categorical clusters remains unaddressed. To fill this void, we employ the…

Machine Learning · Computer Science 2022-11-09 Lianyu Hu , Mudi Jiang , Yan Liu , Zengyou He

Information based clustering

In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial…

Quantitative Methods · Quantitative Biology 2009-11-11 Noam Slonim , Gurinder Singh Atwal , Gasper Tkacik , William Bialek

Issues, Challenges and Tools of Clustering Algorithms

Clustering is an unsupervised technique of Data Mining. It means grouping similar objects together and separating the dissimilar ones. Each object in the data set is assigned a class label in the clustering process using a distance measure.…

Information Retrieval · Computer Science 2011-10-13 Parul Agarwal , M. Afshar Alam , Ranjit Biswas

Hierarchical Clustering Supported by Reciprocal Nearest Neighbors

Clustering is a fundamental analysis tool aiming at classifying data points into groups based on their similarity or distance. It has found successful applications in all natural and social sciences, including biology, physics, economics,…

Information Retrieval · Computer Science 2021-02-24 Wen-Bo Xie , Yan-Li Lee , Cong Wang , Duan-Bing Chen , Tao Zhou

Partitioning Relational Matrices of Similarities or Dissimilarities using the Value of Information

In this paper, we provide an approach to clustering relational matrices whose entries correspond to either similarities or dissimilarities between objects. Our approach is based on the value of information, a parameterized,…

Artificial Intelligence · Computer Science 2017-10-31 Isaac J. Sledge , Jose C. Principe

Distributed Lance-William Clustering Algorithm

One important tool is the optimal clustering of data into useful categories. Dividing similar objects into a smaller number of clusters is of importance in many applications. These include search engines, monitoring of academic performance,…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-21 Gavriel Yarmish , Philip Listowsky , Simon Dexter

Cluster Explanation via Polyhedral Descriptions

Clustering is an unsupervised learning problem that aims to partition unlabelled data points into groups with similar features. Traditional clustering algorithms provide limited insight into the groups they find as their main focus is…

Machine Learning · Computer Science 2022-10-18 Connor Lawless , Oktay Gunluk

A Polynomial Algorithm for Balanced Clustering via Graph Partitioning

The objective of clustering is to discover natural groups in datasets and to identify geometrical structures which might reside there, without assuming any prior knowledge on the characteristics of the data. The problem can be seen as…

Computational Geometry · Computer Science 2018-01-26 Luis-Evaristo Caraballo , José-Miguel Díaz-Báñez , Nadine Kroher

Uncovering Group Level Insights with Accordant Clustering

Clustering is a widely-used data mining tool, which aims to discover partitions of similar items in data. We introduce a new clustering paradigm, accordant clustering, which enables the discovery of (predefined) group level insights.…

Machine Learning · Computer Science 2017-04-11 Amit Dhurandhar , Margareta Ackerman , Xiang Wang