Related papers: Truecluster matching

Truecluster: robust scalable clustering with model selection

Data-based classification is fundamental to most branches of science. While recent years have brought enormous progress in various areas of statistical computing and clustering, some general challenges in clustering remain: model selection,…

Artificial Intelligence · Computer Science 2007-06-13 Jens Oehlschlägel

Improved Hierarchical Clustering on Massive Datasets with Broad Guarantees

Hierarchical clustering is a stronger extension of one of today's most influential unsupervised learning methods: clustering. The goal of this method is to create a hierarchy of clusters, thus constructing cluster evolutionary history and…

Data Structures and Algorithms · Computer Science 2021-01-14 MohammadTaghi Hajiaghayi, Marina Knittel

Same-Cluster Querying for Overlapping Clusters

Overlapping clusters are common in models of many practical data-segmentation applications. Suppose we are given $n$ elements to be clustered into $k$ possibly overlapping clusters, and an oracle that can interactively answer queries of the…

Machine Learning · Computer Science 2019-10-29 Wasim Huleihel, Arya Mazumdar, Muriel Médard, Soumyabrata Pal

Parallel and Scalable Precise Clustering for Homologous Protein Discovery

This paper presents a new, parallel implementation of clustering and demonstrates its utility in greatly speeding up the process of identifying homologous proteins. Clustering is a technique to reduce the number of comparisons needed to find…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-29 Stuart Byma, Akash Dhasade, Adrian Altenhoff, Christophe Dessimoz, James R. Larus

Clustering with Label Consistency

Designing efficient, effective, and consistent metric clustering algorithms is a significant challenge attracting growing attention. Traditional approaches focus on the stability of cluster centers; unfortunately, this neglects the…

Data Structures and Algorithms · Computer Science 2025-12-23 Diptarka Chakraborty, Hendrik Fichtenberger, Bernhard Haeupler, Silvio Lattanzi, Ashkan Norouzi-Fard, Ola Svensson

Clustering with Confidence: Finding Clusters with Statistical Guarantees

Clustering is a widely used unsupervised learning method for finding structure in the data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the used data sample or…

Machine Learning · Statistics 2017-01-02 Andreas Henelius, Kai Puolamäki, Henrik Boström, Panagiotis Papapetrou

A matching based clustering algorithm for categorical data

Cluster analysis is one of the essential tasks in data mining and knowledge discovery. Each type of data poses unique challenges in achieving relatively efficient partitioning of the data into homogeneous groups. While the algorithms for…

Machine Learning · Computer Science 2018-12-11 Ruben A. Gevorgyan, Yenok B. Hakobyan

On the Persistence of Clustering Solutions and True Number of Clusters in a Dataset

Typically, clustering algorithms provide clustering solutions with a prespecified number of clusters. The lack of a priori knowledge on the true number of underlying clusters in the dataset makes it important to have a metric to compare the…

Machine Learning · Computer Science 2018-11-20 Amber Srivastava, Mayank Baranwal, Srinivasa Salapaka

Fair Clustering with Clusterlets

Given their widespread usage in the real world, the fairness of clustering methods has become of major interest. Theoretical results on fair clustering show that fairness enjoys transitivity: given a set of small and fair clusters, a…

Machine Learning · Computer Science 2025-05-13 Mattia Setzu, Riccardo Guidotti

Normalised clustering accuracy: An asymmetric external cluster validity measure

There is no, nor will there ever be, single best clustering algorithm. Nevertheless, we would still like to be able to distinguish between methods that work well on certain task types and those that systematically underperform. Clustering…

Machine Learning · Computer Science 2025-10-16 Marek Gagolewski

Accuracy Evaluation of Overlapping and Multi-resolution Clustering Algorithms on Large Datasets

Performance of clustering algorithms is evaluated with the help of accuracy metrics. There is a great diversity of clustering algorithms, which are key components of many data analysis and exploration systems. However, there exist only few…

Data Structures and Algorithms · Computer Science 2019-02-18 Artem Lutov, Mourad Khayati, Philippe Cudré-Mauroux

Clustering under Local Stability: Bridging the Gap between Worst-Case and Beyond Worst-Case Analysis

Recently, there has been substantial interest in clustering research that takes a beyond worst-case approach to the analysis of algorithms. The typical idea is to design a clustering algorithm that outputs a near-optimal solution, provided…

Data Structures and Algorithms · Computer Science 2018-12-31 Maria-Florina Balcan, Colin White

Hierarchical Clustering Supported by Reciprocal Nearest Neighbors

Clustering is a fundamental analysis tool aiming at classifying data points into groups based on their similarity or distance. It has found successful applications in all natural and social sciences, including biology, physics, economics,…

Information Retrieval · Computer Science 2021-02-24 Wen-Bo Xie, Yan-Li Lee, Cong Wang, Duan-Bing Chen, Tao Zhou

A Polynomial Algorithm for Balanced Clustering via Graph Partitioning

The objective of clustering is to discover natural groups in datasets and to identify geometrical structures which might reside there, without assuming any prior knowledge on the characteristics of the data. The problem can be seen as…

Computational Geometry · Computer Science 2018-01-26 Luis-Evaristo Caraballo, José-Miguel Díaz-Báñez, Nadine Kroher

Clustering For Point Pattern Data

Clustering is one of the most common unsupervised learning tasks in machine learning and data mining. Clustering algorithms have been used in a plethora of applications across several scientific fields. However, there has been limited…

Machine Learning · Computer Science 2017-02-09 Quang N. Tran, Ba-Ngu Vo, Dinh Phung, Ba-Tuong Vo

Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging

Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-20 Zihan Wu, Zhaoke Huang, Hong Yan

Cross-Study Replicability in Cluster Analysis

In cancer research, clustering techniques are widely used for exploratory analyses and dimensionality reduction, playing a critical role in the identification of novel cancer subtypes, often with direct implications for patient management.…

Methodology · Statistics 2023-05-11 Lorenzo Masoero, Emma Thomas, Giovanni Parmigiani, Svitlana Tyekucheva, Lorenzo Trippa

Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach

Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes.…

Artificial Intelligence · Computer Science 2007-05-23 Zengyou He, Xiaofei Xu, Shengchun Deng

Network Clustering Approximation Algorithm Using One Pass Black Box Sampling

Finding a good clustering of vertices in a network, where vertices in the same cluster are more tightly connected than those in different clusters, is a useful, important, and well-studied task. Many clustering algorithms scale well,…

Social and Information Networks · Computer Science 2011-10-18 Thomas DuBois, Jennifer Golbeck, Aravind Srinivasan

Clust-Splitter - an Efficient Nonsmooth Optimization-Based Algorithm for Clustering Large Datasets

Clustering is a fundamental task in data mining and machine learning, particularly for analyzing large-scale data. In this paper, we introduce Clust-Splitter, an efficient algorithm based on nonsmooth optimization, designed to solve the…

Machine Learning · Computer Science 2026-03-19 Jenni Lampainen, Kaisa Joki, Napsu Karmitsa, Marko M. Mäkelä