Related papers: Same-Cluster Querying for Overlapping Clusters

Clustering with Noisy Queries

In this paper, we initiate a rigorous theoretical study of clustering with noisy queries (or a faulty oracle). Given a set of $n$ elements, our goal is to recover the true clustering by asking the minimum number of pairwise queries to an…

Machine Learning · Statistics 2017-06-26 Arya Mazumdar, Barna Saha

Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost

Several clustering frameworks with interactive (semi-supervised) queries have been studied in the past. Recently, clustering with same-cluster queries has become popular. An algorithm in this setting has access to an oracle with full…

Data Structures and Algorithms · Computer Science 2019-08-15 Barna Saha, Sanjay Subramanian

Learning to Cluster via Same-Cluster Queries

We study the problem of learning to cluster data points using an oracle which can answer same-cluster queries. Different from previous approaches, we do not assume that the total number of clusters is known at the beginning and do not…

Machine Learning · Computer Science 2021-08-18 Yi Li, Yan Song, Qin Zhang

Query Complexity of Clustering with Side Information

Suppose we are given a set of $n$ elements to be clustered into $k$ (unknown) clusters, and an oracle/expert labeler that can interactively answer pairwise queries of the form, "do two elements $u$ and $v$ belong to the same cluster?".…

Machine Learning · Statistics 2017-06-26 Arya Mazumdar, Barna Saha

A Pragmatic Method for Comparing Clusterings with Overlaps and Outliers

Clustering algorithms are an essential part of the unsupervised data science ecosystem, and extrinsic evaluation of clustering algorithms requires a method for comparing the detected clustering to a ground truth clustering. In a general…

Machine Learning · Computer Science 2026-03-23 Ryan DeWolfe, Paweł Prałat, François Théberge

Towards a Query-Optimal and Time-Efficient Algorithm for Clustering with a Faulty Oracle

Motivated by applications in crowdsourced entity resolution in databases, signed edge prediction in social networks and correlation clustering, Mazumdar and Saha [NIPS 2017] proposed an elegant theoretical model for studying clustering with…

Machine Learning · Computer Science 2021-06-22 Pan Peng, Jiapeng Zhang

Hierarchical Clustering with Structural Constraints

Hierarchical clustering is a popular unsupervised data analysis method. For many real-world applications, we would like to exploit prior information about the data that imposes constraints on the clustering hierarchy, and is not captured by…

Data Structures and Algorithms · Computer Science 2018-07-17 Vaggos Chatziafratis, Rad Niazadeh, Moses Charikar

Relaxed Oracles for Semi-Supervised Clustering

Pairwise "same-cluster" queries are one of the most widely used forms of supervision in semi-supervised clustering. However, it is impractical to ask human oracles to answer every query correctly. In this paper, we study the influence of…

Machine Learning · Statistics 2017-11-21 Taewan Kim, Joydeep Ghosh

A Short Survey on Data Clustering Algorithms

With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial…

Data Structures and Algorithms · Computer Science 2015-12-01 Ka-Chun Wong

Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach

Clustering is an unsupervised machine learning methodology in which unlabeled elements/objects are grouped together with the aim of constructing well-established clusters whose elements are classified according to their similarity. The…

Machine Learning · Statistics 2023-10-20 Dimitrios Saligkaras, Vasileios E. Papageorgiou

Clustering Via Crowdsourcing

In recent years, crowdsourcing, also known as human-aided computation, has emerged as an effective platform for solving problems that are considered complex for machines alone. Using humans is time-consuming and costly due to monetary compensation.…

Data Structures and Algorithms · Computer Science 2016-04-08 Arya Mazumdar, Barna Saha

A Rapid Review of Clustering Algorithms

Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in today's life, such as in marketing and e-commerce, healthcare, data…

Machine Learning · Computer Science 2024-01-17 Hui Yin, Amir Aryani, Stephen Petrie, Aishwarya Nambissan, Aland Astudillo, Shengyuan Cao

Overlapping Clustering Models, and One (class) SVM to Bind Them All

People belong to multiple communities, words belong to multiple topics, and books cover multiple genres; overlapping clusters are commonplace. Many existing overlapping clustering methods model each person (or word, or book) as a…

Machine Learning · Statistics 2018-11-06 Xueyu Mao, Purnamrita Sarkar, Deepayan Chakrabarti

Statistical analysis of a hierarchical clustering algorithm with outliers

It is well known that the classical single linkage algorithm usually fails to identify clusters in the presence of outliers. In this paper, we propose a new version of this algorithm, and we study its mathematical performance. In…

Statistics Theory · Mathematics 2022-03-21 Nicolas Klutchnikoff, Audrey Poterie, Laurent Rouviere

Non-Exhaustive, Overlapping Co-Clustering: An Extended Analysis

The goal of co-clustering is to simultaneously identify a clustering of rows as well as columns of a two dimensional data matrix. A number of co-clustering techniques have been proposed including information-theoretic co-clustering and the…

Machine Learning · Computer Science 2020-04-27 Joyce Jiyoung Whang, Inderjit S. Dhillon

Correlation Clustering with Adaptive Similarity Queries

In correlation clustering, we are given $n$ objects together with a binary similarity score between each pair of them. The goal is to partition the objects into clusters so as to minimise the disagreements with the scores. In this work we…

Machine Learning · Computer Science 2020-01-15 Marco Bressan, Nicolò Cesa-Bianchi, Andrea Paudice, Fabio Vitale

Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach

Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes.…

Artificial Intelligence · Computer Science 2007-05-23 Zengyou He, Xiaofei Xu, Shengchun Deng

Truecluster matching

Cluster matching by permuting cluster labels is important in many clustering contexts such as cluster validation and cluster ensemble techniques. The classic approach is to minimize the Euclidean distance between two cluster solutions which…

Artificial Intelligence · Computer Science 2007-05-31 Jens Oehlschlägel

Element-centric clustering comparison unifies overlaps and hierarchy

Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering…

Machine Learning · Statistics 2019-06-13 Alexander J. Gates, Ian B. Wood, William P. Hetrick, Yong-Yeol Ahn

Visualizing Overlapping Biclusterings and Boolean Matrix Factorizations

Finding (bi-)clusters in bipartite graphs is a popular data analysis approach. Analysts typically want to visualize the clusters, which is simple as long as the clusters are disjoint. However, many modern algorithms find overlapping…

Machine Learning · Computer Science 2023-07-17 Thibault Marette, Pauli Miettinen, Stefan Neumann