Related papers: Optimal Clustering from Noisy Binary Feedback

Clustering with Noisy Queries

In this paper, we initiate a rigorous theoretical study of clustering with noisy queries (or a faulty oracle). Given a set of $n$ elements, our goal is to recover the true clustering by asking the minimum number of pairwise queries to an…

Machine Learning · Statistics 2017-06-26 Arya Mazumdar, Barna Saha

Clustering Items through Bandit Feedback: Finding the Right Feature out of Many

We study the problem of clustering a set of items based on bandit feedback. Each of the $n$ items is characterized by a feature vector, with a possibly large dimension $d$. The items are partitioned into two unknown groups such that items…

Machine Learning · Statistics 2025-03-19 Maximilian Graf, Victor Thuot, Nicolas Verzelen

Almost Asymptotically Optimal Active Clustering Through Pairwise Observations

We propose a new analysis framework for clustering $M$ items into an unknown number of $K$ distinct groups using noisy and actively collected responses. At each time step, an agent is allowed to query pairs of items and observe bandit…

Machine Learning · Computer Science 2026-02-06 Rachel S. Y. Teo, P. N. Karthik, Ramya Korlakai Vinayak, Vincent Y. F. Tan

Noisy Adaptive Group Testing via Noisy Binary Search

The group testing problem consists of determining a small set of defective items from a larger set of items based on a number of possibly-noisy tests, and has numerous practical applications. One of the defining features of group testing is…

Information Theory · Computer Science 2021-11-12 Bernard Teo, Jonathan Scarlett

Clustering Via Crowdsourcing

In recent years, crowdsourcing, aka human-aided computation, has emerged as an effective platform for solving problems that are considered complex for machines alone. Using humans is time-consuming and costly due to monetary compensations.…

Data Structures and Algorithms · Computer Science 2016-04-08 Arya Mazumdar, Barna Saha

Universal Clustering via Crowdsourcing

Consider unsupervised clustering of objects drawn from a discrete set, through the use of human intelligence available in crowdsourcing platforms. This paper defines and studies the problem of universal clustering using responses of crowd…

Human-Computer Interaction · Computer Science 2016-10-11 Ravi Kiran Raman, Lav Varshney

Crowdsourcing Feature Discovery via Adaptively Chosen Comparisons

We introduce an unsupervised approach to efficiently discover the underlying features in a data set via crowdsourcing. Our queries ask crowd members to articulate a feature common to two out of three displayed examples. In addition we also…

Machine Learning · Statistics 2015-04-02 James Y. Zou, Kamalika Chaudhuri, Adam Tauman Kalai

Active clustering for labeling training data

Gathering training data is a key step of any supervised learning task, and it is both critical and expensive. Critical, because the quantity and quality of the training data has a high impact on the performance of the learned function.…

Data Structures and Algorithms · Computer Science 2021-10-28 Quentin Lutz, Élie de Panafieu, Alex Scott, Maya Stein

Query Complexity of Clustering with Side Information

Suppose, we are given a set of $n$ elements to be clustered into $k$ (unknown) clusters, and an oracle/expert labeler that can interactively answer pair-wise queries of the form, "do two elements $u$ and $v$ belong to the same cluster?".…

Machine Learning · Statistics 2017-06-26 Arya Mazumdar, Barna Saha

Learning with Clustering Structure

We study supervised learning problems using clustering constraints to impose structure on either features or samples, seeking to help both prediction and interpretation. The problem of clustering features arises naturally in text…

Machine Learning · Computer Science 2016-09-20 Vincent Roulet, Fajwel Fogel, Alexandre d'Aspremont, Francis Bach

SACA: Selective Attention-Based Clustering Algorithm

Clustering algorithms are fundamental tools across many fields, with density-based methods offering particular advantages in identifying arbitrarily shaped clusters and handling noise. However, their effectiveness is often limited by the…

Machine Learning · Computer Science 2025-12-01 Meysam Shirdel Bilehsavar, Razieh Ghaedi, Samira Seyed Taheri, Xinqi Fan, Christian O'Reilly

Binary Classification with XOR Queries: Fundamental Limits and An Efficient Algorithm

We consider a query-based data acquisition problem for binary classification of unknown labels, which has diverse applications in communications, crowdsourcing, recommender systems and active learning. To ensure reliable recovery of unknown…

Information Theory · Computer Science 2021-05-03 Daesung Kim, Hye Won Chung

Embracing Error to Enable Rapid Crowdsourcing

Microtask crowdsourcing has enabled dataset advances in social science and machine learning, but existing crowdsourcing schemes are too expensive to scale up with the expanding volume of data. To scale and widen the applicability of…

Human-Computer Interaction · Computer Science 2016-02-16 Ranjay Krishna, Kenji Hata, Stephanie Chen, Joshua Kravitz, David A. Shamma, Li Fei-Fei, Michael S. Bernstein

Collaborative Filtering Bandits

Classical collaborative filtering, and content-based filtering methods try to learn a static recommendation model given training data. These approaches are far from ideal in highly dynamic recommendation domains such as news recommendation…

Machine Learning · Computer Science 2016-06-01 Shuai Li, Alexandros Karatzoglou, Claudio Gentile

Hierarchical Clustering using Randomly Selected Similarities

The problem of hierarchical clustering items from pairwise similarities is found across various scientific disciplines, from biology to networking. Often, applications of clustering techniques are limited by the cost of obtaining…

Machine Learning · Statistics 2012-07-20 Brian Eriksson

Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback

While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author's mood, gender, age, or sentiment.…

Information Retrieval · Computer Science 2014-01-22 Sajib Dasgupta, Vincent Ng

Active Clustering: Robust and Efficient Hierarchical Clustering using Adaptively Selected Similarities

Hierarchical clustering based on pairwise similarities is a common tool used in a broad range of scientific applications. However, in many problems it may be expensive to obtain or compute similarities between the items to be clustered.…

Information Theory · Computer Science 2015-03-19 Brian Eriksson, Gautam Dasarathy, Aarti Singh, Robert Nowak

Active Learning of Custering with Side Information Using $\eps$-Smooth Relative Regret Approximations

Clustering is considered a non-supervised learning setting, in which the goal is to partition a collection of data points into disjoint clusters. Often a bound $k$ on the number of clusters is given or assumed by the practitioner. Many…

Machine Learning · Computer Science 2012-02-01 Nir Ailon, Ron Begleiter

A Streaming Algorithm for Crowdsourced Data Classification

We propose a streaming algorithm for the binary classification of data based on crowdsourcing. The algorithm learns the competence of each labeller by comparing her labels to those of other labellers on the same tasks and uses this…

Machine Learning · Statistics 2016-02-24 Thomas Bonald, Richard Combes

Unsupervised Crowdsourcing with Accuracy and Cost Guarantees

We consider the problem of cost-optimal utilization of a crowdsourcing platform for binary, unsupervised classification of a collection of items, given a prescribed error threshold. Workers on the crowdsourcing platform are assumed to be…

Machine Learning · Computer Science 2022-07-06 Yashvardhan Didwania, Jayakrishnan Nair, N. Hemachandra