Related papers: On a Near-Optimal & Efficient Algorithm for the S…

Near optimal efficient decoding from pooled data

Consider $n$ items, each of which is characterised by one of $d+1$ possible features in $\{0, \ldots, d\}$. We study the inference task of learning these types by queries on subsets, or pools, of the items that only reveal a form of…

Information Theory · Computer Science 2022-02-10 Max Hahn-Klimroth, Noela Müller

Phase Transitions in the Pooled Data Problem

In this paper, we study the pooled data problem of identifying the labels associated with a large collection of items, based on a sequence of pooled tests revealing the counts of each label within the pool. In the noiseless setting, we…

Machine Learning · Statistics 2017-10-19 Jonathan Scarlett, Volkan Cevher

On the Parallel Reconstruction from Pooled Data

In the pooled data problem the goal is to efficiently reconstruct a binary signal from additive measurements. Given a signal $\sigma \in \{ 0,1 \}^n$, we can query multiple entries at once and get the total number of non-zero entries in the…

Discrete Mathematics · Computer Science 2022-04-14 Oliver Gebhard, Max Hahn-Klimroth, Dominik Kaaser, Philipp Loick

Efficient active learning of sparse halfspaces with arbitrary bounded noise

We study active learning of homogeneous $s$-sparse halfspaces in $\mathbb{R}^d$ under the setting where the unlabeled data distribution is isotropic log-concave and each label is flipped with probability at most $\eta$ for a parameter $\eta…

Machine Learning · Computer Science 2021-08-16 Chicheng Zhang, Jie Shen, Pranjal Awasthi

Sparse Coding: A Deep Learning using Unlabeled Data for High - Level Representation

Sparse coding algorithm is an learning algorithm mainly for unsupervised feature for finding succinct, a little above high - level Representation of inputs, and it has successfully given a way for Deep learning. Our objective is to use High…

Machine Learning · Computer Science 2014-04-08 R. Vidya, Dr. G. M. Nasira, R. P. Jaia Priyankka

Efficient Algorithms for Learning from Coarse Labels

For many learning problems one may not have access to fine grained label information; e.g., an image can be labeled as husky, dog, or even animal depending on the expertise of the annotator. In this work, we formalize these settings and…

Machine Learning · Computer Science 2023-03-27 Dimitris Fotakis, Alkis Kalavasis, Vasilis Kontonis, Christos Tzamos

Should data ever be thrown away? Pooling interval-censored data sets with different precision

Data quality is an important consideration in many engineering applications and projects. Data collection procedures do not always involve careful utilization of the most precise instruments and strictest protocols. As a consequence, data…

Methodology · Statistics 2023-03-02 Krasymyr Tretiak, Scott Ferson

What Can (and Can't) we Do with Sparse Polynomials?

Simply put, a sparse polynomial is one whose zero coefficients are not explicitly stored. Such objects are ubiquitous in exact computing, and so naturally we would like to have efficient algorithms to handle them. However, with this compact…

Symbolic Computation · Computer Science 2018-07-24 Daniel S. Roche

Fast Randomized Semi-Supervised Clustering

We consider the problem of clustering partially labeled data from a minimal number of randomly chosen pairwise comparisons between the items. We introduce an efficient local algorithm based on a power iteration of the non-backtracking…

Machine Learning · Computer Science 2018-06-28 Alaa Saade, Florent Krzakala, Marc Lelarge, Lenka Zdeborová

Crowd Labeling: a survey

Recently, there has been a burst in the number of research projects on human computation via crowdsourcing. Multiple choice (or labeling) questions could be referred to as a common type of problem which is solved by this approach. As an…

Artificial Intelligence · Computer Science 2014-09-04 Jafar Muhammadi, Hamid Reza Rabiee, Abbas Hosseini

Labelling as an unsupervised learning problem

Unravelling hidden patterns in datasets is a classical problem with many potential applications. In this paper, we present a challenge whose objective is to discover nonlinear relationships in noisy cloud of points. If a set of point…

Machine Learning · Statistics 2018-05-31 Terry Lyons, Imanol Perez Arribas

Near Optimal Stratified Sampling

The performance of a machine learning system is usually evaluated by using i.i.d. observations with true labels. However, acquiring ground truth labels is expensive, while obtaining unlabeled samples may be cheaper. Stratified sampling can…

Machine Learning · Computer Science 2019-07-29 Tiancheng Yu, Xiyu Zhai, Suvrit Sra

Group Testing: An Information Theory Perspective

The group testing problem concerns discovering a small number of defective items within a large population by performing tests on pools of items. A test is positive if the pool contains at least one defective, and negative if it contains no…

Information Theory · Computer Science 2026-05-15 Matthew Aldridge, Oliver Johnson, Jonathan Scarlett

Computational lower bounds in latent models: clustering, sparse-clustering, biclustering

In many high-dimensional problems, like sparse-PCA, planted clique, or clustering, the best known algorithms with polynomial time complexity fail to reach the statistical performance provably achievable by algorithms free of computational…

Statistics Theory · Mathematics 2025-06-17 Bertrand Even, Christophe Giraud, Nicolas Verzelen

Unsupervised Pool-Based Active Learning for Linear Regression

In many real-world machine learning applications, unlabeled data can be easily obtained, but it is very time-consuming and/or expensive to label them. So, it is desirable to be able to select the optimal samples to label, so that a good…

Machine Learning · Computer Science 2020-01-16 Ziang Liu, Dongrui Wu

More Algorithms for Provable Dictionary Learning

In dictionary learning, also known as sparse coding, the algorithm is given samples of the form $y = Ax$ where $x\in \mathbb{R}^m$ is an unknown random sparse vector and $A$ is an unknown dictionary matrix in $\mathbb{R}^{n\times m}$…

Data Structures and Algorithms · Computer Science 2014-01-06 Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

Hard labels sampled from sparse targets mislead rotation invariant algorithms

One of the most common machine learning setups is logistic regression. In many classification models, including neural networks, the final prediction is obtained by applying a logistic link function to a linear score. In binary logistic…

Machine Learning · Statistics 2026-03-24 Avrajit Ghosh, Bin Yu, Manfred Warmuth, Peter Bartlett

Active clustering for labeling training data

Gathering training data is a key step of any supervised learning task, and it is both critical and expensive. Critical, because the quantity and quality of the training data has a high impact on the performance of the learned function.…

Data Structures and Algorithms · Computer Science 2021-10-28 Quentin Lutz, Élie de Panafieu, Alex Scott, Maya Stein

Efficient Approximate Recovery from Pooled Data Using Doubly Regular Pooling Schemes

In the pooled data problem we are given $n$ agents with hidden state bits, either $0$ or $1$. The hidden states are unknown and can be seen as the underlying ground truth $\sigma$. To uncover that ground truth, we are given a querying…

Machine Learning · Computer Science 2023-03-02 Max Hahn-Klimroth, Dominik Kaaser, Malin Rau

Nearly Optimal Sparse Group Testing

Group testing is the process of pooling arbitrary subsets from a set of $n$ items so as to identify, with a minimal number of tests, a "small" subset of $d$ defective items. In "classical" non-adaptive group testing, it is known that when…

Information Theory · Computer Science 2018-09-21 Venkata Gandikota, Elena Grigorescu, Sidharth Jaggi, Samson Zhou