Related papers: On a Near-Optimal \& Efficient Algorithm for the S…
Consider $n$ items, each of which is characterised by one of $d+1$ possible features in $\{0, \ldots, d\}$. We study the inference task of learning these types by queries on subsets, or pools, of the items that only reveal a form of…
In this paper, we study the pooled data problem of identifying the labels associated with a large collection of items, based on a sequence of pooled tests revealing the counts of each label within the pool. In the noiseless setting, we…
In the pooled data problem the goal is to efficiently reconstruct a binary signal from additive measurements. Given a signal $\sigma \in \{ 0,1 \}^n$, we can query multiple entries at once and get the total number of non-zero entries in the…
We study active learning of homogeneous $s$-sparse halfspaces in $\mathbb{R}^d$ under the setting where the unlabeled data distribution is isotropic log-concave and each label is flipped with probability at most $\eta$ for a parameter $\eta…
Sparse coding algorithm is an learning algorithm mainly for unsupervised feature for finding succinct, a little above high - level Representation of inputs, and it has successfully given a way for Deep learning. Our objective is to use High…
For many learning problems one may not have access to fine grained label information; e.g., an image can be labeled as husky, dog, or even animal depending on the expertise of the annotator. In this work, we formalize these settings and…
Data quality is an important consideration in many engineering applications and projects. Data collection procedures do not always involve careful utilization of the most precise instruments and strictest protocols. As a consequence, data…
Simply put, a sparse polynomial is one whose zero coefficients are not explicitly stored. Such objects are ubiquitous in exact computing, and so naturally we would like to have efficient algorithms to handle them. However, with this compact…
We consider the problem of clustering partially labeled data from a minimal number of randomly chosen pairwise comparisons between the items. We introduce an efficient local algorithm based on a power iteration of the non-backtracking…
Recently, there has been a burst in the number of research projects on human computation via crowdsourcing. Multiple choice (or labeling) questions could be referred to as a common type of problem which is solved by this approach. As an…
Unravelling hidden patterns in datasets is a classical problem with many potential applications. In this paper, we present a challenge whose objective is to discover nonlinear relationships in noisy cloud of points. If a set of point…
The performance of a machine learning system is usually evaluated by using i.i.d.\ observations with true labels. However, acquiring ground truth labels is expensive, while obtaining unlabeled samples may be cheaper. Stratified sampling can…
The group testing problem concerns discovering a small number of defective items within a large population by performing tests on pools of items. A test is positive if the pool contains at least one defective, and negative if it contains no…
In many high-dimensional problems, like sparse-PCA, planted clique, or clustering, the best known algorithms with polynomial time complexity fail to reach the statistical performance provably achievable by algorithms free of computational…
In many real-world machine learning applications, unlabeled data can be easily obtained, but it is very time-consuming and/or expensive to label them. So, it is desirable to be able to select the optimal samples to label, so that a good…
In dictionary learning, also known as sparse coding, the algorithm is given samples of the form $y = Ax$ where $x\in \mathbb{R}^m$ is an unknown random sparse vector and $A$ is an unknown dictionary matrix in $\mathbb{R}^{n\times m}$…
One of the most common machine learning setups is logistic regression. In many classification models, including neural networks, the final prediction is obtained by applying a logistic link function to a linear score. In binary logistic…
Gathering training data is a key step of any supervised learning task, and it is both critical and expensive. Critical, because the quantity and quality of the training data has a high impact on the performance of the learned function.…
In the pooled data problem we are given $n$ agents with hidden state bits, either $0$ or $1$. The hidden states are unknown and can be seen as the underlying ground truth $\sigma$. To uncover that ground truth, we are given a querying…
Group testing is the process of pooling arbitrary subsets from a set of $n$ items so as to identify, with a minimal number of tests, a "small" subset of $d$ defective items. In "classical" non-adaptive group testing, it is known that when…