Related papers: Classification with High-Dimensional Sparse Sample…
We consider a high dimensional binary classification problem and construct a classification procedure by minimizing the empirical misclassification risk with a penalty on the number of selected features. We derive non-asymptotic probability…
Motivated by real-world machine learning applications, we analyze approximations to the non-asymptotic fundamental limits of statistical classification. In the binary version of this problem, given two training sequences generated according…
We study high-dimensional asymptotic performance limits of binary supervised classification problems where the class conditional densities are Gaussian with unknown means and covariances and the number of signal dimensions scales faster…
In multiple classification, one aims to determine whether a testing sequence is generated from the same distribution as one of the M training sequences or not. Unlike most of existing studies that focus on discrete-valued sequences with…
We formulate the sparse classification problem of $n$ samples with $p$ features as a binary convex optimization problem and propose a cutting-plane algorithm to solve it exactly. For sparse logistic regression and sparse SVM, our algorithm…
Motivated by problems of anomaly detection, this paper implements the Neyman-Pearson paradigm to deal with asymmetric errors in binary classification with a convex loss. Given a finite collection of classifiers, we combine them and obtain a…
Let $(Y,X_1,...,X_m)$ be a random vector. It is desired to predict $Y$ based on $(X_1,...,X_m)$. Examples of prediction methods are regression, classification using logistic regression or separating hyperplanes, and so on. We consider the…
We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the non-asymptotic…
Classification is a fundamental problem in machine learning and data mining. During the past decades, numerous classification methods have been presented based on different principles. However, most existing classifiers cast the…
We consider a learning problem of identifying a dictionary matrix D (M times N dimension) from a sample set of M dimensional vectors Y = N^{-1/2} DX, where X is a sparse matrix (N times P dimension) in which the density of non-zero entries…
The small sample universal hypothesis testing problem is investigated in this paper, in which the number of samples $n$ is smaller than the number of possible outcomes $m$. The goal of this work is to find an appropriate criterion to…
The Neyman-Pearson (NP) binary classification paradigm constrains the more severe type of error (e.g., the type I error) under a preferred level while minimizing the other (e.g., the type II error). This paradigm is suitable for…
In this paper we consider high-dimensional multiclass classification by sparse multinomial logistic regression. We propose first a feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size…
In this paper we consider the uniformity testing problem for high-dimensional discrete distributions (multinomials) under sparse alternatives. More precisely, we derive sharp detection thresholds for testing, based on $n$ samples, whether a…
In this work we consider a problem of multi-label classification, where each instance is associated with some binary vector. Our focus is to find a classifier which minimizes false negative discoveries under constraints. Depending on the…
In the binary hypothesis testing problem, it is well known that sequentiality in taking samples eradicates the trade-off between two error exponents, yet implementing the optimal test requires the knowledge of the underlying distributions,…
Let n denote the number of variables and m the number of equations in a sparse polynomial system over the binary field. We study the inconsistency probability of randomly generated sparse polynomial systems over the binary field, where each…
A Poisson Binomial distribution over $n$ variables is the distribution of the sum of $n$ independent Bernoullis. We provide a sample near-optimal algorithm for testing whether a distribution $P$ supported on $\{0,...,n\}$ to which we have…
Given $n$ observations from two balanced classes, consider the task of labeling an additional $m$ inputs that are known to all belong to \emph{one} of the two classes. Special cases of this problem are well-known: with complete knowledge of…
Given a training sample of size $m$ from a $d$-dimensional population, we wish to allocate a new observation $Z\in \R^d$ to this population or to the noise. We suppose that the difference between the distribution of the population and that…