Related papers: Classification with High-Dimensional Sparse Sample…

High Dimensional Classification through $\ell_0$-Penalized Empirical Risk Minimization

We consider a high dimensional binary classification problem and construct a classification procedure by minimizing the empirical misclassification risk with a penalty on the number of selected features. We derive non-asymptotic probability…

Methodology · Statistics 2018-11-26 Le-Yu Chen, Sokbae Lee

Second-Order Asymptotically Optimal Statistical Classification

Motivated by real-world machine learning applications, we analyze approximations to the non-asymptotic fundamental limits of statistical classification. In the binary version of this problem, given two training sequences generated according…

Information Theory · Computer Science 2018-12-07 Lin Zhou, Vincent Y. F. Tan, Mehul Motani

An Impossibility Result for High Dimensional Supervised Learning

We study high-dimensional asymptotic performance limits of binary supervised classification problems where the class conditional densities are Gaussian with unknown means and covariances and the number of signal dimensions scales faster…

Machine Learning · Statistics 2016-11-17 Mohammad Hossein Rohban, Prakash Ishwar, Birant Orten, William C. Karl, Venkatesh Saligrama

Exponentially Consistent Statistical Classification of Continuous Sequences with Distribution Uncertainty

In multiple classification, one aims to determine whether a testing sequence is generated from the same distribution as one of the M training sequences or not. Unlike most of existing studies that focus on discrete-valued sequences with…

Machine Learning · Statistics 2024-10-30 Lina Zhu, Lin Zhou

Sparse Classification: a scalable discrete optimization perspective

We formulate the sparse classification problem of $n$ samples with $p$ features as a binary convex optimization problem and propose a cutting-plane algorithm to solve it exactly. For sparse logistic regression and sparse SVM, our algorithm…

Optimization and Control · Mathematics 2025-01-08 Dimitris Bertsimas, Jean Pauphilet, Bart Van Parys

Neyman-Pearson classification, convexity and stochastic constraints

Motivated by problems of anomaly detection, this paper implements the Neyman-Pearson paradigm to deal with asymmetric errors in binary classification with a convex loss. Given a finite collection of classifiers, we combine them and obtain a…

Machine Learning · Statistics 2011-03-01 Philippe Rigollet, Xin Tong

Best subset selection, persistence in high-dimensional statistical learning and optimization under $l_1$ constraint

Let $(Y,X_1,...,X_m)$ be a random vector. It is desired to predict $Y$ based on $(X_1,...,X_m)$. Examples of prediction methods are regression, classification using logistic regression or separating hyperplanes, and so on. We consider the…

Statistics Theory · Mathematics 2007-06-13 Eitan Greenshtein

High-dimensional classification by sparse logistic regression

We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the non-asymptotic…

Statistics Theory · Mathematics 2018-11-20 Felix Abramovich, Vadim Grinshtein

Instance-Based Classification through Hypothesis Testing

Classification is a fundamental problem in machine learning and data mining. During the past decades, numerous classification methods have been presented based on different principles. However, most existing classifiers cast the…

Machine Learning · Computer Science 2019-04-23 Zengyou He, Chaohua Sheng, Yan Liu, Quan Zou

Sample Complexity of Bayesian Optimal Dictionary Learning

We consider a learning problem of identifying a dictionary matrix D (M times N dimension) from a sample set of M dimensional vectors Y = N^{-1/2} DX, where X is a sparse matrix (N times P dimension) in which the density of non-zero entries…

Machine Learning · Computer Science 2014-02-06 Ayaka Sakata, Yoshiyuki Kabashima

Generalized Error Exponents For Small Sample Universal Hypothesis Testing

The small sample universal hypothesis testing problem is investigated in this paper, in which the number of samples $n$ is smaller than the number of possible outcomes $m$. The goal of this work is to find an appropriate criterion to…

Statistics Theory · Mathematics 2014-12-30 Dayu Huang, Sean Meyn

Non-splitting Neyman-Pearson Classifiers

The Neyman-Pearson (NP) binary classification paradigm constrains the more severe type of error (e.g., the type I error) under a preferred level while minimizing the other (e.g., the type II error). This paradigm is suitable for…

Methodology · Statistics 2022-06-07 Jingming Wang, Lucy Xia, Zhigang Bao, Xin Tong

Multiclass classification by sparse multinomial logistic regression

In this paper we consider high-dimensional multiclass classification by sparse multinomial logistic regression. We propose first a feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size…

Statistics Theory · Mathematics 2020-11-20 Felix Abramovich, Vadim Grinshtein, Tomer Levy

Sparse Uniformity Testing

In this paper we consider the uniformity testing problem for high-dimensional discrete distributions (multinomials) under sparse alternatives. More precisely, we derive sharp detection thresholds for testing, based on $n$ samples, whether a…

Statistics Theory · Mathematics 2022-02-17 Bhaswar B. Bhattacharya, Rajarshi Mukherjee

Classification of sparse binary vectors

In this work we consider a problem of multi-label classification, where each instance is associated with some binary vector. Our focus is to find a classifier which minimizes false negative discoveries under constraints. Depending on the…

Statistics Theory · Mathematics 2019-03-29 Evgenii Chzhen

A Unified Study on Sequentiality in Universal Classification with Empirically Observed Statistics

In the binary hypothesis testing problem, it is well known that sequentiality in taking samples eradicates the trade-off between two error exponents, yet implementing the optimal test requires the knowledge of the underlying distributions,…

Information Theory · Computer Science 2025-01-07 Ching-Fang Li, I-Hsiang Wang

Inconsistency Probability of Sparse Equations over F2

Let n denote the number of variables and m the number of equations in a sparse polynomial system over the binary field. We study the inconsistency probability of randomly generated sparse polynomial systems over the binary field, where each…

Probability · Mathematics 2026-03-27 P. Horak, I. Semaev

Testing Poisson Binomial Distributions

A Poisson Binomial distribution over $n$ variables is the distribution of the sum of $n$ independent Bernoullis. We provide a sample near-optimal algorithm for testing whether a distribution $P$ supported on $\{0,...,n\}$ to which we have…

Data Structures and Algorithms · Computer Science 2014-10-15 Jayadev Acharya, Constantinos Daskalakis

Kernel-Based Tests for Likelihood-Free Hypothesis Testing

Given $n$ observations from two balanced classes, consider the task of labeling an additional $m$ inputs that are known to all belong to one of the two classes. Special cases of this problem are well-known: with complete knowledge of…

Machine Learning · Statistics 2023-11-27 Patrik Róbert Gerber, Tianze Jiang, Yury Polyanskiy, Rui Sun

Sparse classification boundaries

Given a training sample of size $m$ from a $d$-dimensional population, we wish to allocate a new observation $Z\in \mathbb{R}^d$ to this population or to the noise. We suppose that the difference between the distribution of the population and that…

Statistics Theory · Mathematics 2009-03-30 Yuri I. Ingster, Christophe Pouet, Alexandre B. Tsybakov