Related papers: Repeated Observations for Classification
In a large class of statistical inverse problems it is necessary to suppose that the transformation that is inverted is known. Although, in many applications, it is unrealistic to make this assumption, the problem is often insoluble without…
We consider deconvolution from repeated observations with unknown error distribution. So far, this model has mostly been studied under the additional assumption that the errors are symmetric. We construct an estimator for the non-symmetric…
Recent advances have demonstrated the possibility of solving the deconvolution problem without prior knowledge of the noise distribution. In this paper, we study the repeated measurements model, where information is derived from multiple…
In this paper, we investigate the problem of classifying feature vectors with mutually independent but non-identically distributed elements. First, we show the importance of this problem. Next, we propose a classifier and derive an…
We consider the demixing problem of two (or more) high-dimensional vectors from nonlinear observations when the number of such observations is far less than the ambient dimension of the underlying vectors. Specifically, we demonstrate an…
Researchers in the behavioral and social sciences use linear discriminant analysis (LDA) for predictions of group membership (classification) and for identifying the variables most relevant to group separation among a set of continuous…
Many machine learning problems, especially multi-modal learning problems, have two sets of distinct features (e.g., image and text features in news story classification, or neuroimaging data and neurocognitive data in cognitive science…
Consider the regression problem where the response $Y\in\mathbb{R}$ and the covariate $X\in\mathbb{R}^d$ for $d\geq 1$ are \textit{unmatched}. Under this scenario, we do not have access to pairs of observations from the distribution of $(X,…
The objective of the paper is to study accuracy of multi-class classification in high-dimensional setting, where the number of classes is also large ("large $L$, large $p$, small $n$" model). While this problem arises in many practical…
As technology advanced, collecting data via automatic collection devices become popular, thus we commonly face data sets with lengthy variables, especially when these data sets are collected without specific research goals beforehand. It…
We consider the classification problem of a high-dimensional mixture of two Gaussians with general covariance matrices. Using the replica method from statistical physics, we investigate the asymptotic behavior of a general class of…
We consider a collection of independent random variables that are identically distributed, except for a small subset which follows a different, anomalous distribution. We study the problem of detecting which random variables in the…
We study the problem of selecting limited features to observe such that models trained on them can perform well simultaneously across multiple subpopulations. This problem has applications in settings where collecting each feature is…
Efficient estimation under bias sampling, censoring or truncation is a difficult question which has been partially answered and the usual estimators are not always consistent. Several biased designs are considered for models with variables…
Image space feature detection is the act of selecting points or parts of an image that are easy to distinguish from the surrounding image region. By combining a repeatable point detection with a descriptor, parts of an image can be matched…
In this paper, we study the challenge of feature selection based on a relatively small collection of sample pairs $\{(x_i, y_i)\}_{1 \leq i \leq m}$. The observations $y_i \in \mathbb{R}$ are thereby supposed to follow a noisy single-index…
Let $(X,Y)$ be a random variable consisting of an observed feature vector $X\in \mathcal{X}$ and an unobserved class label $Y\in \{1,2,...,L\}$ with unknown joint distribution. In addition, let $\mathcal{D}$ be a training data set…
Nonparametric density estimators are studied for $d$-dimensional, strongly spatial mixing data which is defined on a general $N$-dimensional lattice structure. We consider linear and nonlinear hard thresholded wavelet estimators which are…
We consider the problem of learning a classifier from observed functional data. Here, each data-point takes the form of a single time-series and contains numerous features. Assuming that each such series comes with a binary label, the…
This work focuses on a specific classification problem, where the information about a sample is not readily available, but has to be acquired for a cost, and there is a per-sample budget. Inspired by real-world use-cases, we analyze average…