Related papers: Estimating a probability mass function with unknow…
Estimating a constrained relation is a fundamental problem in machine learning. Special cases are classification (the problem of estimating a map from a set of to-be-classified elements to a set of labels), clustering (the problem of…
We consider the problem of estimating the missing mass, partition function or evidence and its probability distribution in the case that for each sample point in the discrete sample space its (unnormalized) probability mass is revealed.…
Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. The existing methods usually consider the case when each instance has a fixed,…
Many statistical estimators are defined as the fixed point of a data-dependent operator, with estimators based on minimizing a cost function being an important special case. The limiting performance of such estimators depends on the…
Comparison of two univariate distributions based on independent samples from them is a fundamental problem in statistics, with applications in a wide variety of scientific disciplines. In many situations, we might hypothesize that the two…
We consider the problem of selecting confounders for adjustment from a potentially large set of covariates, when estimating a causal effect. Recently, the high-dimensional Propensity Score (hdPS) method was developed for this task; hdPS…
The estimation of an f-divergence between two probability distributions based on samples is a fundamental problem in statistics and machine learning. Most works study this problem under very weak assumptions, in which case it is provably…
Parameter identification problems are formulated in a probabilistic language, where the randomness reflects the uncertainty about the knowledge of the true values. This setting allows conceptually easily to incorporate new information, e.g.…
An algorithm is proposed, analyzed, and tested for solving continuous nonlinear-equality-constrained optimization problems where the objective and constraint functions are defined by expectations or averages over large, finite numbers of…
The task of estimating a matrix given a sample of observed entries is known as the \emph{matrix completion problem}. Most works on matrix completion have focused on recovering an unknown real-valued low-rank matrix from a random sample of…
Several interesting generative learning algorithms involve a complex probability distribution over many random variables, involving intractable normalization constants or latent variable normalization. Some of them may even not have an…
Estimating the score, i.e., the gradient of log density function, from a set of samples generated by an unknown distribution is a fundamental task in inference and learning of probabilistic models that involve flexible yet intractable…
An important challenge in statistical analysis concerns the control of the finite sample bias of estimators. For example, the maximum likelihood estimator has a bias that can result in a significant inferential loss. This problem is…
We propose a new estimator for the high-dimensional linear regression model with observation error in the design where the number of coefficients is potentially larger than the sample size. The main novelty of our procedure is that the…
Given samples from a distribution, how many new elements should we expect to find if we continue sampling this distribution? This is an important and actively studied problem, with many applications ranging from unseen species estimation to…
In this paper we consider the estimation of unknown parameters in Bayesian inverse problems. In most cases of practical interest, there are several barriers to performing such estimation, This includes a numerical approximation of a…
Reliable probability estimation is of crucial importance in many real-world applications where there is inherent (aleatoric) uncertainty. Probability-estimation models are trained on observed outcomes (e.g. whether it has rained or not, or…
Statistical inference on histograms and frequency counts plays a central role in categorical data analysis. Moving beyond classical methods that directly analyze labeled frequencies, we introduce a framework that models the multiset of…
Predictions and forecasts of machine learning models should take the form of probability distributions, aiming to increase the quantity of information communicated to end users. Although applications of probabilistic prediction and…
We introduce a new family of estimators for unnormalized statistical models. Our family of estimators is parameterized by two nonlinear functions and uses a single sample from an auxiliary distribution, generalizing Maximum Likelihood Monte…